Instruction-Tuning-with-GPT-4 / GPT-4-LLM

Instruction Tuning with GPT-4
https://instruction-tuning-with-gpt-4.github.io/
Apache License 2.0

Erroneous examples in Unnatural Instruction data #25

Open qmdnls opened 1 year ago

qmdnls commented 1 year ago

There are a lot of duplicate examples with seemingly mismatched output and label fields in unnatural_instruction_gpt4_data.json. For example:

  {
    "instruction": "Write a function that takes in a list of integers and outputs the sum of all the integers.",
    "input": "List of integers: [1, 2, 3].",
    "output": "No.",
    "label": "Yes"
  },
  {
    "instruction": "Write a function that takes in a list of integers and outputs the sum of all the integers.",
    "input": "List of integers: [1, 2, 3].",
    "output": "['What color was the lamb's fleece?'] - 5",
    "label": "6\n4"
  },

and

  {
    "instruction": "You need to read a piece of text and then answer the questions based on what is stated in the text. The questions will ask you about specific details in the text, and your answers must be drawn directly from the text itself.",
    "input": "Text: In 1789, George Washington was inaugurated as the first president of United States of America. He was 70 years old at that time. John Adams was his vice president.",
    "output": "1. How old was George Washington when he was inaugurated as the first president of the United States in 1789?\n2. Who was his vice president at that time?\n3. What was the relationship between John Adams and George Washington during their time as president and vice president?",
    "label": "1. What year was George Washington inaugurated as the first president of the United States of America?\n\n1789"
  },
  {
    "instruction": "You need to read a piece of text and then answer the questions based on what is stated in the text. The questions will ask you about specific details in the text, and your answers must be drawn directly from the text itself.",
    "input": "Text: In 1789, George Washington was inaugurated as the first president of United States of America. He was 70 years old at that time. John Adams was his vice president.",
    "output": "['Karen', 'Mark'].",
    "label": "Hazel and Lauren went out for lunch together. Hazel had been wanting to try this new restaurant for months."
  },

In the first case both examples are mismatched; in the second case the first occurrence seems to be correct and the second seems to be wrong. There are many cases like this in the file.
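A rough way to surface these cases is to group the records by their (instruction, input) pair and look at every group with more than one entry. The sketch below assumes the file is a single JSON array of objects with the four fields shown above; the function names `find_duplicate_groups` and `summarize` are made up for illustration and are not part of the repository.

```python
import json
from collections import defaultdict


def find_duplicate_groups(records):
    """Group records by (instruction, input); keep groups with more than one record."""
    groups = defaultdict(list)
    for record in records:
        # Missing fields are treated as empty strings (an assumption, not verified
        # against the real file).
        key = (record.get("instruction", ""), record.get("input", ""))
        groups[key].append(record)
    return {key: group for key, group in groups.items() if len(group) > 1}


def summarize(path, limit=5):
    """Print the first few duplicate groups with their output and label fields."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # assumes a top-level JSON array
    duplicates = find_duplicate_groups(records)
    print(f"{len(duplicates)} (instruction, input) pairs occur more than once")
    for (instruction, _), group in list(duplicates.items())[:limit]:
        print(instruction[:80])
        for record in group:
            print("  output:", repr(record.get("output")), "| label:", repr(record.get("label")))


# Example call, assuming the file is in the working directory:
# summarize("unnatural_instruction_gpt4_data.json")
```

This only finds the repeated pairs; deciding which occurrence, if any, is correct still needs manual inspection or some comparison against the instruction.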

Possibly related to #2?