allenai / natural-instructions

Expanding natural instructions
https://instructions.apps.allenai.org/
Apache License 2.0

Task 199 (MNLI): is the data correct? #781

Closed ytyz1307zzh closed 1 year ago

ytyz1307zzh commented 1 year ago

Task 199: MNLI classification (text entailment)

Task instruction: In this task, you're given a pair of sentences, sentence 1 and sentence 2. Your job is to determine if the two sentences clearly agree/disagree with each other, or if this can't be determined. Indicate your answer as yes or no respectively.

I found several examples in the MNLI task that look incorrect. For instance, the following example comes from the "Positive Examples":

{
    "input": "Sentence 1: yeah i tell you what though if you go price some of those tennis shoes i can see why now you know they're getting up in the hundred dollar range. Sentence 2: The tennis shoes have only one price.",
    "output": "yes",
    "explanation": "The prices of the shoes vary in the hundred dollar range in the first sentence, so the second sentence disagrees with it as the tennis shoes can't have only one price."
}

The following examples come from the data instances:

Input: Sentence 1: Chelimsky, Eleanor, and J. Dahmann. Sentence 2: Edgar and Hoover
Output: yes

Input: Sentence 1: well this machine does it all in one step it's Sentence 2: It saves a lot of time because it does it all in one step
Output: no

Also, why are there only "yes" and "no" labels? NLI is normally a three-class classification problem (entailment, neutral, contradiction).
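
To make the question concrete, here is a minimal sketch of one possible reading of the task instruction: the three MNLI classes are collapsed into two, with "yes" meaning the relationship is clear (agree or disagree) and "no" meaning it cannot be determined. The function name `collapse_label` and the mapping itself are my assumptions, not something confirmed by the task authors.

```python
# Hypothetical mapping from three-class NLI labels to the task's yes/no labels.
# Assumption: "yes" covers both clear agreement and clear disagreement,
# and "no" covers the undetermined (neutral) case.
ASSUMED_MAPPING = {
    "entailment": "yes",     # sentences clearly agree
    "contradiction": "yes",  # sentences clearly disagree
    "neutral": "no",         # relationship cannot be determined
}


def collapse_label(label: str) -> str:
    """Return the assumed binary label for a three-class NLI label."""
    try:
        return ASSUMED_MAPPING[label]
    except KeyError:
        raise ValueError(f"Unknown NLI label: {label!r}") from None
```

If this reading is right, entailment and contradiction pairs would receive the same label, so the binary output alone could not tell them apart.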

Thanks in advance for the clarification!