alexwarstadt / blimp

The Benchmark of Linguistic Minimal Pairs

Spurious "'s" and "'" in sentential_subject_island.jsonl #8

Open JoseLlarena opened 1 year ago

JoseLlarena commented 1 year ago

It appears that many (all?) entries in sentential_subject_island.jsonl have a spurious possessive as part of the subject. For instance:

{'UID': 'sentential_subject_island', 'field': 'syntax', 'lexically_identical': True, 'linguistics_term': 'island_effects', 'one_prefix_method': False, 'pairID': '23', 'sentence_bad': "Who have children' investigating alarmed Joel.", 'sentence_good': "Who have children' investigating Joel alarmed.", 'simple_LM_method': True, 'two_prefix_method': False}

{'UID': 'sentential_subject_island', 'field': 'syntax', 'lexically_identical': True, 'linguistics_term': 'island_effects', 'one_prefix_method': False, 'pairID': '32', 'sentence_bad': "Who were those governments' talking about astounding Jason.", 'sentence_good': "Who were those governments' talking about Jason astounding.", 'simple_LM_method': True, 'two_prefix_method': False}
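To put a number on "many (all?)", a minimal sketch along these lines could count how many entries carry a possessive marker. It assumes the file holds one JSON object per line with the `sentence_good` and `sentence_bad` keys shown above; the `count_possessives` helper, the regex, and the third sample row are hypothetical illustrations, not part of the BLiMP tooling.

```python
import json
import re

# Matches "'s" at the end of a word, or a bare "'" not followed by a letter
# (so contractions such as "don't" are skipped, but "it's" would still match).
POSSESSIVE = re.compile(r"\w'(?:s\b|(?!\w))")

def count_possessives(lines):
    """Return (entries_with_marker, total_entries) for an iterable of JSONL lines."""
    hits = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        total += 1
        if any(POSSESSIVE.search(entry[key]) for key in ("sentence_good", "sentence_bad")):
            hits += 1
    return hits, total

# In-memory stand-in for the file; the first two rows are the pairs quoted above,
# the third is an invented control row without a possessive.
sample = [
    json.dumps({"pairID": "23",
                "sentence_good": "Who have children' investigating Joel alarmed.",
                "sentence_bad": "Who have children' investigating alarmed Joel."}),
    json.dumps({"pairID": "32",
                "sentence_good": "Who were those governments' talking about Jason astounding.",
                "sentence_bad": "Who were those governments' talking about astounding Jason."}),
    json.dumps({"pairID": "x",
                "sentence_good": "Who has Joel alarmed.",
                "sentence_bad": "Who Joel has alarmed."}),
]

hits, total = count_possessives(sample)
print(f"{hits} of {total} entries contain a possessive marker")
```

For the real file, passing an open file handle (for example `open("sentential_subject_island.jsonl", encoding="utf-8")`) in place of `sample` should work, provided the path is right for your checkout.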

jorendorff commented 1 year ago

I think it's on purpose. The intended possessive construction is like "[Janet's talking during the play] was annoying." or "We didn't count on [their taking three hours for dinner]."

So the meaning of the first "good" sentence is, "Who was alarmed by the fact that children were investigating Joel?" and the second, "Who were being astounded by the fact that those governments were talking about Jason?"

This category stands out in the paper as the one with the weakest support from the crowd-sourced survey: humans only picked the "right" answer 61% of the time.

In any case, the possessive of "children" is certainly not "children'". I think ultimately it's very hard to build an instrument with 67,000 questions. Some of the templates worked better than others.
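A rough way to flag forms like "children'" would be to look for a bare apostrophe on a word that does not end in "s". The sketch below is only a heuristic under that assumption: the `malformed_possessives` name is hypothetical, and text wrapped in single quotes would produce false positives.

```python
import re

# A word immediately followed by an apostrophe that is not followed by a letter.
BARE_APOSTROPHE = re.compile(r"\b([A-Za-z]+)'(?![A-Za-z])")

def malformed_possessives(sentence):
    """Return bare-apostrophe forms whose base word does not end in 's'."""
    return [
        word + "'"
        for word in BARE_APOSTROPHE.findall(sentence)
        if not word.lower().endswith("s")
    ]

print(malformed_possessives("Who have children' investigating Joel alarmed."))
print(malformed_possessives("Who were those governments' talking about Jason astounding."))
```

On the two sentences quoted in this thread, the first call flags "children'" and the second flags nothing, since "governments'" is a regular plural possessive.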

JoseLlarena commented 1 year ago

Ok, so the first sentence has a typo, thanks. But even after correcting for that, I still can't get a valid parse out of either of them. If they are testing for a subject gerund phrase, as your examples suggest, then the main clause's verb should be singular: a single gerund phrase is always singular, and subject and verb have to agree in number. "Who" would then be the direct object in both sentences.

So, on that view, the first sentence would be: "Who *has* [children's investigating Joel] *alarmed*" and the second: "Who *was* [those governments' talking about Jason] *astounding*"; where I've used italics for the main verbs and bracketed the gerund subjects.

Do you see an alternative parse?

jorendorff commented 1 year ago

Ah, no, I think you're correct, these examples are wrong.

JoseLlarena commented 1 year ago

Great, we are in agreement! Would you like me to make a pull request with the corrections? I can't do it right now because I'm preparing for ACL, but sometime "soon" :)