UniversalDependencies / UD_English-PUD

Parallel Universal Dependencies.

Incorrect lemmas for verbs #39

Open rhdunn opened 9 months ago

rhdunn commented 9 months ago

ERROR: Sentence w01105055 token 18 -- VBD lemma 'based' does not match past-tense-verb applied to form 'based', expected 'base'
ERROR: Sentence n01032032 token 9 -- VBN lemma 'held' does not match lemma-exception applied to form 'held', expected 'hold'
ERROR: Sentence n01043005 token 7 -- VBN lemma 'called' does not match past-participle-verb applied to form 'called', expected 'call'
ERROR: Sentence n01071009 token 4 -- VBN lemma 'fired' does not match past-participle-verb applied to form 'fired', expected 'fire'
ERROR: Sentence n01077030 token 7 -- VBN lemma 'called' does not match past-participle-verb applied to form 'called', expected 'call'
ERROR: Sentence n01119012 token 17 -- VBN lemma 'made' does not match lemma-exception applied to form 'made', expected 'make'
ERROR: Sentence w01018101 token 18 -- VBN lemma 'aged' does not match past-participle-verb applied to form 'aged', expected 'age'
ERROR: Sentence w01085005 token 18 -- VBN lemma 'prepared' does not match past-participle-verb applied to form 'prepared', expected 'prepare'
ERROR: Sentence w01094066 token 8 -- VBN lemma 'sized' does not match past-participle-verb applied to form 'sized', expected 'size'
ERROR: Sentence w01130101 token 5 -- VBN lemma 'cowritten' does not match lemma-exception applied to form 'cowritten', expected 'cowrite'
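
For readers unfamiliar with this kind of check, here is a minimal sketch of how a VBD/VBN lemma rule could work. It is not the validator's actual code: the names `IRREGULAR_LEMMAS`, `plausible_lemmas`, and `check_lemma` are hypothetical, and the exception table only contains the three irregular forms reported above.

```python
# Hypothetical exception table; a real validator would need a much fuller lexicon.
IRREGULAR_LEMMAS = {"held": "hold", "made": "make", "cowritten": "cowrite"}


def plausible_lemmas(form):
    """Return the base forms a VBD/VBN token could plausibly have."""
    form = form.lower()
    if form in IRREGULAR_LEMMAS:
        return {IRREGULAR_LEMMAS[form]}
    if form.endswith("ed") and len(form) > 3:
        stem = form[:-2]
        # Without a lexicon we cannot tell "call+ed" from "base+d",
        # so both candidates are accepted.
        return {stem, stem + "e"}
    return set()  # no rule applies, so nothing can be checked


def check_lemma(form, lemma):
    """Return None if the lemma is acceptable, otherwise an error message."""
    expected = plausible_lemmas(form)
    if not expected or lemma in expected:
        return None
    return f"lemma {lemma!r} does not match form {form!r}, expected one of {sorted(expected)}"


for form, lemma in [("based", "based"), ("based", "base"), ("held", "held")]:
    print(form, lemma, check_lemma(form, lemma))
```

The key point is that a lemma identical to the inflected form (`based`/`based`, `held`/`held`) is rejected, while the base form passes.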

Modals

ERROR: Sentence n01036020 token 2 -- MD lemma 'would' does not match lemma-exception applied to form ''d', expected 'will'
ERROR: Sentence n01080042 token 13 -- MD lemma 'would' does not match lemma-exception applied to form '’d', expected 'will'
ERROR: Sentence n01091017 token 14 -- MD lemma 'would' does not match lemma-exception applied to form '’d', expected 'will'
ERROR: Sentence n01121032 token 12 -- MD lemma 'would' does not match lemma-exception applied to form '’d', expected 'will'

UK vs US

In the UK and Commonwealth, the lemma ends in "l", but in the US it ends in "ll":

ERROR: Sentence w01111021 token 7 -- VBD lemma 'enrol' does not match past-tense-verb applied to form 'enrolled', expected 'enroll'
ERROR: Sentence w01115023 token 2 -- VBD lemma 'enrol' does not match past-tense-verb applied to form 'enrolled', expected 'enroll'
ERROR: Sentence w01125037 token 3 -- VBN lemma 'appal' does not match past-participle-verb applied to form 'appalled', expected 'appall'

Note: My validator cannot yet distinguish UK from US spellings, so it cannot report which variant a lemma uses. As a result, there may be other instances I haven't spotted in the validation output.
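
One possible way to tell the variants apart is a small lookup table of known spelling pairs. The sketch below is only an illustration: `US_TO_UK`, `variant_of`, and `same_verb` are hypothetical names, and the table holds just the two pairs from the errors above.

```python
# Hypothetical table of US -> UK lemma spellings, limited to the pairs seen above.
US_TO_UK = {"enroll": "enrol", "appall": "appal"}
UK_TO_US = {uk: us for us, uk in US_TO_UK.items()}


def variant_of(lemma):
    """Return "US" or "UK" for a known variant spelling, else None."""
    if lemma in US_TO_UK:
        return "US"
    if lemma in UK_TO_US:
        return "UK"
    return None


def same_verb(lemma_a, lemma_b):
    """True if the two lemmas are the same verb up to UK/US spelling."""
    def normalise(lemma):
        return UK_TO_US.get(lemma, lemma)
    return normalise(lemma_a) == normalise(lemma_b)


print(variant_of("enrol"), same_verb("enrol", "enroll"))
```

With such a table, a mismatch between `enrol` and `enroll` could be reported as a spelling-variant difference instead of as an incorrect lemma.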

AngledLuffa commented 9 months ago

EWT treats enrolled and appalled the same way

https://github.com/UniversalDependencies/UD_English-EWT/issues/480

AngledLuffa commented 9 months ago

Regarding the modals, I'm not so sure about that. Both EWT and GUM use the lemma "would" for them.

nschneid commented 9 months ago

Re: lemmas of modal auxes, see UniversalDependencies/UD_English-EWT#450

rhdunn commented 9 months ago

It looks like the linked EWT issue keeps the modal's own form as the lemma instead of converting it to a base form, as is done for other verbs. I'll update my validator to follow this.

nschneid commented 9 months ago

The question is whether we should annotate modal auxiliaries as having tense at all. If not, then "will" and "would" are morphologically unrelated words and it makes sense that their lemmas are different.

rhdunn commented 9 months ago

If modals currently preserve the form, then the "wo" in "won't" needs to be "would", as does the "'d" in "he'd", etc.:

ERROR: Sentence n01123024 token 3 -- MD lemma 'will' does not match lemma-exception applied to form 'wo', expected 'would'
ERROR: Sentence n01123024 token 8 -- MD lemma 'will' does not match lemma-exception applied to form 'wo', expected 'would'
ERROR: Sentence n01150051 token 3 -- MD lemma 'will' does not match lemma-exception applied to form 'wo', expected 'would'

dan-zeman commented 9 months ago

Isn't won't a short form of will not?

nschneid commented 9 months ago

Yes: won't = will not, wouldn't = would not
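
Put as a lookup table, the convention reached in this thread could be sketched as follows. This is an assumption-laden illustration, not code from any of the treebanks: `MODAL_LEMMAS` and `modal_lemma` are hypothetical names, and the table assumes the token is already tagged MD and tokenized as in the errors above ("wo" + "n't").

```python
# Hypothetical form -> lemma table for MD tokens, following the thread:
# "wo" (from "won't") is a form of "will"; "'d" keeps the lemma "would".
MODAL_LEMMAS = {
    "will": "will",
    "wo": "will",
    "would": "would",
    "'d": "would",
}


def modal_lemma(form):
    """Return the expected lemma for a modal form, or None if unknown."""
    # Treat the typographic apostrophe (U+2019) like the ASCII one.
    key = form.lower().replace("\u2019", "'")
    return MODAL_LEMMAS.get(key)


for form in ["wo", "'d", "\u2019d", "would"]:
    print(form, modal_lemma(form))
```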

rhdunn commented 9 months ago

Ah yes, you are right!

AngledLuffa commented 9 months ago

Fixed the UK/US spellings and the incorrect verb lemmas. Anything else for this issue?