UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Typos not annotated as such #491

Closed. rhdunn closed this issue 7 months ago.

rhdunn commented 7 months ago

Some of these tokens have been annotated as abbreviations (Abbr=Yes) but are arguably typos instead. Either way, they are missing CorrectForm annotations:

his -> this

ERROR: Sentence email-enronsent16_01-0106 token 9 -- DT lemma 'this' does not match lowercase-form applied to form 'his', expected 'his'

hav -> have

WARN: Sentence answers-20111108104636AAw51HV_ans-0002 token 4 -- VB/Abbr=Yes lemma 'have' does not have a validation rule for form 'hav'
WARN: Sentence answers-20111108075853AAUIKRQ_ans-0002 token 5 -- VBP/Abbr=Yes lemma 'have' does not have a validation rule for form 'hav'
WARN: Sentence answers-20111108075853AAUIKRQ_ans-0002 token 22 -- VB/Abbr=Yes lemma 'have' does not have a validation rule for form 'hav'

shal -> shall

WARN: Sentence answers-20111108104636AAw51HV_ans-0002 token 22 -- MD/Abbr=Yes lemma 'shall' does not have a validation rule for form 'shal'

wel -> well

WARN: Sentence answers-20111108075853AAUIKRQ_ans-0002 token 16 -- UH/Abbr=Yes lemma 'well' does not have a validation rule for form 'wel'

their -> there

ERROR: Sentence email-enronsent31_01-0068 token 17 -- EX lemma 'there' does not match lowercase-form applied to form 'their', expected 'their'
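The ERROR and WARN lines above come from a check that compares each token's lemma with its (lowercased) form. Below is a minimal Python sketch of such a check. It is a hypothetical illustration, not the tool that produced these messages. The function names `parse_attrs` and `check_token`, the message wording, and the sample token lines are all invented for the example. It follows the UD convention of marking a typo with `Typo=Yes` in FEATS and giving the intended spelling as `CorrectForm=...` in MISC. It shows why adding `CorrectForm=this` to the 'his' token would clear the mismatch.

```python
from __future__ import annotations


def parse_attrs(field: str) -> dict[str, str]:
    """Parse a FEATS or MISC column ('A=B|C=D', or '_' when empty) into a dict."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def check_token(line: str) -> str | None:
    """Return a problem message for one 10-column CoNLL-U token line, or None."""
    cols = line.split("\t")
    form, lemma, xpos = cols[1], cols[2], cols[4]
    feats, misc = parse_attrs(cols[5]), parse_attrs(cols[9])
    # A token flagged as a typo should record the intended spelling.
    if feats.get("Typo") == "Yes" and "CorrectForm" not in misc:
        return f"Typo=Yes token '{form}' has no CorrectForm"
    # Hypothetical rule: compare the lemma with the lowercased corrected form
    # if one is given, otherwise with the lowercased surface form.
    source = misc.get("CorrectForm", form)
    if lemma == source.lower():
        return None
    if feats.get("Abbr") == "Yes":
        return f"{xpos}/Abbr=Yes lemma '{lemma}' has no rule for form '{form}'"
    return f"{xpos} lemma '{lemma}' does not match lowercased form '{source}'"


# Illustrative token lines modeled on the 'his' -> 'this' case; the ID, head,
# and deprel values are made up for the example.
before = "9\this\tthis\tDET\tDT\tNumber=Sing|PronType=Dem\t10\tdet\t_\t_"
after = "9\this\tthis\tDET\tDT\tNumber=Sing|PronType=Dem|Typo=Yes\t10\tdet\t_\tCorrectForm=this"
print(check_token(before))  # DT lemma 'this' does not match lowercased form 'his'
print(check_token(after))   # None
```

With the typo annotated, the lemma is checked against the corrected form rather than the surface form, so the same treatment would apply to 'hav', 'shal', 'wel', and 'their' if they are reclassified from Abbr=Yes to Typo=Yes.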

nschneid commented 7 months ago

Will fix these. I may not get to the other open issues immediately, so it would help if someone could submit a PR.