UniversalDependencies / UD_English-GUMReddit


CorrectForm missing CorrectLemma and other CorrectFEATURE for typos #17

Closed rhdunn closed 11 months ago

rhdunn commented 11 months ago

These have CorrectForm values that correct typos. The lemma should be in CorrectLemma.

ERROR: Sentence GUM_reddit_social-53 token 21 -- JJ lemma 'guilty' does not match lowercase-form applied to form 'quilty', expected 'quilty'
ERROR: Sentence GUM_reddit_space-4 token 31 -- NNS/Number=Plur lemma 'hole' does not match plural-common-noun applied to form '-holes', expected '-hole'
ERROR: Sentence GUM_reddit_escape-52 token 2 -- PRP lemma 'she' does not match lowercase-form applied to form 'Shes', expected 'shes'
ERROR: Sentence GUM_reddit_ring-43 token 9 -- PRP lemma 'you' does not match lowercase-form applied to form 'your', expected 'your'
ERROR: Sentence GUM_reddit_ring-67 token 19 -- PRP lemma 'they' does not match lowercase-form applied to form 'the', expected 'the'
ERROR: Sentence GUM_reddit_escape-13 token 4 -- RB lemma 'not' does not match lowercase-form applied to form 'nit', expected 'nit'
ERROR: Sentence GUM_reddit_bobby-20 token 45 -- WDT lemma 'that' does not match lowercase-form applied to form 'than', expected 'than'
ERROR: Sentence GUM_reddit_bobby-43 token 11 -- IN lemma 'to' does not match lowercase-form applied to form 'too', expected 'too'
ERROR: Sentence GUM_reddit_gender-30 token 37 -- IN lemma 'than' does not match lowercase-form applied to form 'that', expected 'that'
ERROR: Sentence GUM_reddit_gender-49 token 25 -- RB lemma 'reasonably' does not match lowercase-form applied to form 'reasonable', expected 'reasonable'
ERROR: Sentence GUM_reddit_introverts-16 token 4 -- IN lemma 'whether' does not match lowercase-form applied to form 'wether', expected 'wether'
ERROR: Sentence GUM_reddit_space-42 token 4 -- RB lemma 'definitely' does not match lowercase-form applied to form 'definately', expected 'definately'
ERROR: Sentence GUM_reddit_steak-26 token 4 -- WP$ lemma 'whose' does not match lowercase-form applied to form 'who's', expected 'who's'
ERROR: Sentence GUM_reddit_superman-13 token 19 -- RB lemma 'not' does not match lowercase-form applied to form 'nt'', expected 'nt''
ERROR: Sentence GUM_reddit_callout-27 token 24 -- PRP$ lemma 'its' does not match lowercase-form applied to form 'it's', expected 'it's'
ERROR: Sentence GUM_reddit_racial-28 token 13 -- PRP$ lemma 'its' does not match lowercase-form applied to form 'it's', expected 'it's'
ERROR: Sentence GUM_reddit_racial-28 token 17 -- PRP$ lemma 'its' does not match lowercase-form applied to form 'it's', expected 'it's'
ERROR: Sentence GUM_reddit_racial-30 token 18 -- PRP$ lemma 'its' does not match lowercase-form applied to form 'it's', expected 'it's'
ERROR: Sentence GUM_reddit_conspiracy-23 token 1 -- VB lemma 'lo' does not match lowercase-form applied to form 'Low', expected 'low'
ERROR: Sentence GUM_reddit_social-44 token 5 -- VB lemma 'have' does not match lowercase-form applied to form 'fave', expected 'fave'
ERROR: Sentence GUM_reddit_conspiracy-49 token 11 -- VBD lemma 'be' does not match past-tense-verb applied to form 'where', expected 'where'
ERROR: Sentence GUM_reddit_callout-16 token 26 -- VBG lemma 'do' does not match present-verb applied to form 'doiong', expected 'doiong'
ERROR: Sentence GUM_reddit_escape-57 token 3 -- VBP lemma 'be' does not match lowercase-form applied to form 'M', expected 'm'
ERROR: Sentence GUM_reddit_callout-34 token 2 -- VBP lemma 'have' does not match lowercase-form applied to form 've', expected 've'
ERROR: Sentence GUM_reddit_callout-48 token 20 -- VBP lemma 'be' does not match lowercase-form applied to form 'm', expected 'm'
ERROR: Sentence GUM_reddit_escape-27 token 2 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_callout-47 token 7 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_conspiracy-2 token 13 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_gender-51 token 8 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_introverts-35 token 2 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_racial-16 token 2 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_racial-22 token 3 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_racial-26 token 3 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_space-11 token 31 -- VBZ lemma 'be' does not match present-3p-verb applied to form 's', expected ''
ERROR: Sentence GUM_reddit_card-31 token 23 -- NN lemma 'card' does not match lowercase-form applied to form 'car', expected 'car'
ERROR: Sentence GUM_reddit_conspiracy-60 token 13 -- NN lemma 'something' does not match lowercase-form applied to form 'somthing', expected 'somthing'
ERROR: Sentence GUM_reddit_introverts-15 token 26 -- NN lemma 'awkward' does not match lowercase-form applied to form 'awkard', expected 'awkard'
ERROR: Sentence GUM_reddit_ring-43 token 8 -- NN lemma 'explanation' does not match lowercase-form applied to form 'explaination', expected 'explaination'
ERROR: Sentence GUM_reddit_space-52 token 15 -- NN lemma 'balloon' does not match lowercase-form applied to form 'baloon', expected 'baloon'
ERROR: Sentence GUM_reddit_superman-8 token 29 -- NN lemma 'weight' does not match lowercase-form applied to form 'wait', expected 'wait'

These are also missing CorrectSpaceAfter=No in addition to CorrectLemma:


ERROR: Sentence GUM_reddit_superman-7 token 4 -- VB lemma 'have' does not match lowercase-form applied to form 've', expected 've'
ERROR: Sentence GUM_reddit_escape-13 token 3 -- VBZ lemma 'be' does not match present-3p-verb applied to form 'a', expected 'a'

amir-zeldes commented 11 months ago

I don't think we've been doing CorrectLemma (not in Reddit and not elsewhere), so until we decide to get into that, it's a wontfix I'm afraid...

The space errors are actually the opposite: they were spelled together in the original, so they should have SpaceAfter=No, or actually more properly they should be MWTs. Will fix.

dan-zeman commented 11 months ago

I don't understand why we would need CorrectLemma at all. If there is a typo in the FORM, then Typo=Yes should be in FEATS, CorrectForm should be in MISC, but the correct lemma should be in the LEMMA column. No need to show a wrong lemma there.
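As a rough illustration of the convention described in the comment above, here is a minimal Python sketch that checks a single CoNLL-U token line. It is not part of any UD validator: the function names `parse_kv` and `check_typo_token` are hypothetical, and the example token reuses the form 'quilty' and lemma 'guilty' from the log while filling the remaining columns with assumed or placeholder values.

```python
def parse_kv(field):
    """Parse a CoNLL-U FEATS or MISC field ("A=1|B=2") into a dict."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|") if "=" in item)


def check_typo_token(line):
    """Return a list of problems for one CoNLL-U token line (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return ["expected 10 tab-separated columns"]
    form, lemma = cols[1], cols[2]
    feats, misc = parse_kv(cols[5]), parse_kv(cols[9])
    correct_form = misc.get("CorrectForm")
    problems = []
    # A typo in FORM is flagged in FEATS and corrected in MISC.
    if feats.get("Typo") == "Yes" and correct_form is None:
        problems.append("Typo=Yes without CorrectForm in MISC")
    # LEMMA should hold the correct lemma, not a copy of the misspelled FORM.
    if (correct_form is not None
            and lemma == form.lower()
            and lemma != correct_form.lower()):
        problems.append("LEMMA copies the misspelled FORM")
    return problems


# Illustrative token only: UPOS and FEATS are assumed, HEAD/DEPREL/DEPS are placeholders.
GOOD = "\t".join(["21", "quilty", "guilty", "ADJ", "JJ",
                  "Degree=Pos|Typo=Yes", "_", "_", "_", "CorrectForm=guilty"])
BAD = "\t".join(["21", "quilty", "quilty", "ADJ", "JJ",
                 "Degree=Pos|Typo=Yes", "_", "_", "_", "CorrectForm=guilty"])

print(check_typo_token(GOOD))
print(check_typo_token(BAD))
```

Under this reading, a lemma that differs from the lowercased FORM is not an error in itself when the token carries Typo=Yes and CorrectForm; a form-based lemma check would presumably need to compare against CorrectForm instead.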

rhdunn commented 11 months ago

I've replied to this in the docs issue linked above.