UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Possible incorrect part of speech due to form -> lemma morphology #485

Status: Closed (rhdunn closed this issue 2 months ago)

rhdunn commented 7 months ago

The following tokens have a lemma that does not follow the form-to-lemma rule implied by their XPOS tag. Should the part of speech be changed to one of the tags in parentheses?
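For readers who want to reproduce this kind of check, here is a minimal sketch of the idea. It is not the validator that produced the log in this issue: the names `SUFFIX_HINTS`, `BASE_FORM_XPOS`, and `suggest_xpos` are hypothetical, and the suffix table only mirrors the groupings used in this report. It assumes that a token tagged JJ, VB, or VBP normally has a lowercased form equal to its lemma, which does not hold for every verb (for example, forms of "be").

```python
# Hypothetical sketch: suggest alternative XPOS tags when a token carries a
# base-form tag but its form differs from its lemma.
SUFFIX_HINTS = [
    ("est", ("JJS", "RBS")),
    ("ing", ("VBG",)),
    ("ed", ("VBN", "VBD")),
    ("en", ("VBN",)),
    ("s", ("VBZ", "NNS", "NNPS")),
]

# Assumption: for these tags the lowercased form is expected to equal the lemma.
BASE_FORM_XPOS = {"JJ", "VB", "VBP"}


def suggest_xpos(form, lemma, xpos):
    """Return candidate tags for a suspicious token, or () if nothing is flagged."""
    form = form.lower()
    if xpos not in BASE_FORM_XPOS or form == lemma:
        return ()
    for suffix, tags in SUFFIX_HINTS:
        if form.endswith(suffix):
            return tags
    # Irregular forms such as 'fought' or 'rang' have no suffix hint here.
    return ()


print(suggest_xpos("nicest", "nice", "JJ"))
print(suggest_xpos("Hoped", "hope", "VBP"))
```

A suffix match is only a hint. Irregular forms still need a lookup table or manual review, which is why the later groups in this report are organized by tense rather than by suffix.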

-est (JJS, RBS)

The following should be JJS/Degree=Sup, not JJ/Degree=Pos:

ERROR: Sentence reviews-330275-0001 token 4 -- JJ lemma 'nice' does not match lowercase-form applied to form 'nicest', expected 'nicest'

-ed (VBN, VBD)

ERROR: Sentence newsgroup-groups.google.com_humanities.lit.authors.shakespeare_0c155162a7dfaf28_ENG_20031127_172200-0027 token 11 -- JJ lemma 'incarcerate' does not match lowercase-form applied to form 'incarcerated', expected 'incarcerated'
ERROR: Sentence newsgroup-groups.google.com_MeninLingerie_78adf09ead5e7e87_ENG_20041219_035800-0005 token 4 -- VB lemma 'fight' does not match lowercase-form applied to form 'fought', expected 'fought'
ERROR: Sentence newsgroup-groups.google.com_hiddennook_5380fdd00f8e5e56_ENG_20050926_194800-0019 token 6 -- VB lemma 'recognize' does not match lowercase-form applied to form 'recognized', expected 'recognized'
ERROR: Sentence answers-20111108091609AAHOFa6_ans-0001 token 7 -- VB lemma 'use' does not match lowercase-form applied to form 'used', expected 'used'
ERROR: Sentence answers-20111108092321AAK0Eqp_ans-0017 token 4 -- VB lemma 'choose' does not match lowercase-form applied to form 'chose', expected 'chose'
ERROR: Sentence reviews-148971-0006 token 10 -- VB lemma 'mean' does not match lowercase-form applied to form 'meant', expected 'meant'
ERROR: Sentence answers-20111108081748AAkQhGe_ans-0004 token 2 -- VBP lemma 'get' does not match lowercase-form applied to form 'got', expected 'got'
ERROR: Sentence answers-20111108084149AAbQBhq_ans-0003 token 3 -- VBP lemma 'get' does not match lowercase-form applied to form 'got', expected 'got'
ERROR: Sentence answers-20111108110610AA4bcXX_ans-0018 token 1 -- VBP lemma 'hope' does not match lowercase-form applied to form 'Hoped', expected 'hoped'
ERROR: Sentence answers-20111108092643AAXe4lD_ans-0064 token 13 -- VBP lemma 'get' does not match lowercase-form applied to form 'got', expected 'got'
ERROR: Sentence reviews-159371-0005 token 24 -- VBP lemma 'sleep' does not match lowercase-form applied to form 'slept', expected 'slept'
ERROR: Sentence reviews-121342-0007 token 2 -- VBP lemma 'order' does not match lowercase-form applied to form 'ordered', expected 'ordered'
ERROR: Sentence reviews-181748-0005 token 21 -- VBP lemma 'deal' does not match lowercase-form applied to form 'dealt', expected 'dealt'

-en (VBN)

ERROR: Sentence answers-20111108102133AAwVd7m_ans-0011 token 28 -- VB lemma 'break' does not match lowercase-form applied to form 'broken', expected 'broken'

-es/s (VBZ, NNS, or NNPS)

ERROR: Sentence answers-20111108111010AASEk0S_ans-0004 token 23 -- VB lemma 'rest' does not match lowercase-form applied to form 'rests', expected 'rests'
ERROR: Sentence reviews-187266-0002 token 20 -- VB lemma 'travel' does not match lowercase-form applied to form 'travels', expected 'travels'
ERROR: Sentence weblog-blogspot.com_dakbangla_20050311135387_ENG_20050311_135387-0035 token 32 -- VBD lemma 'want' does not match past-tense-verb applied to form 'wants', expected 'wants'
ERROR: Sentence newsgroup-groups.google.com_MeninLingerie_78adf09ead5e7e87_ENG_20041219_035800-0009 token 3 -- VBP lemma 'know' does not match lowercase-form applied to form 'knows', expected 'knows'
ERROR: Sentence answers-20111107155302AAXXuM1_ans-0002 token 14 -- VBP lemma 'want' does not match lowercase-form applied to form 'wants', expected 'wants'
ERROR: Sentence reviews-287501-0002 token 4 -- VBP lemma 'provide' does not match lowercase-form applied to form 'provides', expected 'provides'
ERROR: Sentence reviews-385436-0001 token 16 -- VBP lemma 'drive' does not match lowercase-form applied to form 'drives', expected 'drives'
ERROR: Sentence reviews-118770-0002 token 6 -- VBP lemma 'taste' does not match lowercase-form applied to form 'tastes', expected 'tastes'
ERROR: Sentence answers-20111108074555AAFT8Aj_ans-0001 token 8 -- VBP lemma 'hold' does not match lowercase-form applied to form 'holds', expected 'holds'
ERROR: Sentence reviews-058009-0001 token 2 -- VBP lemma 'taste' does not match lowercase-form applied to form 'tastes', expected 'tastes'
ERROR: Sentence reviews-275595-0002 token 3 -- VBP lemma 'want' does not match lowercase-form applied to form 'wants', expected 'wants'
ERROR: Sentence weblog-typepad.com_ripples_20040407125600_ENG_20040407_125600-0014 token 21 -- VBP lemma 'contradict' does not match lowercase-form applied to form 'contradicts', expected 'contradicts'
ERROR: Sentence email-enronsent27_01-0047 token 2 -- VBP lemma 'rule' does not match lowercase-form applied to form 'rules', expected 'rules'
ERROR: Sentence email-enronsent40_01-0017 token 4 -- VBP lemma 'sound' does not match lowercase-form applied to form 'sounds', expected 'sounds'
ERROR: Sentence email-enronsent40_01-0063 token 2 -- VBP lemma 'intend' does not match lowercase-form applied to form 'intends', expected 'intends'
ERROR: Sentence newsgroup-groups.google.com_GuildWars_086f0f64ab633ab3_ENG_20041111_173500-0020 token 22 -- VBP lemma 'know' does not match lowercase-form applied to form 'knows', expected 'knows'
ERROR: Sentence answers-20111108111107AAlrzok_ans-0008 token 5 -- VBP lemma 'see' does not match lowercase-form applied to form 'sees', expected 'sees'
ERROR: Sentence answers-20111108111107AAlrzok_ans-0009 token 8 -- VBP lemma 'see' does not match lowercase-form applied to form 'sees', expected 'sees'
ERROR: Sentence answers-20111108065616AAKtL2c_ans-0015 token 14 -- VBP lemma 'continue' does not match lowercase-form applied to form 'continues', expected 'continues'
ERROR: Sentence reviews-295491-0004 token 6 -- VBP lemma 'treat' does not match lowercase-form applied to form 'treats', expected 'treats'
ERROR: Sentence reviews-187875-0004 token 3 -- VBP lemma 'taste' does not match lowercase-form applied to form 'tastes', expected 'tastes'
ERROR: Sentence reviews-268673-0002 token 7 -- VBP lemma 'talk' does not match lowercase-form applied to form 'talks', expected 'talks'
ERROR: Sentence reviews-317846-0004 token 19 -- VBP lemma 'serve' does not match lowercase-form applied to form 'serves', expected 'serves'
ERROR: Sentence reviews-326649-0003 token 8 -- VBP lemma 'care' does not match lowercase-form applied to form 'cares', expected 'cares'
ERROR: Sentence reviews-288100-0003 token 17 -- VBP lemma 'lack' does not match lowercase-form applied to form 'lacks', expected 'lacks'

-ing (VBG)

ERROR: Sentence answers-20090730195539AAVSpaH_ans-0004 token 11 -- VBP lemma 'have' does not match lowercase-form applied to form 'having', expected 'having'

non-third-person singular present (VBP)

ERROR: Sentence email-enronsent25_01-0068 token 8 -- VB lemma 'be' does not match lowercase-form applied to form 'are', expected 'are'
ERROR: Sentence reviews-270502-0004 token 3 -- VB lemma 'be' does not match lowercase-form applied to form 'am', expected 'am'
ERROR: Sentence reviews-003418-0012 token 3 -- VB lemma 'be' does not match lowercase-form applied to form 'am', expected 'am'

singular present, third person (VBZ)

ERROR: Sentence answers-20111108003939AA3pvxF_ans-0002 token 13 -- VBP lemma 'have' does not match lowercase-form applied to form 'has', expected 'has'
ERROR: Sentence reviews-277703-0001 token 7 -- VBP lemma 'have' does not match lowercase-form applied to form 'has', expected 'has'
ERROR: Sentence email-enronsent02_01-0045 token 3 -- VBP lemma 'have' does not match lowercase-form applied to form 'has', expected 'has'
ERROR: Sentence answers-20111108104724AAuBUR7_ans-0035 token 6 -- VBP lemma 'have' does not match lowercase-form applied to form 'has', expected 'has'

past tense (VBD)

ERROR: Sentence reviews-326112-0002 token 2 -- VBP lemma 'ring' does not match lowercase-form applied to form 'rang', expected 'rang'
ERROR: Sentence reviews-130795-0006 token 11 -- VBP lemma 'drive' does not match lowercase-form applied to form 'Drove', expected 'drove'
ERROR: Sentence reviews-349020-0002 token 4 -- VBP lemma 'drive' does not match lowercase-form applied to form 'drove', expected 'drove'
ERROR: Sentence reviews-165018-0005 token 8 -- VBP lemma 'know' does not match lowercase-form applied to form 'knew', expected 'knew'
ERROR: Sentence reviews-116821-0011 token 6 -- VBP lemma 'fly' does not match lowercase-form applied to form 'flew', expected 'flew'
ERROR: Sentence reviews-319816-0011 token 17 -- VBP lemma 'have' does not match lowercase-form applied to form ''d', expected ''d'

noun (NN)

ERROR: Sentence reviews-219984-0001 token 2 -- VBD lemma 'respond' does not match past-tense-verb applied to form 'response', expected 'response'

proper noun (NNP)

This is a middle-name initial, so it should be PROPN+NNP:

WARN: Sentence newsgroup-groups.google.com_humanities.lit.authors.shakespeare_0018a7697318f71f_ENG_20031006_163200-0054 token 5 -- VBN/Abbr=Yes lemma 'bear' does not have a validation rule for form 'b.'
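
To triage a report like this one, the log lines can be parsed and grouped by their current XPOS tag. The sketch below is a guess at the line layout based only on the messages quoted in this issue. `LINE_RE`, `parse_line`, and `count_by_xpos` are hypothetical names, not part of any UD tooling.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this issue (an assumption,
# not a documented format). Features after the tag, e.g. "/Abbr=Yes", are skipped.
LINE_RE = re.compile(
    r"^(?P<level>ERROR|WARN): Sentence (?P<sent_id>\S+) token (?P<token>\d+) -- "
    r"(?P<xpos>[A-Z]+)(?:/\S+)? lemma '(?P<lemma>[^']*)' .* "
    r"form '(?P<form>.+?)'(?:, expected|$)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by_xpos(lines):
    """Count parsed log lines per current XPOS tag, ignoring non-matching lines."""
    return Counter(rec["xpos"] for rec in map(parse_line, lines) if rec)
```

Passing the lines of a saved log to `count_by_xpos` gives a quick view of which tags account for most of the mismatches.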

nschneid commented 2 months ago

I think these are all fixed now. Thanks!