UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Missing Style=Vrnc for vernacular words #492

Closed by rhdunn 2 days ago

rhdunn commented 7 months ago

The following forms should be annotated Style=Vrnc rather than as abbreviations (Abbr=Yes). They are also missing CorrectForm annotations:

-in -> -ing

G-dropping is a vernacular speech feature:

WARN: Sentence answers-20111108081748AAkQhGe_ans-0003 token 41 -- VBG/Abbr=Yes lemma 'go' does not have a validation rule for form 'goin'
WARN: Sentence reviews-351950-0002 token 6 -- VBG/Abbr=Yes lemma 'playing' does not have a validation rule for form 'playin'
ERROR: Sentence reviews-164580-0006 token 12 -- NN/Abbr=Yes lemma 'loving' does not match uppercase-form applied to form 'lovin'', expected 'LOVIN''
ERROR: Sentence newsgroup-groups.google.com_eHolistic_2dd76f31ceb6bfe8_ENG_20050513_224200-0049 token 11 -- VBG lemma 'cook' does not match present-verb applied to form 'cookin'', expected 'cookin''
ERROR: Sentence answers-20111108111312AAq4ETn_ans-0021 token 19 -- VBG lemma 'walk' does not match present-verb applied to form 'walkin', expected 'walkin'
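For illustration only, the -in -> -ing mapping above could be sketched as a small helper that proposes a CorrectForm value for a g-dropped token. The function name `g_dropped_correct_form` and the restriction to VBG tokens are assumptions made for this sketch; they are not taken from the checker that produced the log lines above.

```python
import re
from typing import Optional

# Hypothetical helper, not part of any UD tool.
# Matches forms ending in -in, optionally followed by an apostrophe
# (straight or curly), e.g. "goin", "cookin'", "LOVIN'".
_G_DROP = re.compile(r"^(?P<stem>.+in)['\u2019]?$", re.IGNORECASE)


def g_dropped_correct_form(form: str, xpos: str) -> Optional[str]:
    """Propose the -ing spelling for a g-dropped form, or return None.

    Assumption: only VBG tokens are considered, so that ordinary words
    ending in -in (e.g. "cabin") are never rewritten.
    """
    if xpos != "VBG":
        return None
    match = _G_DROP.match(form)
    if match is None:
        return None
    stem = match.group("stem")
    # Preserve the casing of all-caps forms.
    return stem + ("G" if stem.isupper() else "g")
```

For example, `g_dropped_correct_form("goin", "VBG")` gives `"going"`. A token such as `lovin'` tagged NN would need its own handling, since this sketch deliberately skips non-VBG tokens.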

ya -> you

ERROR: Sentence answers-20111107180248AAnQ3aE_ans-0002 token 1 -- PRP lemma 'you' does not match lowercase-form applied to form 'Ya', expected 'ya'
ERROR: Sentence email-enronsent15_01-0039 token 2 -- PRP lemma 'you' does not match lowercase-form applied to form 'ya', expected 'ya'

'em -> them

ERROR: Sentence reviews-255261-0010 token 16 -- PRP lemma 'they' does not match lowercase-form applied to form ''em', expected ''em'
ERROR: Sentence reviews-018548-0005 token 6 -- PRP lemma 'they' does not match lowercase-form applied to form ''em', expected ''em'

yo -> your

ERROR: Sentence email-enronsent08_02-0009 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0020 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0020 token 24 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0020 token 27 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0021 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0022 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0023 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0024 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'
ERROR: Sentence email-enronsent08_02-0025 token 1 -- PRP$ lemma 'your' does not match lowercase-form applied to form 'Yo', expected 'yo'

of -> have

ERROR: Sentence answers-20111108091921AAaLK4e_ans-0020 token 15 -- VB lemma 'have' does not match lowercase-form applied to form 'of', expected 'of'
ERROR: Sentence reviews-308088-0002 token 37 -- VB lemma 'have' does not match lowercase-form applied to form 'OF', expected 'of'
ERROR: Sentence reviews-294081-0014 token 5 -- VB lemma 'have' does not match lowercase-form applied to form 'OF', expected 'of'

nschneid commented 2 days ago

Pronouns (see https://universaldependencies.org/en/pos/PRON.html):

Do these need CorrectForms? I figure they are sufficiently well established that they might appear in a dictionary, so I'm not sure they need "correcting" (short of an actual spelling issue like missing apostrophe in 'em).

rhdunn commented 2 days ago

I can add those as exceptions to my checker/validation tool.
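A minimal sketch of what such an exception list might look like, assuming the checker sees each token's form, lemma, FEATS column and MISC column as CoNLL-U strings. The names `VERNACULAR_PRONOUNS`, `parse_attrs` and `missing_correct_form` are hypothetical, and the three form/lemma pairs are simply the pronoun cases reported in this issue; the actual tool may be structured quite differently.

```python
# Hypothetical allow-list of (lowercased form, lemma) pairs that are treated
# as established vernacular pronouns and so need no CorrectForm.
VERNACULAR_PRONOUNS = {
    ("ya", "you"),
    ("'em", "they"),
    ("yo", "your"),
}


def parse_attrs(column: str) -> dict:
    """Parse a CoNLL-U FEATS or MISC column ("A=1|B=2", or "_" when empty)."""
    if column == "_":
        return {}
    # partition() tolerates items that carry no "=" sign.
    return dict(item.partition("=")[::2] for item in column.split("|"))


def missing_correct_form(form: str, lemma: str, feats: str, misc: str) -> bool:
    """Return True if a Style=Vrnc token should carry CorrectForm but does not."""
    if parse_attrs(feats).get("Style") != "Vrnc":
        return False
    if (form.lower(), lemma) in VERNACULAR_PRONOUNS:
        return False  # listed exception: no correction expected
    return "CorrectForm" not in parse_attrs(misc)
```

With this sketch, `missing_correct_form("Ya", "you", "Style=Vrnc", "_")` is `False`, while a `goin` token with `Style=Vrnc` and an empty MISC column would still be flagged.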