UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

lemma of id's as a misspelling #420

Closed: AngledLuffa closed this issue 11 months ago

AngledLuffa commented 11 months ago

There are a couple of instances of id's in the test set where I believe the text is meant to be the plural post IDs, but for whatever reason it's misspelled as post-id's:

en_ewt.test.gold.conllu:# text = Could you run those additional post-id's?
en_ewt.test.gold.conllu:# text = I should have a list of post-id's for you by 4:30 today.

In these cases, I would suggest the lemma should be id; it's currently id'.
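A minimal sketch of the suggested correction. It assumes the bad lemma came from stripping only the final s off a form ending in 's, which is what the rows in this issue suggest, so dropping the leftover trailing apostrophe gives the intended lemma. The function name `fix_apostrophe_lemma` is hypothetical and not part of any UD tooling.

```python
def fix_apostrophe_lemma(lemma: str) -> str:
    """Drop a stray trailing apostrophe from a lemma such as "id'".

    Assumption: a one-character lemma (e.g. a bare "'") is left untouched;
    only longer lemmas ending in an apostrophe are treated as errors.
    """
    if len(lemma) > 1 and lemma.endswith("'"):
        return lemma[:-1]
    return lemma


print(fix_apostrophe_lemma("id'"))   # id
print(fix_apostrophe_lemma("ngo'"))  # ngo
```

This only handles the 's case; a form like bees' (lemma bees') would still need a manual fix.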

AngledLuffa commented 11 months ago

There's also:

NGO's -> ngo'

2       coordinator's   coordinator'    NOUN    NNS     Number=Plur     3       nsubj   3:nsubj _
9       GTC's   gtc'    NOUN    NNS     Number=Plur     7       obj     7:obj   _
36      one's   one'    NOUN    NNS     Number=Plur     27      advcl   27:advcl:if     _
16      to's    to'     NOUN    NNS     Number=Plur     12      conj    9:obj|12:conj:and       SpaceAfter=No
10      AREA'S  area'   NOUN    NNS     Number=Plur     8       obl     8:obl:at        _
3       UTH's   uth'    NOUN    NNS     Number=Plur     2       obj     2:obj   _
8       process's       process'        NOUN    NNS     Number=Plur     5       obj     5:obj   _
22      canape's        canape' NOUN    NNS     Number=Plur     20      nsubj   20:nsubj        _
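Rows like the ones above could be collected with a small scan over the treebank files. This is a sketch that assumes standard 10-column, tab-separated CoNLL-U input and restricts itself to XPOS NNS, as in the listed rows. The helper `find_apostrophe_plurals` and the two-token sample sentence (with its sent_id) are made up for illustration.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Suspect(NamedTuple):
    sent_id: str
    token_id: str
    form: str
    lemma: str


def find_apostrophe_plurals(lines: Iterable[str]) -> Iterator[Suspect]:
    """Yield NNS tokens whose lemma (longer than one char) ends in an apostrophe."""
    sent_id = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
            continue
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or another comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (3-4) and empty nodes (3.1)
        tok_id, form, lemma, _upos, xpos = cols[:5]
        if xpos == "NNS" and len(lemma) > 1 and lemma.endswith("'"):
            yield Suspect(sent_id, tok_id, form, lemma)


# Hypothetical sample, not taken from EWT.
sample = (
    "# sent_id = example-0001\n"
    "1\tthe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t2:det\t_\n"
    "2\tGTC's\tgtc'\tNOUN\tNNS\tNumber=Plur\t0\troot\t0:root\t_\n"
)
for s in find_apostrophe_plurals(io.StringIO(sample)):
    print(s.sent_id, s.token_id, s.form, s.lemma)  # example-0001 2 GTC's gtc'
```

Against a real file, `open("en_ewt.test.gold.conllu", encoding="utf-8")` can be passed in place of the `StringIO` object.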

This might be mistokenized:

# text = Besides parking is a pain..cramped and un-ruly with Kumon Parents next door....gives me heebee gee bees'
20      bees'   bees'   NOUN    NNS     Number=Plur     16      obj     16:obj  _

Then, should cookin' probably be treated like cooking?

11      cookin' cookin' VERB    VBG     Tense=Pres|VerbForm=Part        3       conj    3:conj:and      _

From the dev set:

16      DM's    dm'     NOUN    NNS     Number=Plur     18      nsubj   18:nsubj        _
7       astronaut's     astronaut'      NOUN    NNS     Number=Plur     10      nsubj   10:nsubj        _
13      area's  area'   NOUN    NNS     Number=Plur     8       obl     8:obl:in        _

nschneid commented 11 months ago

Are you looking at the current dev branch? Some of these appear to be fixed already, e.g.:

11  cookin' cook    VERB    VBG Style=Vrnc|Tense=Pres|VerbForm=Part 3   conj    3:conj:and  _

But in general (with a few exceptions, like foreign words), the lemma should not end with an apostrophe (') if it is longer than one character.
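The general rule can be expressed as a per-token check. This is a sketch: approximating the foreign-word exception by the UD feature Foreign=Yes in the FEATS column is an assumption, and any other exceptions would need an explicit allowlist. The function name `lemma_violates_apostrophe_rule` is hypothetical.

```python
def lemma_violates_apostrophe_rule(lemma: str, feats: str) -> bool:
    """Return True if a lemma longer than one character ends with an apostrophe.

    Assumption: tokens carrying Foreign=Yes in FEATS are exempt, as a
    stand-in for the "foreign words" exception.
    """
    if len(lemma) <= 1 or not lemma.endswith("'"):
        return False
    is_foreign = "Foreign=Yes" in feats.split("|")
    return not is_foreign


print(lemma_violates_apostrophe_rule("id'", "Number=Plur"))  # True
print(lemma_violates_apostrophe_rule("cook", "Style=Vrnc|Tense=Pres|VerbForm=Part"))  # False
```

Unlike an NNS-only scan, this applies to every token, so a lemma such as cookin' on a VBG would also be flagged.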

Would you mind making a PR to address these?