UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

NumType & NumForm for pluralized decades? #527

Open nschneid opened 5 months ago

nschneid commented 5 months ago

These are tagged NOUN and the "s" is retained in the lemma (#344, #467).

Should NumType and NumForm be included as these are numeric in nature? The discussion in #344 suggests yes. But it was not implemented consistently in f19511a for EWT. GUM appears not to implement these features.

Queries for NOUN+NumType+Number: EWT, GUM

nschneid commented 2 months ago

@amir-zeldes thoughts?

amir-zeldes commented 2 months ago

I guess they are a kind of number... I'd be OK with Number=Plur, NumType=Card and NumForm=Combi? Or did we decide that if it's NOUN it can't have NumType/NumForm?

nschneid commented 2 months ago

Ordinals are ADJ but have numerical features. I think it's reasonable to apply them to NOUNs as well.

I have a slight feeling that this is not a typical NumForm=Combi because the morphology is unrelated to its status as a number (it's regular noun morphology, and the noun happens to be derived from a number). But it is literally a number plus suffix so I guess it's fine.

amir-zeldes commented 2 months ago

Yeah, I understand NumForm as an orthographic feature, so I think it would fit. Can implement for GUM/GENTLE.

nschneid commented 2 months ago

Actually it looks like Number=Ptan applies (https://github.com/UniversalDependencies/docs/issues/999).

So:

1990s   1990s   NOUN    NNS Number=Ptan|NumForm=Combi|NumType=Card

50's    50s NOUN    NNS Number=Ptan|NumForm=Combi|NumType=Card

mid-1980s   mid-1980s   NOUN    NNS Number=Ptan|NumForm=Combi|NumType=Card

nineteen    nineteen    NUM CD  NumForm=Word|NumType=Card   8   compound    8:compound  SpaceAfter=No
-   -   PUNCT   HYPH    _   6   punct   6:punct SpaceAfter=No
eighties    eighties    NOUN    NNS Number=Ptan|NumForm=Word|NumType=Card   3   nmod    3:nmod:of   SpaceAfter=No

amir-zeldes commented 2 months ago

OK, Ptan it is then!
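
For anyone wanting to audit a treebank against the convention settled on in this thread, here is a minimal sketch in plain Python. It only covers decades written with digits (`1990s`, `50's`, `mid-1980s`); spelled-out forms such as `eighties` take NumForm=Word per the example above and are skipped. The regex, the `EXPECTED` table, and the function names are illustrative assumptions, not part of any UD tooling, and the pattern is a rough heuristic that will also match things like `100s`.

```python
import re

# Hypothetical heuristic for digit-based decade tokens: "1990s", "50's", "mid-1980s".
# Assumption: an optional alphabetic prefix plus hyphen, digits ending in 0,
# an optional apostrophe, then "s".
DIGIT_DECADE = re.compile(r"^(?:[a-z]+-)?\d{1,3}0'?s$", re.IGNORECASE)

# Feature values proposed in this thread for digit-based decades.
EXPECTED = {"Number": "Ptan", "NumForm": "Combi", "NumType": "Card"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict ("_" means no features)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_decade(form, upos, feats):
    """Return (feature, found, expected) mismatches for a digit-based decade NOUN.

    Tokens that are not NOUN or do not look like a digit-based decade
    yield an empty list.
    """
    if upos != "NOUN" or not DIGIT_DECADE.match(form):
        return []
    found = parse_feats(feats)
    return [
        (name, found.get(name), want)
        for name, want in EXPECTED.items()
        if found.get(name) != want
    ]
```

Calling `check_decade("50's", "NOUN", "Number=Plur")` reports all three features as needing an update, while a token already annotated as in the examples above returns an empty list.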