UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

ADJs missing Degree #529

Closed: nschneid closed this issue 1 month ago

nschneid commented 2 months ago


ADJ/NNP combos. Not an issue in GUM.
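
The query itself isn't reproduced in this thread. A minimal sketch of an equivalent check, assuming a plain CoNLL-U file and no particular query tool, could look like the following. The function name and the sample sentences are made up for illustration and are not taken from EWT.

```python
def adjs_missing_degree(conllu_text):
    """Return (form, xpos) pairs for ADJ tokens that carry no Degree feature."""
    hits = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        tok_id, form, _lemma, upos, xpos, feats = cols[:6]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not tok_id.isdigit():
            continue
        feat_names = set()
        if feats != "_":
            feat_names = {f.split("=", 1)[0] for f in feats.split("|")}
        if upos == "ADJ" and "Degree" not in feat_names:
            hits.append((form, xpos))
    return hits


# Invented sample data, for illustration only.
sample = "\n".join([
    "# text = United States",
    "1\tUnited\tUnited\tADJ\tNNP\t_\t2\tamod\t_\t_",
    "2\tStates\tState\tPROPN\tNNPS\tNumber=Plur\t0\troot\t_\t_",
    "",
    "# text = big house",
    "1\tbig\tbig\tADJ\tJJ\tDegree=Pos\t2\tamod\t_\t_",
    "2\thouse\thouse\tNOUN\tNN\tNumber=Sing\t0\troot\t_\t_",
])

print(adjs_missing_degree(sample))
```

On the sample, only the ADJ/NNP token is reported, since the ADJ/JJ token already has Degree=Pos.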

AngledLuffa commented 2 months ago

The same query returns nothing in PUD, but for a more fundamental reason: PUD doesn't follow the convention of using non-PROPN UPOS tags for verbs, adjectives, etc. within names.

For example:

United_PROPN States -> United_ADJ
New_PROPN York -> New_ADJ

Judging from GUM, it looks like South_PROPN but Southern_ADJ is the standard

PUD should also have:

National_ADJ Savings
Lower_ADJ Australia
Black_ADJ Sea
Middle_ADJ Bronze_PROPN Age   (unsure of Bronze)
Trojan_ADJ War_PROPN    (ref: American_ADJ Academy of Pediatrics, etc)
Second_ADJ Messenian_ADJ War_PROPN
First_ADJ Nations_PROPN
Disney also found that Universal_PROPN owned the ...
White_ADJ House
Plaza_PROPN de_PROPN las_PROPN Victorias_PROPN
Dark_ADJ Ages
Industrial_ADJ Revolution    ???
Prime_ADJ Minister

There's also a proper noun, The Wiz_PROPN, with The_DET matching The_DET Thirteenth Night from GUM

Something I don't like in GUM: Latin_PROPN America vs Latin_ADJ Club. Seems inconsistent.

Olympic is not tagged consistently across PUD, GUM, and EWT.

GUM makes a distinction with Great: Peter the Great_PROPN vs Great_ADJ Republic. Following that, PUD should have Great_ADJ Moravia, Great_ADJ Britain, etc. However, the tagging also varies among Great_PROPN Lakes, Great_PROPN Plains, and Great_ADJ Lakes.

de, de las, etc. are tagged PROPN instead of being tagged according to the foreign words themselves.

Anyway, that was a brief check of all PROPN tokens in PUD, looking for ones that should be ADJ or possibly another tag, although I may have missed some demonyms.
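
How the check was done isn't stated here. One possible way to shortlist candidates for manual review, sketched under the assumption that tokens are available as (form, UPOS) pairs, is to flag PROPN tokens whose lowercased form is tagged ADJ elsewhere in the same data. The function name and sample tokens are hypothetical, and the heuristic would miss adjectives that never occur outside names.

```python
from collections import Counter


def propn_adj_candidates(tagged_tokens):
    """Return (form, count) for PROPN tokens whose lowercased form also occurs as ADJ.

    This is a hypothetical heuristic for building a review list,
    not a replacement for checking each name by hand.
    """
    adj_forms = {form.lower() for form, upos in tagged_tokens if upos == "ADJ"}
    counts = Counter(
        form
        for form, upos in tagged_tokens
        if upos == "PROPN" and form.lower() in adj_forms
    )
    # Sort by descending count, then alphabetically, for a stable report.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented sample tokens, for illustration only.
tokens = [
    ("New", "PROPN"), ("York", "PROPN"),
    ("a", "DET"), ("new", "ADJ"), ("idea", "NOUN"),
    ("Black", "PROPN"), ("Sea", "PROPN"),
    ("black", "ADJ"), ("cat", "NOUN"),
]

for form, count in propn_adj_candidates(tokens):
    print(form, count)
```

Each flagged form still needs a human decision, as the Great and Latin cases above show.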

amir-zeldes commented 2 months ago

> Something I don't like in GUM: Latin_PROPN America vs Latin_ADJ Club. Seems inconsistent.

Agreed, will fix

> There's a distinction being made with Great in GUM: Peter the Great_PROPN vs Great_ADJ Republic.

TBH I think they should both be ADJ, can fix