Closed yolpsoftware closed 2 years ago
Shouldn't the German day-of-month ordinal numbers be treated as nouns too? What's the difference to the English case?
They should be treated similarly. However, the current annotation in English is wrong. They are definitely not nouns. Ordinal numerals are generally tagged as adjectives, with the additional feature NumType=Ord (see ADJ).
The inconsistencies in German should now be fixed in the dev branch. The fixes will be propagated to the next UD release.
There seem to be some inconsistencies in the handling of ordinal numbers. Some ordinal numbers are lemmatized as an adverb with the period (word="21.", lemma="21.", pos=ADV), some as an adverb without the period (word="21.", lemma="21", pos=ADV), and some split the number and the period, treating them as NUM and PUNCT. Just looking at the "dev" dataset:

dev-s576, dev-s585, dev-s607, dev-s609, dev-s610, dev-s611, dev-s637 treat the ordinal number as word="21.", lemma="21", pos=ADV.

dev-s528, dev-s566, dev-s621 treat the ordinal number as word="21.", lemma="21.", pos=ADV.

dev-s29, dev-s511, dev-s461 treat the ordinal number as two words: word="21", lemma="21", pos=NUM and word=".", lemma=".", pos=PUNCT.
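The three annotation patterns listed above can be found mechanically. Here is a minimal Python sketch, assuming standard tab-separated CoNLL-U input with `# sent_id = ...` comments. The function names, the pattern labels and the `demo-*` sentences are made up for illustration and are not taken from the treebank. The split case is only a heuristic: it flags a digit-only NUM followed by a non-final "." PUNCT, so it may also catch abbreviations or miss sentence-final ordinals.

```python
import re
from collections import defaultdict

# A German-style ordinal written as one token: digits plus a period, e.g. "21."
ORDINAL = re.compile(r"^\d+\.$")


def parse_sentences(conllu_text):
    """Yield (sent_id, [(form, lemma, upos), ...]) for each sentence."""
    sent_id, tokens = None, []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield sent_id, tokens
            sent_id, tokens = None, []
        elif line.startswith("#"):
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
        else:
            cols = line.split("\t")
            # Keep plain word lines only; skip multiword ranges ("1-2")
            # and empty nodes ("1.1").
            if len(cols) == 10 and cols[0].isdigit():
                tokens.append((cols[1], cols[2], cols[3]))
    if tokens:
        yield sent_id, tokens


def ordinal_patterns(conllu_text):
    """Map each annotation pattern to the sentence ids that use it."""
    found = defaultdict(list)
    for sent_id, tokens in parse_sentences(conllu_text):
        for i, (form, lemma, upos) in enumerate(tokens):
            if ORDINAL.match(form):
                kind = "lemma with period" if lemma.endswith(".") else "lemma without period"
                found[(upos, kind)].append(sent_id)
            elif (form == "." and upos == "PUNCT"
                  and 0 < i < len(tokens) - 1          # not sentence-final
                  and tokens[i - 1][0].isdigit()
                  and tokens[i - 1][2] == "NUM"):
                found[("NUM+PUNCT", "split")].append(sent_id)
    return dict(found)


def _row(idx, form, lemma, upos):
    # Hypothetical helper: fills the remaining CoNLL-U columns with "_".
    return "\t".join([str(idx), form, lemma, upos] + ["_"] * 6)


# Invented demo sentences, one per pattern described in the post.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    _row(1, "Der", "der", "DET"), _row(2, "21.", "21", "ADV"), _row(3, "Mai", "Mai", "NOUN"),
    "",
    "# sent_id = demo-2",
    _row(1, "Der", "der", "DET"), _row(2, "21.", "21.", "ADV"), _row(3, "Mai", "Mai", "NOUN"),
    "",
    "# sent_id = demo-3",
    _row(1, "Der", "der", "DET"), _row(2, "21", "21", "NUM"),
    _row(3, ".", ".", "PUNCT"), _row(4, "Mai", "Mai", "NOUN"),
    "",
])

for pattern, ids in ordinal_patterns(SAMPLE).items():
    print(pattern, ids)
```

To check a real treebank file, read its contents into a string and pass that to `ordinal_patterns` in place of `SAMPLE`.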
Furthermore, the day-of-month situation is IMHO very similar to the English case, so I would expect them to have the same treatment. Both are dates, and in both cases, the day is an ordinal number meaning "the 27th day of May".
However, in the following English dataset, days of months seem to be lemmatized consistently as NOUN (just search the dataset for "1st", "2nd", "3rd" etc.):
https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu
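The search for "1st", "2nd", "3rd" etc. could be scripted roughly as follows. This is only a sketch: the function name and the two demo rows are invented, and the regular expression is an assumption about what counts as an English ordinal token. It tallies the UPOS tags of matching tokens in CoNLL-U text.

```python
import re
from collections import Counter

# Assumed shape of an English numeric ordinal: digits plus st/nd/rd/th.
ENGLISH_ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$", re.IGNORECASE)


def count_ordinal_upos(conllu_text):
    """Count the UPOS tags assigned to tokens such as "1st" or "27th"."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Word lines have 10 tab-separated columns and an integer ID.
        if len(cols) == 10 and cols[0].isdigit() and ENGLISH_ORDINAL.match(cols[1]):
            counts[cols[3]] += 1
    return counts


# Invented rows for illustration; not copied from the EWT treebank.
DEMO = "\n".join([
    "\t".join(["1", "27th", "27th", "NOUN"] + ["_"] * 6),
    "\t".join(["2", "May", "May", "PROPN"] + ["_"] * 6),
    "\t".join(["1", "3rd", "3rd", "ADJ"] + ["_"] * 6),
])

print(count_ordinal_upos(DEMO))
```

Running it on the contents of the downloaded en_ewt-ud-dev.conllu file instead of `DEMO` would show how the tags are distributed in the actual data.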
Examples:
Shouldn't the German day-of-month ordinal numbers be treated as nouns too? What's the difference to the English case?