UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Morph. features are alphabetized ignoring case? #460

Closed: AngledLuffa closed this issue 11 months ago

AngledLuffa commented 11 months ago

In Python we would get:

>>> "Number" < "NumForm"
False
>>> "number" < "numform"
True
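Python compares strings one character at a time by Unicode code point. Both names start with "Num"; at the fourth character, "Number" has lowercase `b` (98) and "NumForm" has uppercase `F` (70). Every uppercase ASCII letter has a lower code point than every lowercase one, so "NumForm" sorts first unless case is folded. A small sketch showing this:

```python
a, b = "Number", "NumForm"

# First position where the two feature names differ.
i = next(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
print(i, a[i], ord(a[i]), b[i], ord(b[i]))  # 3 b 98 F 70

# Case-sensitive (default): 'F' (70) < 'b' (98), so NumForm sorts first.
print(sorted([a, b]))                 # ['NumForm', 'Number']

# Case-insensitive: compare 'number' vs 'numform', and 'b' < 'f'.
print(sorted([a, b], key=str.lower))  # ['Number', 'NumForm']
```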

whereas in EWT we are sorting features like this:

12      1990s   1990    NOUN    NNS     Number=Plur|NumForm=Digit|NumType=Card  4       nmod    4:nmod:during   _
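The line above is a CoNLL-U token row: ten tab-separated columns, with FEATS in the sixth. The sketch below re-types it with explicit tabs (the web view renders them as spaces), extracts the feature names, and checks them against both orderings:

```python
row = "12\t1990s\t1990\tNOUN\tNNS\tNumber=Plur|NumForm=Digit|NumType=Card\t4\tnmod\t4:nmod:during\t_"

feats = row.split("\t")[5]  # FEATS is the 6th CoNLL-U column
names = [f.split("=", 1)[0] for f in feats.split("|")]
print(names)  # ['Number', 'NumForm', 'NumType']

# Plain code-point order would be ['NumForm', 'NumType', 'Number'].
print(names == sorted(names))                 # False
# Case-insensitive order matches what EWT contains.
print(names == sorted(names, key=str.lower))  # True
```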

nschneid commented 11 months ago

The UD validator in https://github.com/UniversalDependencies/tools/ enforces alphabetical order of features.
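A validator-style check for this rule might look like the sketch below. It is not the code from the UD tools repository: `check_feats_order` is a hypothetical name, and the sketch assumes the rule is "feature names in case-insensitive alphabetical order". The second token row in `sample` is a deliberately reordered copy of the EWT row, made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def check_feats_order(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, FEATS) for token rows whose feature names are not
    in case-insensitive alphabetical order (an assumed rule, see lead-in)."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or cols[5] == "_":
            continue  # not a 10-column token row, or no features
        names = [f.split("=", 1)[0] for f in cols[5].split("|")]
        if names != sorted(names, key=str.lower):
            yield lineno, cols[5]


sample = [
    "# hypothetical comment line\n",
    "12\t1990s\t1990\tNOUN\tNNS\tNumber=Plur|NumForm=Digit|NumType=Card\t4\tnmod\t4:nmod:during\t_\n",
    "13\t1990s\t1990\tNOUN\tNNS\tNumForm=Digit|Number=Plur|NumType=Card\t4\tnmod\t4:nmod:during\t_\n",
]
print(list(check_feats_order(sample)))
# [(3, 'NumForm=Digit|Number=Plur|NumType=Card')]
```

Under this assumed rule the EWT ordering passes, and the code-point ordering that Python's default `sorted` would produce is the one flagged.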

AngledLuffa commented 11 months ago

Right, but my question is: are we ignoring case? In Python and Java, NumForm comes before Number unless we explicitly ignore case.

nschneid commented 11 months ago

We are doing whatever the validator does. Apparently, yes, it ignores case; otherwise the treebank would trigger an error.
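For a tool that writes FEATS and wants to match the treebank's ordering, Python's default `sorted` is not enough; a case-insensitive key is needed. A minimal sketch, where `normalize_feats` is a hypothetical helper that sorts `Name=Value` pairs by lowercased feature name (assuming that is the ordering the validator accepts):

```python
def normalize_feats(feats: str) -> str:
    """Return FEATS with Name=Value pairs sorted case-insensitively by name.

    Assumes case-insensitive ordering by feature name; '_' (no features)
    is returned unchanged.
    """
    if feats == "_":
        return feats
    pairs = feats.split("|")
    return "|".join(sorted(pairs, key=lambda p: p.split("=", 1)[0].lower()))


print(normalize_feats("NumForm=Digit|NumType=Card|Number=Plur"))
# Number=Plur|NumForm=Digit|NumType=Card
```

Sorting on the name alone, rather than the whole `Name=Value` string, keeps the value out of the comparison; since a feature name appears at most once per token, the two give the same order in practice.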