UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Do we need the det:predet subtype? #413

Open nschneid opened 1 year ago

nschneid commented 1 year ago

Nearly all instances of det:predet are essentially det before a determiner-like thing (a det, or possibly nummod or nmod:poss).

The exceptional cases, some of which are either dysfluencies or annotation errors: EWT, GUM

This seems like a minor construction. If it were used in a bunch of other languages that would be one thing, but it seems to be only a few.

There are some non-predeterminer instances of multiple det dependents: dysfluencies, one instance of a PP used as an NP ("a little behind the scenes"), an article followed by a title that includes a determiner...but these are extremely rare. det:predet is essentially equivalent to det before det|nummod|nmod:poss, or det with one of those kinds of things promoted to head of the nominal. So maybe a good place to simplify.
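To make the claimed equivalence concrete, here is a rough sketch of how one might check it against a CoNLL-U file. It assumes the standard 10-column CoNLL-U layout; the function names (`parse_conllu`, `classify_predets`) and the `row` helper are made up for illustration and are not part of any UD tooling. It only covers the "det:predet before a det|nummod|nmod:poss sibling" case, not the case where the determiner-like element is promoted to head of the nominal.

```python
from collections import defaultdict

# Relations that count as "determiner-like" for this check (taken from the comment above).
DETERMINER_LIKE = {"det", "nummod", "nmod:poss"}


def parse_conllu(text):
    """Yield sentences as lists of (id, form, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def classify_predets(sentence):
    """Split det:predet tokens into (regular, exceptional) lists of word forms.

    "Regular" means a later sibling under the same head is determiner-like.
    """
    siblings = defaultdict(list)
    for tok_id, _form, head, deprel in sentence:
        siblings[head].append((tok_id, deprel))

    regular, exceptional = [], []
    for tok_id, form, head, deprel in sentence:
        if deprel != "det:predet":
            continue
        has_later_det = any(
            sib_id > tok_id and sib_rel in DETERMINER_LIKE
            for sib_id, sib_rel in siblings[head]
        )
        (regular if has_later_det else exceptional).append(form)
    return regular, exceptional


def row(i, form, head, deprel):
    """Hypothetical helper: build a CoNLL-U line with unused columns as '_'."""
    return "\t".join([str(i), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


if __name__ == "__main__":
    # Toy sentence "All the dogs bark", annotated by hand for this example.
    sample = "\n".join([
        row(1, "All", 3, "det:predet"),
        row(2, "the", 3, "det"),
        row(3, "dogs", 4, "nsubj"),
        row(4, "bark", 0, "root"),
    ])
    for sent in parse_conllu(sample):
        print(classify_predets(sent))
```

Anything that lands in the second list would be a candidate for the exceptional cases (dysfluencies, annotation errors, or the promoted-head pattern that this sketch does not try to detect).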

amir-zeldes commented 1 year ago

Fixed GUM errors, thanks!

I'm a little conflicted about the label, because I agree predet is not really a grammatical function. On the other hand, as you know I'm very much into 'not rocking the boat': there are thousands of users of UD, and we may just be ruining the day of some researchers, grad students etc. somewhere by messing with the label set for no strong reason. I guess it's only there because having two dets was off-putting to some people, so basically the same reason as for PDT (in a DP-based theory maybe there would be deeper reasons, but UD is NP-leaning).

So yeah, if I were designing UD English from the ground up I would leave it out, but since it's not hurting anyone and has been stable for over a decade, I would just leave it alone, at least until V3 where we could change lots of things.