UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

at/ADV #425

Closed: nschneid closed this issue 11 months ago

nschneid commented 1 year ago

Looking at relatively rare tags for a lemma, I find "at" tagged as ADV in the expressions:

I assume these should all be changed to ADP.

"at least" is documented as fixed for non-quantities, but I don't see why that should affect the word-level tag.
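For readers who want to reproduce this kind of search, here is a minimal sketch of one way to surface rare UPOS tags per lemma in a CoNLL-U file. It only assumes the standard 10-column CoNLL-U layout (LEMMA in column 3, UPOS in column 4). The function name `rare_tags_by_lemma` and the `max_share` threshold are hypothetical illustrations, not part of any UD tooling.

```python
from collections import Counter, defaultdict


def rare_tags_by_lemma(conllu_text, max_share=0.05):
    """Hypothetical helper: for each lemma, list UPOS tags that account for
    at most `max_share` of that lemma's tokens."""
    counts = defaultdict(Counter)  # lemma -> Counter of UPOS tags
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id = cols[0]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        lemma, upos = cols[2], cols[3]
        counts[lemma][upos] += 1

    rare = {}
    for lemma, tag_counts in counts.items():
        total = sum(tag_counts.values())
        hits = [(tag, n) for tag, n in tag_counts.items() if n / total <= max_share]
        if hits:
            rare[lemma] = sorted(hits)
    return rare


# Illustrative data only: 20 tokens of "at" as ADP, one as ADV.
sample = "\n".join(
    ["1\tat\tat\tADP\tIN\t_\t2\tcase\t_\t_"] * 20
    + ["1\tat\tat\tADV\tIN\t_\t2\tadvmod\t_\t_"]
)
print(rare_tags_by_lemma(sample))  # {'at': [('ADV', 1)]}
```

Lowering `max_share` narrows the output to only the most anomalous tags; the right cutoff depends on corpus size.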

amir-zeldes commented 1 year ago

The GUM cases are all xpos=IN. The only reason they are ADV is that the upos converter changes ADP to ADV for multiword advmod + fixed, which someone at some point must have said is the right thing to do. I don't feel passionately about it and could revert that to ADP, but as always, I worry about how much we change upos at the drop of a hat... (which is basically why I only really use xpos for English.)
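To make the rule under discussion concrete, here is a hypothetical sketch of the IN-to-UPOS decision as described in this comment: an xpos=IN token that attaches as advmod and heads a fixed dependent becomes ADV, and otherwise ADP. This is not the actual GUM converter (the real rules live in `upos.ini`, whose format is not reproduced here); the `Token` class, `upos_for_in`, and the `legacy` flag are illustrative names. Setting `legacy=False` shows the proposed behavior of always using ADP.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    xpos: str
    head: int
    deprel: str


def upos_for_in(token, sentence, legacy=True):
    """Hypothetical sketch: choose UPOS for an xpos=IN token.

    legacy=True mimics the described ADP -> ADV switch for multiword
    advmod + fixed; legacy=False always returns ADP."""
    if token.xpos != "IN":
        raise ValueError("this sketch only handles xpos=IN")
    has_fixed_child = any(t.head == token.id and t.deprel == "fixed" for t in sentence)
    if legacy and token.deprel == "advmod" and has_fixed_child:
        return "ADV"
    return "ADP"


# Illustrative parse of "At least he tried" (non-quantity "at least").
sent = [
    Token(1, "At", "IN", 4, "advmod"),
    Token(2, "least", "JJS", 1, "fixed"),
    Token(3, "he", "PRP", 4, "nsubj"),
    Token(4, "tried", "VBD", 0, "root"),
]
print(upos_for_in(sent[0], sent))                # ADV
print(upos_for_in(sent[0], sent, legacy=False))  # ADP
```

Because the decision is keyed on xpos plus the dependency configuration, dropping the special case only touches tokens that are both advmod and heads of a fixed expression; ordinary prepositional "at" (e.g. attaching as case) is ADP either way.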

nschneid commented 1 year ago

Yeah, this is probably a residual treatment of multiwords as complex ADVs, but I don't think we do that elsewhere. The UPOS needs to be documented in the fixed list (#424). That list already has case for "at best" and "at worst", and I don't see any logic to calling "at" an ADV if it's attaching as case. I think we should just make them all ADP.

amir-zeldes commented 1 year ago

Hm, I really don't care about this much (which again is a bad sign for upos...), but this has evidently been around since UD 2.4:

https://github.com/amir-zeldes/gum/blob/master/_build/utils/upos.ini#L63

That is when I started versioning the rules a bit so we could track what changed about the xpos<>upos mapping. I'm not saying we shouldn't change it (we can), but I feel like the time has really come to decide that English upos is stable, and after that point, to say 'no, we won't change things even for moderately good reasons - we picked one of the many bad options for English POS tags and we are sticking to it'. Until we do that, I feel like it's not very useful.

nschneid commented 1 year ago

I feel like none of us really prioritized ironing out all the details of UPOS for English, which makes it easy to find inconsistencies. There are still tons of things involving SYM and so on that differ between GUM and EWT. Anyway, for these expressions we should pick a standard and document it.

nschneid commented 1 year ago

fixed in EWT

nschneid commented 1 year ago

Related issue on "at all" #18

amir-zeldes commented 11 months ago

Fixed in GUM as well