UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

DET coerced to ADV? #432

Open nschneid opened 11 months ago

nschneid commented 11 months ago

https://universal.grew.fr/?custom=650f8fee11c87

nschneid commented 10 months ago
  • (he) complained some

This is modifying a verb to indicate degree. "Much" and "any" are also possible ("He complained too much", "He didn't complain any"). Roughly speaking, "some" = "a little", "any" = "at all", "much" = "a lot". "Some" and "any" sound a bit colloquial. [Incidentally, plain "much" seems to work better in nonveridical contexts: "Did you travel much?" "No, we didn't travel much." ??"Yes, we traveled much."]

"Some", "any", and "much" are all usually prenominal quantifiers (DET or ADJ), and there is a pattern of these being extended to event intensity markers, so a case could be made for keeping the more canonical tag even in the adverbial context (perhaps attaching as obl:npmod). But EWT seems to use ADV across the board, and so does GUM, though only "much" occurs in this construction there.

Note there is also a construction where prenominal modifier "some" occurs with a number, e.g. "It will cost some ten dollars." EWT uses DET here. GUM is inconsistent between DET and ADV. We should have a consistent policy.

amir-zeldes commented 10 months ago

> Note there is also a construction where prenominal modifier "some" occurs with a number, e.g. "It will cost some ten dollars."

I vote RB/ADV attached to the number. Can fix the two GUM errors. Oh wait, only one of them is an error - "some two things" means "some particular two things" in the second hit, not "approximately two things". The first DET example is wrong, will fix.

nschneid commented 10 months ago

Intuitively I don't have a preference, but WSJ has a strong preference for DT (38 to 7). In ON5_dep it's 67 to 9. All 14 EWT tokens are DT. I don't see any mention in the Penn tag guidelines.
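To make the size of that skew concrete, the counts above can be turned into proportions. This is a minimal Python sketch using only the numbers quoted in this comment; reading the smaller number in each pair as the non-DT (presumably RB) count is an assumption, since the minority tag is not named.

```python
# (DT count, non-DT count) per corpus, as quoted in the comment above.
# Assumption: the second number in "38 to 7" and "67 to 9" is the non-DT count.
counts = {"WSJ": (38, 7), "ON5_dep": (67, 9), "EWT": (14, 0)}


def dt_share(dt, other):
    """Fraction of tokens tagged DT."""
    return dt / (dt + other)


shares = {name: dt_share(*pair) for name, pair in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} DT")
```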

This reminds me of "other than" (#275)—how much should we lean on de facto Penn policy for the XPOS, and to what extent is XPOS tied to UPOS if we think Penn's policy isn't a great fit for UD?

amir-zeldes commented 10 months ago

> how much should we lean on de facto Penn policy for the XPOS, and to what extent is XPOS tied to UPOS if we think Penn's policy isn't a great fit for UD?

It is a bit surprising given the tag RB for "that" in "that big" - I would have expected the substitutability with "approximately" to favor RB here. But with ON/PTB so heavily skewed to DT, maybe we should just follow suit... I agree it's not a great fit for UD. If we keep deprel advmod, we'd be forced to have the UPOS - XPOS discrepancy.

nschneid commented 10 months ago

Could we just treat it as DET/det? "some/det 10/nummod dollars" along the lines of "the 10 dollars"

Though modifying the number makes sense semantically, I'm not 100% sure "some 10" is a constituent (it gets dicey with these prenominal approximators).
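For comparison, the two analyses discussed in this thread (DET/det attached to the noun vs. ADV/advmod attached to the number) can be written out in CoNLL-U and queried. The following is a minimal Python sketch, not part of the original discussion. The two annotations of "It will cost some ten dollars." are constructed for illustration (including the choice of `obj` for "dollars") and are not copied from EWT or GUM; `rows_to_conllu` and `some_before_number` are hypothetical helper names.

```python
from collections import Counter


def rows_to_conllu(sent_id, rows):
    """Build a tab-separated CoNLL-U sentence from space-separated rows."""
    lines = [f"# sent_id = {sent_id}"]
    lines += ["\t".join(row.split()) for row in rows]
    return "\n".join(lines) + "\n\n"


# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SHARED_BEFORE = [
    "1 It it PRON PRP _ 3 nsubj _ _",
    "2 will will AUX MD _ 3 aux _ _",
    "3 cost cost VERB VB _ 0 root _ _",
]
SHARED_AFTER = [
    "5 ten ten NUM CD _ 6 nummod _ _",
    "6 dollars dollar NOUN NNS _ 3 obj _ _",
    "7 . . PUNCT . _ 3 punct _ _",
]
# Illustrative variants: only the row for "some" differs.
SAMPLE = rows_to_conllu(
    "det-analysis",
    SHARED_BEFORE + ["4 some some DET DT _ 6 det _ _"] + SHARED_AFTER,
) + rows_to_conllu(
    "advmod-analysis",
    SHARED_BEFORE + ["4 some some ADV RB _ 5 advmod _ _"] + SHARED_AFTER,
)


def some_before_number(conllu_text):
    """Yield (sent_id, upos, xpos, deprel, head_form) for "some" + NUM."""
    for block in conllu_text.strip().split("\n\n"):
        sent_id, tokens = None, []
        for line in block.splitlines():
            if line.startswith("# sent_id = "):
                sent_id = line[len("# sent_id = "):]
            elif line and not line.startswith("#"):
                cols = line.split("\t")
                if cols[0].isdigit():  # skip ranges (1-2) and empty nodes (1.1)
                    tokens.append(cols)
        forms = {cols[0]: cols[1] for cols in tokens}
        for tok, nxt in zip(tokens, tokens[1:]):
            if tok[2].lower() == "some" and nxt[3] == "NUM":
                yield sent_id, tok[3], tok[4], tok[7], forms.get(tok[6], "ROOT")


hits = list(some_before_number(SAMPLE))
tag_tally = Counter((upos, xpos) for _, upos, xpos, _, _ in hits)
for hit in hits:
    print(hit)
```

Run over the text of a real treebank file instead of `SAMPLE`, the same tally would show how consistently a corpus tags this construction and where "some" attaches in each hit.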

amir-zeldes commented 10 months ago

Mm, it's not totally out of the question from my perspective, but I do find it a little sad to throw out the distinction. After all, "some 20 letters" is ambiguous: in the 'true' det reading, it means "any non-specific set of exactly 20 letters", and in the reading we're discussing here it can mean "a specific set of approximately 20 letters". I think the truth here is probably that 'some' does modify the number.