amir-zeldes / gum

Repository for the Georgetown University Multilayer Corpus (GUM)
https://gucorpling.org/gum/

Ordinal Superlative construction ("[3rd tallest] building") #113

Closed nschneid closed 12 months ago

nschneid commented 2 years ago

http://match.grew.fr/?corpus=UD_English-GUM@dev&custom=6221941141b83&clustering=X.upos reveals inconsistent treatment of both UPOS and deprels.

The ADJ guidelines specify that the ordinal in these cases should be tagged as ADJ despite modifying another adjective (presumably as advmod).
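A minimal sketch of how the inconsistency could be tallied offline from a CoNLL-U file, as a rough stand-in for the Grew query above. It assumes ordinals carry `NumType=Ord` and superlatives carry `Degree=Sup` in FEATS, and that the ordinal directly precedes the superlative; the function names and the two sample fragments are hypothetical and were made up for illustration, not taken from GUM.

```python
from collections import Counter

def parse_conllu(text):
    """Yield each sentence as a list of token dicts (basic trees only)."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
                sent = []
            continue
        cols = line.split("\t")
        # Skip comments, multiword-token ranges (1-2) and empty nodes (1.1).
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        feats = dict(f.split("=", 1) for f in cols[5].split("|") if "=" in f)
        sent.append({"form": cols[1], "upos": cols[3], "feats": feats,
                     "deprel": cols[7]})
    if sent:
        yield sent

def ordinal_superlative_tags(text):
    """Count (UPOS, deprel) of ordinals directly preceding a superlative."""
    counts = Counter()
    for sent in parse_conllu(text):
        for tok, nxt in zip(sent, sent[1:]):
            if (tok["feats"].get("NumType") == "Ord"
                    and nxt["feats"].get("Degree") == "Sup"):
                counts[(tok["upos"], tok["deprel"])] += 1
    return counts

def row(*cols):
    """Join the ten CoNLL-U columns with tabs."""
    return "\t".join(cols)

# Invented sample: the same construction annotated two different ways.
SAMPLE = "\n".join([
    row("1", "third", "third", "ADJ", "JJ", "Degree=Pos|NumType=Ord", "2", "advmod", "_", "_"),
    row("2", "tallest", "tall", "ADJ", "JJS", "Degree=Sup", "3", "amod", "_", "_"),
    row("3", "building", "building", "NOUN", "NN", "Number=Sing", "0", "root", "_", "_"),
    "",
    row("1", "second", "second", "ADV", "RB", "NumType=Ord", "2", "advmod", "_", "_"),
    row("2", "largest", "large", "ADJ", "JJS", "Degree=Sup", "3", "amod", "_", "_"),
    row("3", "city", "city", "NOUN", "NN", "Number=Sing", "0", "root", "_", "_"),
])

print(ordinal_superlative_tags(SAMPLE))
```

More than one key in the resulting counter would indicate that the construction is annotated inconsistently. Hyphenated forms such as "fourth-largest" would need an extra step to skip the hyphen token.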

amir-zeldes commented 2 years ago

I can make these cases consistent, but it looks like we both agree the correct deprel is advmod, which leads to the question of why tag them as ADJ and not ADV? If something is "third biggest", then that describes "in what way is it big" or "how big is it?", which for me means it's an adverb (interrogable by "how", so manner or extent in this case).

Can we move to amend this guideline? I'm happy to make them all ADV + advmod.

nschneid commented 2 years ago

It's a case of productive extension—any ordinal number can be used in this construction; does that make it zero-derivation of ADV? I don't necessarily have a strong opinion but before changing the guideline we would need to hear why it was written that way.

nschneid commented 2 years ago

Oh also this construction doesn't just modify adjectives:

So this would be amod(apples, third)?

amir-zeldes commented 2 years ago

Ever since the UD validator has put such an emphasis on equating advmod with ADV, it's been my understanding that adverbially used (morphological) adjectives should also be tagged ADV. This seems especially straightforward for English, since many morphologically unmarked items are regularly ADVs ("do something quick/ADV"), so I don't see the motivation for ADJ here in particular (it's not like we don't assume zero derivation for things like doing something quick, nice, fast etc.)

For "third most apples" I would have done advmod(most, third). I think the amod reading off the noun would mean something like there being three "most apples" instances, of which Sam is the third. So something like "Sam has (won the) third (iteration of the) most apples (award)". If it limits the scope of it being "most" (not absolutely most but third most), then it should be a child of "most".

Either way I'm curious what @dan-zeman and others think about this.

dan-zeman commented 2 years ago

The validator will not complain if it encounters an ADJ attached as advmod. The validator mainly wants to avoid NOUN+advmod (because nouns should be obl instead), and VERB+advmod (because those should be advcl instead).
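The two restrictions described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this comment, not the UD validator's actual code or rule table; `SUGGESTED` and `check_advmod` are hypothetical names, and the sample tokens are invented.

```python
# Hypothetical mapping based only on the comment above:
# a NOUN attached as advmod should be obl, a VERB should be advcl.
SUGGESTED = {"NOUN": "obl", "VERB": "advcl"}

def check_advmod(tokens):
    """Flag advmod dependents whose UPOS suggests a different relation.

    tokens: iterable of (form, upos, deprel) triples.
    Returns a list of (form, upos, suggested_deprel).
    """
    flagged = []
    for form, upos, deprel in tokens:
        base = deprel.split(":", 1)[0]  # ignore subtypes such as advmod:emph
        if base == "advmod" and upos in SUGGESTED:
            flagged.append((form, upos, SUGGESTED[upos]))
    return flagged

# Invented sample tokens.
tokens = [
    ("third", "ADJ", "advmod"),    # ADJ + advmod: not flagged
    ("hours", "NOUN", "advmod"),   # flagged, suggest obl
    ("running", "VERB", "advmod"), # flagged, suggest advcl
    ("quickly", "ADV", "advmod"),  # not flagged
]
print(check_advmod(tokens))
```

Under this reading, advmod(largest/ADJ, third/ADJ) passes without complaint, since only NOUN and VERB dependents are checked.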

If I understand correctly what the construction is supposed to mean, then I think that third should be attached to most and not to apples. Then advmod is probably more expected than amod, although I don't feel strongly about it. But I wouldn't change the tag of third from ADJ to ADV just because it occurs in such a construction.

amir-zeldes commented 2 years ago

> I wouldn't change the tag of third from ADJ to ADV just because it occurs in such a construction.

We're agreed on the attachment, but this part surprised me - if functioning as an adverb (advmod) is separate from being morphologically an adverb (ADV), then why not accept NOUN+advmod too? The reason we don't attach these as just obl is that they are unmediated (look like objects in "I ran three hours"), so as a compromise we have subtypes like :npmod, :tmod etc., inherited from Stanford Dependencies. But if being adverbial is just a function, we could have tagged them as advmod with non-ADV pos as well, so this seems inconsistent.

Would you also tag the following as adjectives?

dan-zeman commented 2 years ago

> > I wouldn't change the tag of third from ADJ to ADV just because it occurs in such a construction.

> We're agreed on the attachment, but this part surprised me - if functioning as an adverb (advmod) is separate from being morphologically an adverb (ADV), then why not accept NOUN+advmod too? ... But if being adverbial is just a function, ...

Because nominals and modifier words are different categories in the top-level UD taxonomy. Adjectives and adverbs are both modifier words, so I see at least some room for debate. But nouns are nominals, hence no advmod is allowed for them. Think of obl as the label for "being adverbial" that is used with nominals.

> Would you also tag the following as adjectives?

Maybe... or maybe not. It depends on how you want to define adverbs in English. That has been a mystery to me ever since I learned that the -ly suffix is not obligatory.

nschneid commented 2 years ago

I see no need to reinvent the wheel on English ADJ vs. ADV. If a word like "cheap" or "long" could be replaced by "carefully" but not "careful", it should be ADV.

Regarding ordinal numbers, PTB says always ADJ, so it seems easiest to stick with that:

[image: PTB guideline excerpt on ordinal numbers]

This construction is special, which is why they needed to mention it (and we should document it), but I think advmod(largest/ADJ, fourth/ADJ) is an acceptable option.

amir-zeldes commented 2 years ago

> it should be ADV.

+1 !

> Regarding ordinal numbers, PTB says always ADJ, so it seems easiest to stick with that

The first part of that image is curious and not in line with the data (see below), but I think you're misreading the second guideline: it says "compounds of the form fourth-largest", but you need to keep in mind that these were not tokenized apart in the original PTB, so they are just saying the whole thing (headed by "largest") is an adjective. If you look at OntoNotes, which contains the re-tokenized PTB and which I take to be the successor of PTB, you will see that the majority of cases tag the modifier as RB (admittedly it's 26:17, so not a huge majority), including in WSJ:

And similarly in the newer genres added by ON:

I think the "substitution by -ly" test suggests that things like sentence initial ordinals ("First, ..." = "Firstly", "Second" = "Secondly") should be tagged as ADV as well, and again ON backs this up:

A query for "First ," and "Second ," shows the skew here is much stronger, with 91:13 in favor of RB (plus 5 cases of LS, oddly, even though it's spelled out as a word!). I don't think ordinals should be given a unique analysis when they fit the same normal ADV distribution tests as regular adverbs, and though I might have agreed if there was a huge precedent for doing this for consistency reasons, it seems ON doesn't do it either.
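For reference, the two skews quoted in this comment can be put on the same scale as percentages. A small sketch using only the counts given above (26:17 for the ordinal modifier, 91:13 plus 5 LS for sentence-initial ordinals); the helper name `share` is hypothetical:

```python
def share(count, *others):
    """Percentage of `count` out of the total of all counts, to one decimal."""
    total = count + sum(others)
    return round(100 * count / total, 1)

# Ordinal modifier before a superlative: 26 RB vs. 17 other.
print(share(26, 17))     # 60.5

# Sentence-initial "First ," / "Second ,": 91 RB vs. 13 other.
print(share(91, 13))     # 87.5

# Same query, also counting the 5 LS cases in the total.
print(share(91, 13, 5))  # 83.5
```

So RB accounts for roughly 60% of the modifier cases, against about 83 to 88% of the sentence-initial cases, depending on whether the LS tags are counted.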

nschneid commented 2 years ago

Re: ADJ vs. ADV generally, I was pointed to this paper, which notes, for example, that adverbs can be postmodifiers of nouns ("his announcement recently that he would resign"). That and other constructions (adjectival compounds, etc.) are used to argue that the distinction cannot be made purely based on what is being modified.

In UD terms, "his announcement recently" is especially awkward because the adverbial vs. adnominal distinction is baked into the deprel, not just the POS.