amir-zeldes / gum

Repository for the Georgetown University Multilayer Corpus (GUM)
https://gucorpling.org/gum/

Suspicious: rightward NOUN -compound-> NOUN #81

Closed nschneid closed 12 months ago

nschneid commented 3 years ago

http://match.grew.fr/?corpus=UD_English-GUM@2.7&custom=6028645a81fa8
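The pattern named in the issue title can also be approximated offline. Below is a minimal sketch, assuming CoNLL-U input and assuming the query looks for a NOUN head whose NOUN dependent is attached as compound and stands to the head's right. The function names and the sample annotation (the phrase "the morning US time" under a compound reading) are hypothetical and not taken from the corpus files.

```python
def read_conllu(text):
    """Yield each sentence as a list of token dicts (basic dependencies only)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        sentence.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def rightward_noun_compounds(text):
    """Return (head form, dependent form) pairs for NOUN -compound-> NOUN
    where the dependent follows its head."""
    hits = []
    for sentence in read_conllu(text):
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            head = by_id.get(tok["head"])
            if head is None:
                continue  # root token, head is 0
            if (tok["deprel"] == "compound"
                    and tok["upos"] == "NOUN"
                    and head["upos"] == "NOUN"
                    and tok["id"] > head["id"]):
                hits.append((head["form"], tok["form"]))
    return hits


# Hypothetical annotation of "the morning US time" under a compound reading.
ROWS = [
    ["1", "the", "the", "DET", "DT", "_", "2", "det", "_", "_"],
    ["2", "morning", "morning", "NOUN", "NN", "_", "0", "root", "_", "_"],
    ["3", "US", "US", "PROPN", "NNP", "_", "4", "compound", "_", "_"],
    ["4", "time", "time", "NOUN", "NN", "_", "2", "compound", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(rightward_noun_compounds(SAMPLE))
```

On this sample only "time" is reported: "US" is also a compound dependent, but it precedes its head and is tagged PROPN.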

amir-zeldes commented 3 years ago

Thanks for reporting! The second hit is definitely an error, which I just corrected. For the first one I think compound is right: it's just heavy because of the coordination, so it gets extraposed to the right (it would normally be "general purpose operators", which I would analyze as "general <-amod- purpose <-compound- operators").

The third one I'm more unsure about. There is a reading that could be taken as compound, which is to say:

  • the morning US time = the US time morning

This is maybe the most naive/straightforward reading. Another reading can make it an implicit/elliptical PP, something like "the morning (in) US time". In that case I would go with nmod:tmod. Do you think that's better here? Semantically I feel the compound reading is about right, but compounds can specify any semantics really, so that's not surprising. Syntactically the junction marker here is zero, so again it's hard to tell apart compound from tmod.

nschneid commented 3 years ago

> For the first one I think compound is right: it's just heavy because of the coordination, so it gets extraposed to the right (it would normally be "general purpose operators", which I would analyze as "general <-amod- purpose <-compound- operators")

Hmm...the sentence is:

I know rightward extraposition happens with adjective phrases, and the phrasal modifier "general purpose" arguably functions like an adjective (it is even coordinated with an adjective). Could you say something similar with unambiguously noun modifiers:

If I heard that I might be inclined to interpret the second part as parataxis ("high-quality soups, (specifically) both vegetable (ones) and chicken noodle (ones)").

(I am reminded of the Major-General's Song—"I've information vegetable, animal, and mineral"—but that is hardly normal English!)

> The third one I'm more unsure about. There is a reading that could be taken as compound, which is to say:
>
>   • the morning US time = the US time morning
>
> This is maybe the most naive/straightforward reading. Another reading can make it an implicit/elliptical PP, something like "the morning (in) US time". In that case I would go with nmod:tmod. Do you think that's better here? Semantically I feel the compound reading is about right, but compounds can specify any semantics really, so that's not surprising. Syntactically the junction marker here is zero, so again it's hard to tell apart compound from tmod.

"The U.S. time morning" and "the EST morning" sound pretty odd to me. I like nmod:tmod.

amir-zeldes commented 3 years ago

OK, I think nmod:tmod is reasonable, and certainly not worse than compound here, so I'll go for that. For "general purpose operators", I think I would still treat that as compound, at least as long as we're tagging "purpose" as NN. I agree the whole construction is semantically very ADJ-like, but morphologically "purpose" is still a noun. Unlike-coordination between compound and amod is actually pretty common, for example from GUM:

compound+amod:

amod+compound:

etc. So I think that doesn't have to speak against compound (but in edeps I agree the 'specific' in this case should be amod, so we'd need some cleverer edep conj propagation here).
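One way such a propagation rule could look is sketched below. This is only an illustration, assuming that a conjunct normally inherits the first conjunct's relation to the shared governor, and that for amod/compound the label could instead be chosen from the conjunct's own UPOS. The mapping, the function name, and the toy token order are hypothetical; this is not GUM's actual enhanced-dependency code or the corpus sentence.

```python
# Hypothetical mapping: pick the modifier label from the conjunct's own UPOS.
UPOS_TO_MODIFIER = {"ADJ": "amod", "NOUN": "compound"}


def propagate_conj(tokens):
    """Return enhanced edges (governor id, deprel, conjunct id) for conj tokens.

    tokens maps token id -> dict with 'form', 'upos', 'head', 'deprel'.
    """
    edges = []
    for tok_id, tok in tokens.items():
        if tok["deprel"] != "conj":
            continue
        first = tokens[tok["head"]]      # first conjunct
        deprel = first["deprel"]         # naive propagation copies this label
        if deprel in ("amod", "compound"):
            # Unlike-coordination: relabel according to the conjunct's UPOS.
            deprel = UPOS_TO_MODIFIER.get(tok["upos"], deprel)
        edges.append((first["head"], deprel, tok_id))
    return edges


# Toy analysis with a hypothetical word order, for illustration only.
TOKENS = {
    1: {"form": "operators", "upos": "NOUN", "head": 0, "deprel": "root"},
    2: {"form": "general", "upos": "ADJ", "head": 3, "deprel": "amod"},
    3: {"form": "purpose", "upos": "NOUN", "head": 1, "deprel": "compound"},
    4: {"form": "and", "upos": "CCONJ", "head": 5, "deprel": "cc"},
    5: {"form": "specific", "upos": "ADJ", "head": 3, "deprel": "conj"},
}

print(propagate_conj(TOKENS))
```

Naive propagation would attach "specific" to "operators" as compound; with the UPOS check it comes out as amod, while the basic tree still keeps "purpose" as compound.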

nschneid commented 3 years ago

compound-amod coordination only reinforces my hunch that the deprel should be simply mod, without distinguishing the two. :)

But I can live with rightward compound in this narrow circumstance.

amir-zeldes commented 3 years ago

The two issues are unrelated I think - the rightwardness is just a result of extraposition (could happen to a variety of modifier deprels), whereas the compound vs. amod distinction is tricky because English morphology is so unmarked that noun vs. adj modifiers now look very similar, whereas in older language stages it was very clear (as in German).

I think mod is not a good idea for a number of reasons, not least because the distinction serves downstream tasks like entity recognition: I take amod to be incapable of introducing an entity, whereas compound can. Unlike-coordination happens in a lot of constructions (so much so that PTB has a label for it), but I don't think it means that the two conjuncts are therefore the same deprel - for example you can coordinate and extrapose amod with advcl ("a painting most beautiful and like I had never seen before"), but I wouldn't say I want to call them both 'mod' just because of that ;)
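The downstream argument can be made concrete with a small sketch. It assumes, as stated above, that compound modifiers may introduce an entity while amod modifiers may not. The function name and the example phrase are hypothetical, and real entity recognition would of course use far more than the deprel.

```python
# Assumption taken from the comment above: only compound can introduce an entity.
ENTITY_CAPABLE = {"compound"}


def candidate_modifier_entities(modifiers):
    """Return the forms of modifiers whose deprel could introduce an entity.

    modifiers is a list of (form, deprel) pairs attached to one head noun.
    """
    return [form for form, deprel in modifiers if deprel in ENTITY_CAPABLE]


# Hypothetical phrase: "delicious chicken soup"
fine_grained = [("delicious", "amod"), ("chicken", "compound")]
# The same modifiers if both labels were merged into a single 'mod'.
merged = [(form, "mod") for form, _ in fine_grained]

print(candidate_modifier_entities(fine_grained))
print(candidate_modifier_entities(merged))
```

With the merged label the filter has nothing left to go on, so the distinction would have to be recovered from POS tags or some other layer.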