UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Double case analysis for prepositional expressions like "out of" #795

Open nschneid opened 3 years ago

nschneid commented 3 years ago

A recurring question, most recently in amir-zeldes/gum#88, has been the "double case" analysis used in English-EWT: for "out of NOMINAL", "out" and "of" each attach to NOMINAL as sister case dependents.
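For concreteness, the two analyses under discussion can be written out as head/relation tuples. This is only an illustrative sketch: the example phrase "out of the house", the token IDs, and the helper `case_dependents` are assumptions made for illustration, not data taken from EWT or GUM. Under the UD `fixed` convention, the later words of an expression attach to its first word.

```python
# Each token is (id, form, head, deprel). The phrase is a fragment, so
# "house" is treated as the root (head 0) purely for illustration.

# Double case: "out" and "of" are sister `case` dependents of the nominal.
DOUBLE_CASE = [
    (1, "out", 4, "case"),
    (2, "of", 4, "case"),
    (3, "the", 4, "det"),
    (4, "house", 0, "root"),
]

# Fixed: "out" is the sole `case` dependent; "of" attaches to "out" as `fixed`.
FIXED = [
    (1, "out", 4, "case"),
    (2, "of", 1, "fixed"),
    (3, "the", 4, "det"),
    (4, "house", 0, "root"),
]


def case_dependents(tokens, head_id):
    """Return the forms attached to `head_id` with the `case` relation."""
    return [form for _, form, head, deprel in tokens
            if head == head_id and deprel == "case"]


print(case_dependents(DOUBLE_CASE, 4))
print(case_dependents(FIXED, 4))
```

The only difference between the two trees is the attachment and relation of "of"; the nominal has two `case` dependents in the first and one in the second.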

The English-specific fixed guidelines list "because of", "instead of", etc. as joined by fixed, but an exception is carved out for spatial relations:

out of, off of (All double prepositions denoting spatial relations are annotated with two cases on the nominal)

Other types occurring in EWT are "inside of", "outside of", and "ahead of".

I have doubts about this policy, however. It would be one thing if the two prepositions both bore spatial meanings and were freely combined in the bigram, as is perhaps the case with "out from", "off from", "up from", "away from", etc. But "of" is more restricted than "from" in its spatial use:

So even the spatial combinations with "of" seem highly specialized, suggesting fixed. This is what GUM uses.

Secondly, the "double prepositions" language suggests that the two words are both tagged ADP, but there are some questions about the tagging of the first word in other spatial expressions that are arguably fixed (but are not documented)—"along with" (amir-zeldes/gum#88), "next to" (#496).

Thirdly, EWT currently uses double case even for "based on" (UniversalDependencies/UD_English-EWT/issues/179) and "except for", which are not spatial and therefore should qualify as fixed, I think, though they are not listed in the documentation.

Are there any objections to dropping the spatial test for the double case analysis and simply developing a more extensive list of lexicalized prepositional expressions that should be fixed?
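A list-driven policy of this kind could be applied mechanically. The following is a minimal sketch under stated assumptions, not an existing UD tool: `FIXED_EXPRESSIONS` is a hypothetical list seeded only with the "of" combinations mentioned above, and `double_case_to_fixed` is a hypothetical helper that reattaches the second word of a listed bigram to the first as `fixed` when both currently bear `case` on the same head.

```python
# Hypothetical list of lexicalized prepositional expressions (assumption:
# seeded only with the "of" bigrams mentioned in this issue).
FIXED_EXPRESSIONS = {
    ("out", "of"),
    ("off", "of"),
    ("inside", "of"),
    ("outside", "of"),
    ("ahead", "of"),
}


def double_case_to_fixed(tokens):
    """Convert sister `case` dependents to a `fixed` analysis.

    `tokens` is a list of dicts with keys id, form, head, deprel, in
    sentence order. Returns a modified copy; the input is left untouched.
    """
    result = [dict(t) for t in tokens]
    for first, second in zip(result, result[1:]):
        pair = (first["form"].lower(), second["form"].lower())
        if (pair in FIXED_EXPRESSIONS
                and first["deprel"] == "case"
                and second["deprel"] == "case"
                and first["head"] == second["head"]):
            # The first word keeps `case`; the second attaches to it.
            second["head"] = first["id"]
            second["deprel"] = "fixed"
    return result


sentence = [
    {"id": 1, "form": "out", "head": 4, "deprel": "case"},
    {"id": 2, "form": "of", "head": 4, "deprel": "case"},
    {"id": 3, "form": "the", "head": 4, "deprel": "det"},
    {"id": 4, "form": "house", "head": 0, "deprel": "root"},
]
converted = double_case_to_fixed(sentence)
```

The sketch only covers adjacent bigrams that are already annotated as double case; it leaves unlisted combinations such as "up from" untouched and makes no attempt to decide part-of-speech tags for the first word.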

N.B.: Related to double case, idiomatic expressions such as "based on", "as though", "along with", and "as if" occur in EWT with double mark; "as if" is specifically documented as fixed, so the annotation seems incorrect, while the others are not documented.

nschneid commented 3 years ago

Another argument against treating "of" as productively combining is a coordination test:

It's possible that some English speaker at some point in time would have repeated "of" in the second coordinate, but to me it sounds decidedly forced.

Contrast "near to" and "next to" in #496.

amir-zeldes commented 3 years ago

All of this suggests to me that fixed would be better for "out of", and indeed this is the current analysis in GUM. It's also nice and parallel with single case "outta".

nschneid commented 9 months ago

Another thing to keep in mind is what happens when these multi-word items are stranded, as in "a horse...that I won't grow out of".

dan-zeman commented 7 months ago

Has this been reflected in the list of English fixed expressions? Can we close the issue?

nschneid commented 7 months ago

I think we need to discuss this more (along with the general principles for how we decide what is fixed).

amir-zeldes commented 7 months ago

I can't remember when/why I got this idea, but in my mind we had already admitted "out of" (maybe due to the existence of "outta" and the stranding argument?). It's been in the GUM fixed list for a while:

https://wiki.gucorpling.org/gum/dependencies