UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Coordinated prepositions and Case information in enhanced dependencies #854

Closed olesar closed 2 years ago

olesar commented 2 years ago

In patterns such as ADP and ADP NOUN, coordinated prepositions can have different case government. In the sentence

be. Людзі разьбягаюцца ў двары перад і падчас атакі 'People run away in the yard before and during the attack'

перад 'before' is used with Instrumental, whereas падчас 'during' governs Genitive, and атакі is in the Genitive case.

According to the UD guidelines, only the first preposition attaches to the noun: case(атакі, перад), i.e. case(attack, before). The EUD label then pairs that preposition with the morphological case of the noun, giving obl:перад:gen. This combination is incorrect both under the validation rules for enhanced relations and under the grammatical rules of the language in general.
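To make the problem concrete, here is a minimal Python sketch of the naive rule that produces this label. The `Token` record, its field names, and the head indices are illustrative assumptions for this sentence, not a real CoNLL-U API, and the preposition lemma is assumed to equal the lowercased form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical minimal token record (not a real CoNLL-U library type)."""
    id: int
    form: str
    head: int
    deprel: str
    case: Optional[str] = None  # value of the morphological Case feature


# Only the relevant part of the basic tree; attachments are assumed for illustration.
SENTENCE = [
    Token(2, "разьбягаюцца", 0, "root"),
    Token(5, "перад", 8, "case"),
    Token(6, "і", 7, "cc"),
    Token(7, "падчас", 5, "conj"),
    Token(8, "атакі", 2, "obl", case="Gen"),
]


def naive_label(tokens, dep):
    """Copy the first `case` child and the noun's own Case into the label."""
    case_children = [t for t in tokens if t.head == dep.id and t.deprel == "case"]
    label = dep.deprel
    if case_children:
        # Assumption: the lemma equals the lowercased form.
        label += ":" + case_children[0].form.lower()
    if dep.case:
        label += ":" + dep.case.lower()
    return label


noun = SENTENCE[-1]
print(naive_label(SENTENCE, noun))  # obl:перад:gen
```

The rule never looks at падчас, which is a conj dependent of перад rather than a case child of the noun, so the Genitive that падчас governs ends up paired with перад.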

I do not think that adding this 'weird combination' to the EUD validation rules is a reasonable solution. Any suggestions?

nschneid commented 2 years ago

Seems related to an open issue about coordinated auxes: amir-zeldes/gum#107

sylvainkahane commented 2 years ago

It is one of the reasons why we changed the annotation of functional heads, as well as coordination, in SUD.

  1. In SUD, ADPs are heads of adpositional phrases.
  2. In SUD, shared dependents on the right are attached to the right conjunct. See discussion in the last paper on SUD presented at SyntaxFest 2021.

You can compare the analysis of coordinations of ADP here. The request is on a UD treebank. Every UD treebank is aligned with an SUD treebank built by conversion: you can select the SUD tree under the UD tree of each example. As you will see, the analysis is in accordance with your data (and similar data in other languages).

amir-zeldes commented 2 years ago

This advantage of SUD is true, but such constructions are rare and there is a price - especially the different treatment of PPs in languages where the case marker is optional/often omitted (e.g. Japanese), and even in English, different treatment of being "at home" vs. "home". The lexical part of a PP is the main thing that can't be omitted, and it's the thing that is parallel to adverbs (makes sense if we consider PPs and ADV as both adverbials). Cross-linguistically, the UD analysis means we are easily able to align the root of a PP predicate to a language where the predicate is expressed as an ADV or case-marked noun without an adposition. So similarly to the NP-dominates-determiners analysis, there are pretty motivated reasons for treating prepositions as case markers.

sylvainkahane commented 2 years ago

> The lexical part of a PP is the main thing that can't be omitted, and it's the thing that is parallel to adverbs (makes sense if we consider PPs and ADV as both adverbials).

It is rather the adposition that can't be omitted in many cases:

I didn't see you since Syntaxfest
I didn't see you since
*I didn't see you Syntaxfest

and one can argue that adpositions are transitive adverbs, some of them having optional objects, such as since or before.

It is true that case markers are sometimes optional in Japanese, but maybe that is because they are case markers and not adpositions.

I don't want to say that the UD analysis has no value. It is just another point of view on the structure, more semantic than the surface-syntactic analysis of traditional dependency grammar, X-bar syntax or SUD. It is particularly interesting for subcategorized adpositions, which have only a syntactic role. I find it much less motivated for meaningful adpositions such as since or before, and even very problematic for adpositional idioms such as on top of, in spite of.

dan-zeman commented 2 years ago

To return to @olesar's original question: I would definitely not combine the preposition with the wrong case in the relation label. This makes rule-based extraction of the subtypes from the basic trees less straightforward, but I would not see that as a drawback: if the case-enhanced labels are not just blindly copied lemmas from the nearest case child, then the value added by the enhanced representation is higher.

In fact, there are two different types of relations between the verb and the oblique modifier. One of them is obl:перад:ins, the other is obl:падчас:gen. You could decide that if there are coordinated prepositions, you will only pick the last one, which is most likely to match the morphological case of the nominal. However, I would argue that you can actually put both relations in the enhanced graph. The examples in the guidelines do not show (yet) anything like that, but perhaps they should. The validator should not object, as long as the labels of the two relations between the same pair of nodes are not identical. Nevertheless, it may not always be straightforward to guess the morphological case belonging to the first preposition. I suppose that перад will allow either instrumental or accusative, so a simple table of prepositions would not always help.
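As a rough illustration of the two-relation idea, the Python sketch below emits one enhanced label per coordinated preposition. Everything in it is an assumption made for illustration: the `Token` record, the assumed attachments, the `government` lookup table, and the fallback of dropping the case suffix when the table cannot decide. It is not the guidelines' or the validator's actual procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical minimal token record (not a real CoNLL-U library type)."""
    id: int
    form: str
    upos: str
    head: int
    deprel: str
    case: Optional[str] = None


# Relevant part of the basic tree; attachments are assumed for illustration.
SENTENCE = [
    Token(2, "разьбягаюцца", "VERB", 0, "root"),
    Token(5, "перад", "ADP", 8, "case"),
    Token(6, "і", "CCONJ", 7, "cc"),
    Token(7, "падчас", "ADP", 5, "conj"),
    Token(8, "атакі", "NOUN", 2, "obl", case="Gen"),
]
NOUN = SENTENCE[-1]


def coordinated_preps(tokens, noun):
    """Return the noun's case children plus ADP conjuncts attached to them."""
    first = [t for t in tokens if t.head == noun.id and t.deprel == "case"]
    preps = list(first)
    for p in first:
        preps += [t for t in tokens
                  if t.head == p.id and t.deprel == "conj" and t.upos == "ADP"]
    return sorted(preps, key=lambda t: t.id)


def enhanced_labels(tokens, noun, government):
    """One label per preposition; labels between the same node pair stay distinct."""
    labels = []
    for prep in coordinated_preps(tokens, noun):
        lemma = prep.form.lower()  # assumption: lemma equals lowercased form
        allowed = government.get(lemma, set())
        if noun.case and noun.case.lower() in allowed:
            case = noun.case.lower()      # the overt case matches this preposition
        elif len(allowed) == 1:
            case = next(iter(allowed))    # the table gives a single governed case
        else:
            case = None                   # ambiguous or unknown: leave it off
        label = f"{noun.deprel}:{lemma}" + (f":{case}" if case else "")
        if label not in labels:
            labels.append(label)
    return labels


# Hypothetical table in which each preposition governs exactly one case.
print(enhanced_labels(SENTENCE, NOUN, {"перад": {"ins"}, "падчас": {"gen"}}))
# ['obl:перад:ins', 'obl:падчас:gen']
```

If the table instead lists both instrumental and accusative for перад, as supposed above, the sketch falls back to the underspecified `obl:перад`, which shows why a simple table of prepositions is not always enough.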

amir-zeldes commented 2 years ago

I agree with @dan-zeman - just because the overt form of the noun can only be one of two cases doesn't mean we don't underlyingly have both government relations (which is what EUD should express IMO). But indeed, it may be challenging to do this automatically using rules, and the basic graph needs to have a consistent behavior.

@sylvainkahane - you'll get no argument from me, multiple analyses are great to have and expose different aspects of each language's structure. I'm not sure I would want to treat "since" without an argument as a preposition, but in fact it seems you also feel more like it's the other way around, at least in English: most prepositions are something like transitive adverbs (this is also historically true in Indo-European, but seems less compelling for languages like Japanese).

I think the optionality of adpositions is still much more frequent than that of the lexical arguments, and that optionality in languages like Japanese doesn't behave like 'case' in the sense of European synthetic languages, where inflection is generally not optional and seen as part of the word/token. I think the closest we have in English is things like "at home" vs. "home" or "next week" vs. "in the coming week", which all line up graph-wise in the more lexico-centric UD (the same is true for the utility of copulas as dependents for languages with zero/filled copula alternations, like in Afro-Asiatic or Slavic languages).

nschneid commented 2 years ago

I think there's no question why Stanford Dependencies, designed for English, had prepositions as heads: apart from "home" and some temporal adverbial NPs they are pretty much mandatory in English. But the ship has sailed in terms of UD prioritizing content heads to favor crosslinguistic parallelism. It's nice to have SUD as an alternative.