UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

orphan: ellipsis in first conjunct #522

Closed kimgerdes closed 6 years ago

kimgerdes commented 6 years ago

I wonder how to preserve the obligatory right branching of the conj relation in case of an ellipsis in the first conjunct.

I see two possible analyses, which one should be chosen?

  1. The first analysis follows the idea of ellipsis: the verb isn't present and the first argument (er) becomes the new head (with aux and advmod relations). The good: projective, natural phrases (?) The ugly: verb phrase headed by a pronoun. [screenshot-2018-1-11 arborator quickedit 1]

  2. The second analysis forces the second conjunct to be analyzed as orphaned although it is right next to the head. The good: first conjunct complete, second has to be reconstructed. The ugly: non-projective, unnatural phrases (?) [screenshot-2018-1-11 arborator quickedit 2]
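The projective / non-projective contrast between the two analyses can be checked mechanically. The following is only a minimal sketch: `is_projective` is a hypothetical helper, and the head arrays in the usage note are toy inputs, not the trees from the screenshots.

```python
def is_projective(heads):
    """Return True if no two arcs of the tree cross.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which is treated as an artificial node at position 0.
    """
    # Store every arc (including the one from the artificial root)
    # as a (left, right) pair of positions.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross when one starts strictly inside the other
            # and ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True
```

With toy inputs, `is_projective([2, 0, 2])` holds, while `is_projective([3, 4, 0, 3])` does not, because the arc between tokens 1 and 3 crosses the arc between tokens 2 and 4.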

If you have a better idea, you can modify my proposed analyses here

jnivre commented 6 years ago

We generally promote auxiliaries when main verbs are elided, so an alternative would be to let "hat" be the root of the first clause and "angerufen" the root of the second. This would avoid the use of the orphan relation completely.

dan-zeman commented 6 years ago

One could also say that "hat angerufen" belongs to the first conjunct and the position of "angerufen" is the result of the German "bracketing" word order, not because it has more to do with the second conjunct. I think it would be my preferred perception of the sentence. But it would have strange consequences for our conj guidelines: now the head of the first conjunct would appear to the right of the second conjunct, and conj would be no longer right-branching. Since this seems to be too wild (and also impractical for automatic testing of guidelines conformity), Joakim's proposal is probably better.
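The kind of automatic conformity test mentioned here could look roughly like the sketch below. It assumes the standard ten-column CoNLL-U layout (ID in column 1, HEAD in column 7, DEPREL in column 8); `row` and `leftward_conj` are hypothetical helpers written for this illustration, not functions of any existing UD validator.

```python
def row(tid, form, head, deprel):
    """Build one CoNLL-U token line, leaving unused columns as '_'."""
    return "\t".join([str(tid), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


def leftward_conj(conllu_sentence):
    """Return the IDs of tokens attached via conj to a head on their right."""
    offenders = []
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2), empty nodes (e.g. 1.1)
        # and tokens whose HEAD is not annotated.
        if not cols[0].isdigit() or not cols[6].isdigit():
            continue
        tid, head, deprel = int(cols[0]), int(cols[6]), cols[7]
        # Compare on the universal part of the label, so conj subtypes count too.
        if deprel.split(":")[0] == "conj" and head > tid:
            offenders.append(tid)
    return offenders
```

A sentence passes the check when `leftward_conj` returns an empty list; any ID it returns is a conjunct that precedes the head it is attached to.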

kimgerdes commented 6 years ago

Joakim's proposal doesn't work for subordinates (or similar cases in any verb final language): screenshot-2018-1-12 arborator quickedit

Here's Dan's reversed conj proposal (if I understood correctly): [screenshot-2018-1-12 arborator quickedit 1] The good: projective, natural phrases (?) The ugly: reversed conj relation

So which one should we prefer? Or maybe we should first look at examples from other verb-final languages?

dan-zeman commented 6 years ago

My proposal (yes, it would look like Kim's second tree in the previous post) was primarily motivated by the unwillingness to separate hat from angerufen; that would not be a problem in subordinates. German is not a verb-final language in general. When preparing v2, we had a lengthy discussion about allowing each language to select conj direction (but then the language would have to use it everywhere). At the end of the day, the proposal was rejected.

In the discussion, we also considered examples of first-conjunct gapping from verb-final languages, in particular Uyghur. Since the guidelines do not allow right-to-left conj, such languages have to use orphan in the first conjunct, and the verb in the second conjunct will be attached via conj to the promoted argument of the first conjunct.

kimgerdes commented 6 years ago

So we follow the principles that Dan recalled:

  1. orphan is used where the head is furthest away
  2. conj has to keep going from left to right

We conclude that for subordinates this is the correct UD analysis: [screenshot-2018-1-12 arborator quickedit] [screenshot-2018-1-12 arborator quickedit 1]

This means, however, that for verb-final languages, the root of the coordination does not point to the main verb of the construction.

Remaining problems:

  1. where to attach the marker (dass)?
  2. where to attach other modifiers (gleich, dass er gleich seine Frau und sie ihre Mutter angerufen hat)?
  3. does Joakim's proposal in case of the presence of an auxiliary in the first conjunct still hold? This would make the V2 aux cases very different from all the other cases.

You can modify the subordinate German sentence here and the Korean sentence here

jnivre commented 6 years ago

I agree that these are the analyses that are most consistent with our current principles. I also agree that it is unsatisfactory not to have a verb as the root despite the fact that there is a verb available. Maybe we need to consider exceptions to the rule that conj relations are always attached to the left. But this would open a big can of worms, so it must be considered carefully ... :)

kimgerdes commented 6 years ago

Ok, so let's just conclude by clarifying the remaining points:

  1. where to attach the marker (dass)?
  2. where to attach other modifiers (gleich in "dass er gleich seine Frau und sie ihre Mutter angerufen hat")?
  3. how about the names of these links from the "replacement governor" (here "er" or "그는") to its dependents (dass, gleich, ...)? are they all orphan or do we keep their "natural" names?
  4. does Joakim's proposal in case of the presence of an auxiliary in the first conjunct still hold? This would make the V2 aux cases very different from all the other cases. But why not. Maybe it's more consistent with other cases where the auxiliary is promoted to be the main verb.
  5. Do orphan links have to go from left to right? Are they preferably from left to right (in case there are two candidates)? If not, how to decide who's the head in simple gapping in English like "he likes coffee and she <->? tea". I found a few orphans from right to left but maybe they are just errors.

thanks!

jnivre commented 6 years ago

1-2. The marker and the adverb should attach to the root of the clause, hence "er" in this case.

  3. For the marker, it is definitely fine to use the ordinary label (mark). The adverb is borderline, but I would be willing to accept advmod, I think.

  4. I would say yes, but I may not have thought through the consequences.

  5. Definitely no. Since we decided to use obliqueness and not linear order to determine the root of the gapped clause, we have to accept any directions.

kimgerdes commented 6 years ago

Ok. Perfect. Let me try to summarize:

  1. In case of gapping, the highest element in the obliqueness order is promoted to be the head of the gapped clause.
  2. conj always goes from left to right, even in case of left conjunct gapping. This means that in verb final constructions, the head of the sentence is no longer the verb but the promoted head of the gapped phrase.
  3. Relations that also appear in the complete conjunct are now orphan to the root of the gapped clause.
  4. Dependents that are specific to the gapped clause keep the usual relation name.
  5. In case the gapped conjunct has another verbal element, for example an auxiliary, an alternative analysis is (at least temporarily) acceptable where the auxiliary is analyzed as the main verb.
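Point 1 can be sketched as a small ranking function. The order in `OBLIQUENESS` is assumed to match the promotion order given in the ellipsis section of the UD guidelines and should be checked against them; `promote` and its input format are hypothetical and only serve as an illustration.

```python
# Assumed promotion order, from least to most oblique.
OBLIQUENESS = ["nsubj", "obj", "iobj", "obl", "advmod", "csubj",
               "xcomp", "ccomp", "advcl", "dislocated", "vocative"]


def promote(dependents):
    """Pick the dependent that becomes the head of the gapped clause.

    dependents is a list of (form, deprel) pairs, where deprel is the
    relation the word would have had to the elided predicate.
    """
    def rank(item):
        base = item[1].split(":")[0]  # ignore subtypes such as obl:tmod
        # Relations outside the list rank after everything in it.
        return OBLIQUENESS.index(base) if base in OBLIQUENESS else len(OBLIQUENESS)

    # min() returns the first of several equally ranked candidates.
    return min(dependents, key=rank)[0]
```

For the gapped first conjunct discussed in this thread, `promote([("Frau", "obj"), ("er", "nsubj"), ("gleich", "advmod")])` selects "er", in line with the analysis where the pronoun heads the clause.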

The German example then gives: screenshot-2018-1-13 arborator quickedit

If both clauses had the same modifier, they would become orphan in the gapped clause: screenshot-2018-1-13 arborator quickedit 3

If we agree I could integrate the additional specifications somewhere in the guide, possibly to the orphan page or the ellipsis section?

dan-zeman commented 6 years ago

In general I agree. I am only not so sure that the distinction between 3. and 4. is defined strictly this way (although this seems to be a reasonable approximation). In fact, I don't think there is such a detailed definition at all. We have agreed that function dependents such as mark or cc should not become orphan just because they are attached to the promoted argument/adjunct. We did not say exactly what else (if anything) should be exempt from “orphanization”. Personally I am slightly in favor of labeling gleich as an orphan as well.

jnivre commented 6 years ago

I agree with Dan. The relevant criterion is not whether the function appears in the non-gapped clause (although there is often a correlation). Functional relations like cc and mark remain as they are. Core relations definitely are replaced by orphan. Relations like obl, advmod and advcl are borderline. I think the guidelines say that orphan should be used only for core arguments, but I think this was too restrictive so maybe we should say core arguments and modifiers.

dan-zeman commented 6 years ago

At present, http://universaldependencies.org/u/dep/orphan.html says that core arguments are a typical example (that is, not the only possibility). http://universaldependencies.org/u/overview/specific-syntax.html#ellipsis sounds as if orphan were used only with core arguments. However, we have shown with @Kira-D in the UDW paper that the orphan relation is needed (and indeed used in some treebanks) also for oblique arguments and adjuncts, as in

She flew to Berlin yesterday and to London today.

In some languages orphan is also used in comparatives. Therefore, I think we should modify the wording of the guidelines to sound less restrictive.

sebschu commented 6 years ago

I also think that only markers and conjunctions should be attached with the respective relation and everything else, including modifiers, should be attached using orphan (that's also what I've implemented in the English treebank). The orphan relation is, as Joakim said, not for indicating whether something should be shared across conjuncts or not, but for signaling that the orphan dependent is actually not a dependent of its governor (and that it actually depends on something that was elided).

sebschu commented 6 years ago

Regarding the sentence with the V2-auxiliary: I was also struggling with these at some point and didn't come up with a satisfying solution. I agree with Dan that the analysis that does not separate the aux and the main verb is the most sensible one despite the flipped conj relation. So at the risk of opening a can of worms, I'd actually be inclined to allow the conj relation from right to left in this case.

image

The main argument in the v2 discussion was that we wanted the order of conjuncts to be the same across languages and therefore, we decided that conj should always go from left to right. But this principle wouldn't be violated if we allowed a conj relation from the main verb to the subject in the second conjunct, so I think we could allow this exception without restarting the debate of whether verb-final languages should have the opposite conj order.

Note that if we added this exception, this wouldn't change anything for languages like Korean or Japanese, which I think is fine because the conjunct-internal structure is exactly the same as in languages in which the first conjunct contains the verb.

kimgerdes commented 6 years ago

We have essentially two choices here:

  1. constituent-centered approach: the constituent should be a subtree, and the two subtrees of the coordination are connected by the conj link. Then we can decide to always make conj go from left to right, and orphans are always in the incomplete constituent.
  2. dependency-centered approach: the dependency tree should be the same, independently of word-order constraints. This is Sebastian's approach.

In Sebastian's approach, I don't understand why we wouldn't then do the same thing for Korean.

Sylvain and I prefer the constituent-centered solution, in particular because the actual unordered dependency tree is not that different for different word orders. If you see the conj link as a horizontal paradigmatic link, we have these two similar structures:

V X Y & X' Y' (he gave me tea and her coffee)

  V -conj- X'
 / \       |
X   Y      Y'

X Y & X' Y' V (dass er mir Tee und ihr Kaffee gegeben hat)

X -conj- V
|       / \
Y      X'  Y'

The first tree we will anchor at the V node, the second tree, we will anchor at the X node, but apart from that, it's the same structure.

The rule in this case would be that in case of head-gapping, all syntactic dependents would be connected by orphan to the newly promoted head. Only functional elements (that are in fact functional heads) will keep the usual relation: aux, cop, case, mark (and of course also cc).
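As a rough sketch of this labeling rule (the set `FUNCTIONAL` is taken from the list in the paragraph above; the function name is hypothetical, and keeping subtypes on functional relations is an assumption of this sketch):

```python
# Relations that keep their usual name under the promoted head.
FUNCTIONAL = {"aux", "cop", "case", "mark", "cc"}


def label_under_promoted_head(deprel):
    """Return the label a dependent gets when attached to the promoted head.

    deprel is the relation the dependent would have had to the elided head.
    """
    base = deprel.split(":")[0]  # e.g. aux:pass is still treated as aux
    return deprel if base in FUNCTIONAL else "orphan"
```

Under this sketch `mark` and `cc` are left untouched, while `obj`, `obl` and `advmod` all come out as `orphan`.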

jnivre commented 6 years ago

I also support the constituent-centered approach here. Sebastian's analysis looks superficially appealing in that it has many dependency relations between words that look like plausible head-dependent relations, but the price to pay is that it completely distorts the structure of the coordination, essentially embedding one conjunct inside the other. This price is too high, in my opinion.

sebschu commented 6 years ago

I gave this some more thought and I actually agree with you. Like Dan, I was initially really uncomfortable with separating the auxiliary and the main verb -- in part because one cannot separate them across sentences and a discourse such as the following is not possible:

A: Er hat gleich seine Frau. (He has immediately his wife)

B: Und sie ihre Mutter angerufen. (And she her mother called)

But at the same time, I think one could insert an auxiliary in the second conjunct (it sounds a bit weird but I'm pretty sure people do this):

Er hat gleich seine Frau und sie hat ihre Mutter angerufen (He has immediately his wife and she has her mother called)

I think this can be seen as an argument for promoting the auxiliary in the first conjunct, and in combination with not breaking all of our other conventions regarding the directionality of the conjunct relation for coordinated clauses, it seems smart to stick with the constituent-centered approach.

kimgerdes commented 6 years ago

Cool. Thanks to all of you for the debate. It's much clearer now for me.

gossebouma commented 6 years ago

Our Dutch treebank (the underlying data for the Dutch UD treebanks) certainly has cases @sebschu thinks are marginally acceptable (i.e. with a gapped main verb in the 1st conjunct):

Van dit totaal zou de VS 95 miljard en de EU 106 miljard dollar voor haar rekening nemen .

(of this total should the US 95 billion and the EU 106 billion dollar "pay")

volgens Ante Pavelić moest " 1/3 geassimileerd , 1/3 verdreven , en 1/3 vernietigd worden "

(according to Pavelić should 1/3 assimilated 1/3 expelled and 1/3 destroyed be)

I am currently not converting these cases to UD in the right way, hope to make some progress in the near future

gossebouma commented 6 years ago

I notice now that I missed the bit about repeating the aux in both conjuncts while omitting the main verb in the first conjunct. These seem rare, but I found some in the corpus of spoken Dutch:

ge moogt xxx niet tegen de wand en je moogt ook niet tegen uw eigen aankomen

you may not against the wall and you also may not with yourself contact

en dan zijn we een stukske langs 't kanaal ... en dan zijn we hier naar Leut gefietst

and then we have a bit along the canal ... and then we have here to Leut biked

kimgerdes commented 6 years ago

It is very interesting that you bring up spoken language, because the reason we got interested in this was actually not coordination but rather elaboration and similar structures, which are very common in spoken data.

I guess the outcome of this discussion is to treat these cases as usual head-gapping in the first conjunct. This means the first analysis here: screenshot-2018-1-19 arborator quickedit 1

The rule that I tried to follow (correct me if I'm wrong): promotion of highest argument to the new head, orphan for all syntactic arguments, preservation of all links to functional dependents (that are actually heads in a traditional dependency analysis): aux, cop, case, mark (and of course also cc). The alternative would be the second analysis in the picture with an exception when an aux is present. Then we'd need specific rules for what to call an orphan. For example, it would probably be normal to at least preserve the subject relation with the auxiliary verb. I think that makes things more complicated and less readable.

groetjes