Closed kimgerdes closed 6 years ago
We generally promote auxiliaries when main verbs are elided, so an alternative would be to let "hat" be the root of the first clause and "angerufen" the root of the second. This would avoid the use of the orphan relation completely.
One could also say that "hat angerufen" belongs to the first conjunct and that the position of "angerufen" is the result of the German "bracketing" word order, not of any closer tie to the second conjunct. I think this would be my preferred reading of the sentence. But it would have strange consequences for our conj guidelines: the head of the first conjunct would now appear to the right of the second conjunct, and conj would no longer be right-branching. Since this seems too wild (and also impractical for automatic testing of guideline conformity), Joakim's proposal is probably better.
Joakim's proposal doesn't work for subordinates (or similar cases in any verb final language):
Here's Dan's reversed conj proposal (if I understood correctly):
The good: projective, natural phrases (?)
The ugly: reversed conj relation
So which one should we prefer? Or maybe we should first look at examples from other verb-final languages?
My proposal (yes, it would look like Kim's second tree in the previous post) was primarily motivated by the unwillingness to separate hat from angerufen; that would not be a problem in subordinates. German is not a verb-final language in general. When preparing v2, we had a lengthy discussion about allowing each language to select conj direction (but then the language would have to use it everywhere). At the end of the day, the proposal was rejected.
In the discussion, we also considered examples of first-conjunct gapping from verb-final languages, in particular Uyghur. Since the guidelines do not allow right-to-left conj, such languages have to use orphan in the first conjunct, and the verb in the second conjunct will be attached via conj to the promoted argument of the first conjunct.
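The left-to-right constraint on conj can be tested mechanically. Here is a minimal sketch in Python, assuming tokens are given as (id, form, head, deprel) records in the style of CoNLL-U. The two trees are hypothetical encodings of a subordinate variant of the example ("dass er gleich seine Frau und sie ihre Mutter angerufen hat"): one with conj running left to right from the promoted "er", one with the reversed conj. The sentence variant, the labels on the individual tokens, and the names `Token` and `right_to_left_conj` are assumptions made for illustration, not taken from the guidelines or from any treebank.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int      # 1-based position in the sentence
    form: str
    head: int    # id of the governor, 0 for the root
    deprel: str


def right_to_left_conj(tokens):
    """Return the conj dependents whose head follows them in the sentence."""
    return [t for t in tokens
            if t.deprel.split(":")[0] == "conj" and t.head > t.id]


# Hypothetical analysis 1: conj goes left to right, "er" is the promoted head.
left_to_right = [
    Token(1, "dass", 2, "mark"),
    Token(2, "er", 0, "root"),
    Token(3, "gleich", 2, "advmod"),
    Token(4, "seine", 5, "det:poss"),
    Token(5, "Frau", 2, "orphan"),
    Token(6, "und", 10, "cc"),
    Token(7, "sie", 10, "nsubj"),
    Token(8, "ihre", 9, "det:poss"),
    Token(9, "Mutter", 10, "obj"),
    Token(10, "angerufen", 2, "conj"),
    Token(11, "hat", 10, "aux"),
]

# Hypothetical analysis 2: the verb heads the first conjunct, conj is reversed.
reversed_conj = [
    Token(1, "dass", 10, "mark"),
    Token(2, "er", 10, "nsubj"),
    Token(3, "gleich", 10, "advmod"),
    Token(4, "seine", 5, "det:poss"),
    Token(5, "Frau", 10, "obj"),
    Token(6, "und", 7, "cc"),
    Token(7, "sie", 10, "conj"),
    Token(8, "ihre", 9, "det:poss"),
    Token(9, "Mutter", 7, "orphan"),
    Token(10, "angerufen", 0, "root"),
    Token(11, "hat", 10, "aux"),
]

print([t.form for t in right_to_left_conj(left_to_right)])
print([t.form for t in right_to_left_conj(reversed_conj)])
```

The check only looks at the direction of conj; it does not say anything about which element should be promoted in the gapped conjunct.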
So we follow the principles that Dan recalled:
conj has to keep going from left to right.
We conclude that, for subordinates, this is the correct UD analysis:
This means, however, that for verb-final languages, the root of the coordination does not point to the main verb of the construction.
Remaining problems:
You can modify the subordinate German sentence here and the Korean sentence here
I agree that these are the analyses that are most consistent with our current principles. I also agree that it is unsatisfactory not to have a verb as the root despite the fact that there is a verb available. Maybe we need to consider exceptions to the rule that conj relations are always attached to the left. But this would open a big can of worms, so it must be considered carefully ... :)
Ok, so let's just conclude by clarifying the remaining points:
orphan or do we keep their "natural" names?
Thanks!
1-2. The marker and the adverb should attach to the root of the clause, hence "er" in this case.
For the marker, it is definitely fine to use the ordinary label (mark). The adverb is borderline, but I would be willing to accept advmod, I think.
I would say yes, but I may not have thought through the consequences.
Definitely no. Since we decided to use obliqueness and not linear order to determine the root of the gapped clause, we have to accept any directions.
Ok. Perfect. Let me try to summarize:
conj always goes from left to right, even in the case of left-conjunct gapping. This means that in verb-final constructions, the head of the sentence is no longer the verb but the promoted head of the gapped phrase.
orphan to the root of the gapped clause.
The German example then gives:
If both clauses had the same modifier, they would become orphan in the gapped clause:
If we agree, I could integrate the additional specifications somewhere in the guide, possibly into the orphan page or the ellipsis section?
In general I agree. I am only not so sure that the distinction between 3. and 4. is defined strictly this way (although this seems to be a reasonable approximation). In fact, I don't think there is such a detailed definition at all. We have agreed that function dependents such as mark or cc should not become orphan just because they are attached to the promoted argument/adjunct. We did not say exactly what else (if anything) should be exempt from "orphanization". Personally I am slightly in favor of labeling gleich as an orphan as well.
I agree with Dan. The relevant criterion is not whether the function appears in the non-gapped clause (although there is often a correlation). Functional relations like cc and mark remain as they are. Core relations definitely are replaced by orphan. Relations like obl, advmod and advcl are borderline. I think the guidelines say that orphan should be used only for core arguments, but I think this was too restrictive so maybe we should say core arguments and modifiers.
At present, http://universaldependencies.org/u/dep/orphan.html says that core arguments are a typical example (that is, not the only possibility). http://universaldependencies.org/u/overview/specific-syntax.html#ellipsis sounds as if orphan were used only with core arguments. However, we have shown with @Kira-D in the UDW paper that the orphan relation is needed (and indeed used in some treebanks) also for oblique arguments and adjuncts, as in
She flew to Berlin yesterday and to London today.
In some languages orphan is also used in comparatives. Therefore, I think we should modify the wording of the guidelines to sound less restrictive.
I also think that only markers and conjunctions should be attached with the respective relation and everything else, including modifiers, should be attached using orphan (that's also what I've implemented in the English treebank). The orphan relation is, as Joakim said, not for indicating whether something should be shared across conjuncts but for signaling that the orphan dependent is not actually a dependent of its governor (and that it actually depends on something that was elided).
Regarding the sentence with the V2-auxiliary: I was also struggling with these at some point and didn't come up with a satisfying solution. I agree with Dan that the analysis that does not separate the aux and the main verb is the most sensible one despite the flipped conj relation. So at the risk of opening a can of worms, I'd actually be inclined to allow the conj relation from right to left in this case.
The main argument in the v2 discussion was that we wanted the order of conjuncts to be the same across languages and therefore, we decided that conj should always go from left to right. But this principle wouldn't be violated if we allowed a conj relation from the main verb to the subject in the second conjunct, so I think we could allow this exception without restarting the debate of whether verb-final languages should have the opposite conj order.
Note that if we added this exception, this wouldn't change anything for languages like Korean or Japanese, which I think is fine because the conjunct-internal structure is exactly the same as in languages in which the first conjunct contains the verb.
We have essentially two choices here:
conj link. Then we can decide to always make conj go from left to right, and orphans are always in the incomplete constituent.
In Sebastian's approach, I don't understand why we wouldn't then do the same thing for Korean.
Sylvain and I prefer the constituent-centered solution, in particular because the actual unordered dependency tree is not that different for different word orders. If you see the conj link as a horizontal paradigmatic link, we have these two similar structures:
V X Y & X' Y' (he gave me tea and her coffee)
  V -conj- X'
 / \       |
X   Y      Y'
X Y & X' Y' V (dass er mir Tee und ihr Kaffee gegeben hat)
X -conj-  V
|        / \
Y      X'   Y'
The first tree we will anchor at the V node, the second tree, we will anchor at the X node, but apart from that, it's the same structure.
The rule in this case would be that, with head-gapping, all syntactic dependents are connected by orphan to the newly promoted head. Only functional elements (that are in fact functional heads) keep the usual relation: aux, cop, case, mark (and of course also cc).
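The rule above can be sketched as a small relabeling function. This is only an illustration under stated assumptions: the input is the list of dependents the elided head would have had, each with the relation it would bear in the ungapped clause; the promotion ranking in `OBLIQUENESS` is an assumed ordering, not something fixed in this thread; and the function name `relabel_gapped_clause` is hypothetical. Note that under this rule functional elements such as aux are never promoted, which differs from the practice of promoting auxiliaries mentioned at the start of the thread.

```python
# Relations that keep their usual label under the promoted head
# (the list given in the comment above).
FUNCTIONAL = {"aux", "cop", "case", "mark", "cc"}

# Assumed promotion ranking: earlier entries are promoted first.
OBLIQUENESS = ["nsubj", "obj", "iobj", "obl", "advmod",
               "csubj", "xcomp", "ccomp", "advcl"]


def base(deprel):
    """Strip a language-specific subtype, e.g. 'obl:tmod' -> 'obl'."""
    return deprel.split(":")[0]


def rank(deprel):
    """Position in the assumed ranking; unknown relations rank last."""
    b = base(deprel)
    return OBLIQUENESS.index(b) if b in OBLIQUENESS else len(OBLIQUENESS)


def relabel_gapped_clause(dependents):
    """dependents: list of (form, deprel) pairs of the elided head.

    Returns (promoted_form, attached), where attached lists the remaining
    dependents with the relation they get under the promoted head.
    """
    candidates = [i for i, (_, rel) in enumerate(dependents)
                  if base(rel) not in FUNCTIONAL]
    if not candidates:
        raise ValueError("no non-functional dependent available for promotion")
    # min() returns the first minimal element, so ties go to the leftmost one.
    promoted = min(candidates, key=lambda i: rank(dependents[i][1]))
    attached = [
        (form, rel if base(rel) in FUNCTIONAL else "orphan")
        for i, (form, rel) in enumerate(dependents)
        if i != promoted
    ]
    return dependents[promoted][0], attached


# First conjunct of "Er hat gleich seine Frau und sie ihre Mutter angerufen",
# with the relations the words would have if "angerufen" were present.
promoted, attached = relabel_gapped_clause(
    [("Er", "nsubj"), ("hat", "aux"), ("gleich", "advmod"), ("Frau", "obj")]
)
print(promoted, attached)
```

Dependents of the orphans themselves (for example "seine" under "Frau") are not part of the input, since their attachment does not change.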
I also support the constituent-centered approach here. Sebastian's analysis looks superficially appealing in that it has many dependency relations between words that look like plausible head-dependent relations, but the price to pay is that it completely distorts the structure of the coordination, essentially embedding one conjunct inside the other. This price is too high, in my opinion.
I gave this some more thought and I actually agree with you. Like Dan, I was initially really uncomfortable with separating the auxiliary and the main verb -- in part because one cannot separate them across sentences and a discourse such as the following is not possible:
A: Er hat gleich seine Frau. (He has immediately his wife) B: Und sie ihre Mutter angerufen. (And she her mother called)
But at the same time, I think one could insert an auxiliary in the second conjunct (it sounds a bit weird but I'm pretty sure people do this):
Er hat gleich seine Frau und sie hat ihre Mutter angerufen (He has immediately his wife and she has her mother called)
I think this can be seen as an argument for promoting the auxiliary in the first conjunct, and in combination with not breaking all of our other conventions regarding the directionality of the conjunct relation for coordinated clauses, it seems smart to stick with the constituent-centered approach.
Cool. Thanks to all of you for the debate. It's much clearer now for me.
Our Dutch treebank (the underlying data for the Dutch UD treebanks) certainly has cases that @sebschu thinks are marginally acceptable (i.e. with a gapped main verb in the first conjunct):
Van dit totaal zou de VS 95 miljard en de EU 106 miljard dollar voor haar rekening nemen .
(of this total should the US 95 billion and the EU 106 billion dollar "pay")
volgens Ante Pavelić moest " 1/3 geassimileerd , 1/3 verdreven , en 1/3 vernietigd worden "
(according to Pavelić should 1/3 assimilated 1/3 expelled and 1/3 destroyed be)
I am currently not converting these cases to UD in the right way; I hope to make some progress in the near future.
I notice now that I missed the bit about repeating the aux in both conjuncts while omitting the main verb in the first conjunct. These seem rare, but I found some in the corpus of spoken Dutch:
ge moogt xxx niet tegen de wand en je moogt ook niet tegen uw eigen aankomen
you may not against the wall and you also may not with yourself contact
en dan zijn we een stukske langs 't kanaal ... en dan zijn we hier naar Leut gefietst
and then we have a bit along the canal ... and then we have here to Leut biked
It is very interesting that you bring up spoken language, because the reason we got interested in this was actually not coordination but rather elaboration and similar structures, which are very common in spoken data.
I gather that the discussion converged on treating these cases as ordinary head-gapping in the first conjunct, that is, like the first analysis here:
The rule that I tried to follow (correct me if I'm wrong): promotion of the highest argument to the new head, orphan for all syntactic arguments, preservation of all links to functional dependents (that are actually heads in a traditional dependency analysis): aux, cop, case, mark (and of course also cc).
The alternative would be the second analysis in the picture with an exception when an aux is present. Then we'd need specific rules for what to call an orphan. For example, it would probably be normal to at least preserve the subject relation with the auxiliary verb. I think that makes things more complicated and less readable.
Greetings
I wonder how to preserve the obligatory right branching of the conj relation in case of an ellipsis in the first conjunct.
I see two possible analyses; which one should be chosen?
The first analysis follows the idea of ellipsis: the verb isn't present and the first argument (er) becomes the new head (with aux and advmod relations).
The good: projective, natural phrases (?)
The ugly: verb phrase headed by a pronoun
The second analysis forces the second conjunct to be analyzed as orphaned although it is right next to the head.
The good: first conjunct complete, second has to be reconstructed
The ugly: non-projective, unnatural phrases (?)
If you have a better idea, you can modify my proposed analyses here