Open MagaliDuran opened 5 months ago
Thanks for bringing this up. I think it's an important discussion to have.
It is also relevant to ask whether in non-pro-drop languages like English, arguments should be propagated to advcl adjuncts. I am using "__" to stand for the inferable subject ("he was" could be inserted there):
There are participial adjuncts that are not strictly required to correspond to the matrix subject though (sometimes these are prescriptively frowned upon as "dangling participles", but with enough context they are understandable):
Note that the matrix subject may be implicit, as in imperatives, in which case there would be nothing to propagate:
I definitely agree that the enhanced UD guidelines are worth revisiting, as they currently feel like an arbitrary selection of items, while other similar phenomena are left behind. Some time ago I actually wrote a long proposal for enhanced enhanced dependencies but we have not had time to put it on the agenda of the core group.
I think there are separate questions here. First, if we do argument propagation in control verb structures (xcomp) and relative clauses, should we do it also for other constructions, such as advcl and participles? I think we should, and this would not be specific to pro-drop languages. I am less sure about ccomp because there it is about coreference which may be clear from the semantic context but does not follow from syntax.
The other question is what to do if the shared argument does not have its node because it is only expressed by verbal morphology. Here I think we can expand the usage of empty/abstract nodes. Yes, UD has the slogan "do not annotate what is not there", especially referring to dropped pronominal subjects, but that slogan holds for the basic representation, not for the enhanced graph (otherwise we could not use abstract nodes at all, while we are currently using them for gapped predicates).
Maybe it is more generally co-reference that can be engineered in some way into enhanced guidelines? With a possibility for "external coreference" for ccomp (maybe simply left unspecified), as opposed to "necessary coreference" pointing to another element (e.g. the finite predicate) in the sentence.
I would be quite opposed to extra empty nodes, even in these cases. I think they might not even be necessary, and could sometimes be confusing. I envision two scenarios:
The propagation of ccomp and advcl subjects is not part of the officially approved guidelines. However, in pro-drop languages such as Portuguese (which allow the ellipsis of the subject, since the person is marked in the verb form), ccomp and advcl dependents may have an elliptical subject that can be recovered from their heads. The annotation of these subjects in the enhanced dependencies is of great importance, especially in order not to interrupt chains of subject propagation, as in the example:
Portuguese (1 explicit subject, 5 enhanced subjects): Ele disse que aposentará em 2025 e que pretende viajar muito enquanto estiver saudável e tiver dinheiro. => “Ele” is the explicit nsubj of “disse” and the enhanced subject of “aposentará”, “pretende”, “viajar” (xsubj), “saudável” and “tiver”.
English (4 explicit subjects, 2 enhanced subjects): He said that he will retire in 2025 and that he intends to travel a lot as long as he is healthy and has money. => “He” is the explicit nsubj of “said”, “retire”, “intends”, “healthy” and the enhanced subject of “travel” (xsubj) and “has”.
However, if we don't propagate the subject of ccomp, the Portuguese sentence won't have a subject available for the approved enhanced relations (conj dependents and xcomp subject). Furthermore, if we don't propagate the subject of advcl, Portuguese won't have 6 subjects as in the equivalent English sentence. This puts pro-drop languages at a disadvantage relative to those that don't admit subject ellipsis.
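To make the chain concrete, here is a minimal Python sketch of how the subject could be handed down along ccomp, conj, xcomp and advcl. It is only an illustration under my own assumptions, not the rule set mentioned in the note that follows: the function name `propagate_subjects`, the tuple format, the single condition "the dependent has no nsubj of its own", and the attachment of the advcl to “viajar” are all hypothetical, function words are omitted, and xcomp is treated as subject control only.

```python
from collections import defaultdict, deque

# Relations along which this sketch lets a dependent inherit its head's subject.
# (Assumption for illustration; only conj and xcomp are in the approved guidelines.)
PROPAGATING = {"ccomp", "advcl", "xcomp", "conj"}


def propagate_subjects(tokens):
    """tokens: list of (id, form, head, deprel); head 0 marks the root.

    Returns {predicate form: subject form} for the enhanced subjects only.
    """
    children = defaultdict(list)
    subject = {}  # predicate id -> subject id (explicit first, enhanced added later)
    for tid, form, head, deprel in tokens:
        children[head].append((tid, deprel))
        if deprel == "nsubj":
            subject[head] = tid
    form_of = {tid: form for tid, form, _, _ in tokens}
    head_of = {tid: head for tid, _, head, _ in tokens}

    enhanced = {}
    queue = deque(children[0])
    while queue:  # breadth-first, so a head is always resolved before its dependents
        tid, deprel = queue.popleft()
        head = head_of[tid]
        # Simplified condition: inherit only when the dependent has no subject of its own.
        if deprel in PROPAGATING and tid not in subject and head in subject:
            subject[tid] = subject[head]
            enhanced[form_of[tid]] = form_of[subject[tid]]
        queue.extend(children[tid])
    return enhanced


# Content words of the Portuguese example, with assumed basic attachments.
PT = [
    (1, "Ele", 2, "nsubj"),
    (2, "disse", 0, "root"),
    (4, "aposentará", 2, "ccomp"),
    (9, "pretende", 4, "conj"),
    (10, "viajar", 9, "xcomp"),
    (14, "saudável", 10, "advcl"),
    (16, "tiver", 14, "conj"),
]

enhanced_pt = propagate_subjects(PT)
for predicate, subj in enhanced_pt.items():
    print(f"{subj} -> enhanced subject of {predicate}")
```

With these inputs the sketch yields the five enhanced subjects listed for the Portuguese sentence. If "ccomp" is removed from `PROPAGATING`, “aposentará” gets no subject, so nothing reaches “pretende”, “viajar”, “saudável” or “tiver” either, which is the interrupted chain described above.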
Note: we have already outlined the rules for automatically propagating the subject of ccomp and advcl. Propagation will only occur if:
Could you please consider approving these enhanced relations for pro-drop languages?