UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

xcomp for ergative languages #736

Open AngelAquino opened 4 years ago

AngelAquino commented 4 years ago

From the xcomp guidelines:

An open clausal complement (xcomp) of a verb or an adjective is a predicative or clausal complement without its own subject. The reference of the subject is necessarily determined by an argument external to the xcomp (normally by the object of the next higher clause, if there is one, or else by the subject of the next higher clause). This is often referred to as obligatory control. These clauses tend to be non-finite in many languages, but they can be finite as well. The name xcomp is borrowed from Lexical-Functional Grammar.

The use of the term "subject" here could be an issue for ergative languages.

Consider the following Tagalog construction:

# text = Makatutulong sa mga mag-aaral ito upang mabatid ang kahalagahan ng ASEAN.
# text_en = This will help students to realize the importance of ASEAN.
1   Makatutulong    tulong  VERB    _   _   0   root    _   _
2   sa  sa  ADP _   _   4   case    _   _
3   mga mga DET _   _   4   det _   _
4   mag-aaral   mag-aaral   NOUN    _   _   1   obl _   _
5   ito ito PRON    _   _   1   nsubj   _   _
6   upang   upang   SCONJ   _   _   7   mark    _   _
7   mabatid batid   VERB    _   _   1   xcomp   _   _
8   ang ang ADP _   _   9   case    _   _
9   kahalagahan kahalagahan NOUN    _   _   7   nsubj   _   _
10  ng  ng  ADP _   _   11  case    _   _
11  ASEAN   ASEAN   PROPN   _   _   9   nmod    _   SpaceAfter=No
12  .   .   PUNCT   _   _   1   punct   _   _

Here, the xcomp mabatid (realize) has a grammatical subject: the ang-marked noun phrase kahalagahan ng ASEAN (importance of ASEAN). However, the verb mabatid has an obligatory actor argument, controlled by the noun phrase mga mag-aaral (students) from the higher clause, which is not the grammatical subject.

A more extensive discussion of this appears in Manning (1994), where the controller/controllee of such clauses is generally observed to be sensitive to the actor NP, whether or not that NP maps to the grammatical subject in ergative languages. Below are examples for Tagalog and Central Arctic Inuit from the text: [two images of the examples]

I would like to ask how this can be taken into account in the guidelines going forward. (Thankfully, the current validation script does not appear to flag the existence of an nsubj dependent on an xcomp as an error.)
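To make the pattern concrete, here is a minimal sketch of a check that lists every xcomp that has its own nsubj dependent. It is independent of the official validation script and does not reproduce its logic; the function names `parse_conllu` and `xcomp_with_subject` are made up for this illustration, and the sentence is the Tagalog example above.

```python
# Hypothetical helper names; this is NOT the UD validation script.
ROWS = [
    "1 Makatutulong tulong VERB _ _ 0 root _ _",
    "2 sa sa ADP _ _ 4 case _ _",
    "3 mga mga DET _ _ 4 det _ _",
    "4 mag-aaral mag-aaral NOUN _ _ 1 obl _ _",
    "5 ito ito PRON _ _ 1 nsubj _ _",
    "6 upang upang SCONJ _ _ 7 mark _ _",
    "7 mabatid batid VERB _ _ 1 xcomp _ _",
    "8 ang ang ADP _ _ 9 case _ _",
    "9 kahalagahan kahalagahan NOUN _ _ 7 nsubj _ _",
    "10 ng ng ADP _ _ 11 case _ _",
    "11 ASEAN ASEAN PROPN _ _ 9 nmod _ SpaceAfter=No",
    "12 . . PUNCT _ _ 1 punct _ _",
]
# CoNLL-U columns are tab-separated, so rebuild the sentence with tabs.
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)


def parse_conllu(text):
    """Return the basic tokens of one sentence as a list of dicts."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def xcomp_with_subject(tokens):
    """Return (xcomp form, subject form) pairs where an xcomp has an nsubj."""
    by_id = {t["id"]: t for t in tokens}
    # Compare only the universal part of the relation (before any subtype).
    xcomp_ids = {t["id"] for t in tokens if t["deprel"].split(":")[0] == "xcomp"}
    return [
        (by_id[t["head"]]["form"], t["form"])
        for t in tokens
        if t["head"] in xcomp_ids and t["deprel"].split(":")[0] == "nsubj"
    ]


print(xcomp_with_subject(parse_conllu(SAMPLE)))
```

For the sentence above, the only pair found is mabatid with kahalagahan, which is exactly the configuration the current wording of the xcomp guidelines does not anticipate.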

dan-zeman commented 4 years ago

I suspect that the guidelines for xcomp have been written with mostly Indo-European languages in mind. It seems useful to be able to cover this kind of control by xcomp too, so maybe we could extend the guidelines? (UD is normally very conservative about changing the guidelines; but specification of new phenomena that are discovered when new languages are added is not forbidden and has happened before.)

The new type of control would also have to be mentioned in the guidelines for enhanced dependencies. And it would be very useful to add enhanced dependencies in Tagalog that would explicitly show the coreference between mag-aaral and the missing object of mabatid!
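As a rough sketch of what such an enhanced annotation could look like, the snippet below fills the DEPS column (column 9) of the Tagalog sentence with each token's basic edge and adds one extra edge from mabatid (token 7) to mag-aaral (token 4). The function name `add_enhanced_edge` is made up for this illustration. The label `obj` only follows the wording "missing object" in the comment above; the thread does not settle which relation should be used, so the label is an assumption and is passed in as a parameter. Other enhancements, such as case-augmented relation labels, are left out.

```python
# Hypothetical helper; the relation label for the new edge is an assumption.
ROWS = [
    "1 Makatutulong tulong VERB _ _ 0 root _ _",
    "2 sa sa ADP _ _ 4 case _ _",
    "3 mga mga DET _ _ 4 det _ _",
    "4 mag-aaral mag-aaral NOUN _ _ 1 obl _ _",
    "5 ito ito PRON _ _ 1 nsubj _ _",
    "6 upang upang SCONJ _ _ 7 mark _ _",
    "7 mabatid batid VERB _ _ 1 xcomp _ _",
    "8 ang ang ADP _ _ 9 case _ _",
    "9 kahalagahan kahalagahan NOUN _ _ 7 nsubj _ _",
    "10 ng ng ADP _ _ 11 case _ _",
    "11 ASEAN ASEAN PROPN _ _ 9 nmod _ SpaceAfter=No",
    "12 . . PUNCT _ _ 1 punct _ _",
]


def add_enhanced_edge(rows, dependent_id, head_id, relation):
    """Copy each basic edge into DEPS and add one extra edge to dependent_id."""
    out = []
    for row in rows:
        cols = row.split()
        edges = [(int(cols[6]), cols[7])]  # start from the basic HEAD:DEPREL
        if int(cols[0]) == dependent_id:
            edges.append((head_id, relation))  # the extra control edge
        # DEPS is a list of head:deprel pairs, ordered by head, joined by "|".
        cols[8] = "|".join(f"{h}:{r}" for h, r in sorted(edges))
        out.append(cols)
    return out


enhanced = add_enhanced_edge(ROWS, dependent_id=4, head_id=7, relation="obj")
for cols in enhanced:
    print("\t".join(cols))
```

With these inputs, the DEPS value for mag-aaral becomes `1:obl|7:obj`, while every other token simply repeats its basic edge.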