UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Middle Persian: copula dependents disallowed #1054

bulbulistan closed this issue 2 months ago

bulbulistan commented 2 months ago

A combination of main verb as participle etc. together with an analytic form of the auxiliary is analysed as a flat structure in UD, e.g.

Form                              Word   ID  Head
participle VERB                   grift  1   0
participle AUX                    ēstād  2   1
person marker of AUX, i.e. COP    hēnd   3   1

grift ēstād hēnd "had been[3pl] taken".

From a Middle Persian perspective, it is clear that the copula is the person marker of the auxiliary. So the better annotation, which we have applied so far, is structured, cf.:

Form                              Word   ID  Head
participle VERB                   grift  1   0
participle AUX                    ēstād  2   1
person marker of AUX, i.e. COP    hēnd   3   2

For now, this results in a consistency problem with UD. We would like to know whether we can keep our annotation or not. Thank you!

amir-zeldes commented 2 months ago

Generally speaking in UD, functional dependents like auxiliaries should not have children, so copulas and auxiliaries are attached as sisters to the predicate. This is true in a wide range of UD languages where morphologically, it is clear that that is not the real constituent structure, but the same can be said about the fact that the participle VERB is the root, rather than a dependent of a finite auxiliary. So in a sense, in for a penny, in for a pound! 😅
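To make the rule above concrete, here is a minimal Python sketch. It is not the official UD validator, only an illustration under stated assumptions: the names `Token`, `FUNCTIONAL`, `functional_with_children` and `flatten` are hypothetical, the relation labels `aux` and `cop` for ēstād and hēnd are assumed from the description in this issue, and the set of functional relations is a small illustrative subset. The sketch also ignores the exceptions that the guidelines allow for dependents of function words.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    head: int      # 0 means the token is the root
    deprel: str


# Assumption: an illustrative subset of function-word relations.
FUNCTIONAL = {"aux", "cop", "case", "mark", "cc"}


def is_functional(token: Token) -> bool:
    # Strip a subtype such as "aux:pass" before comparing.
    return token.deprel.split(":")[0] in FUNCTIONAL


def functional_with_children(tokens: list[Token]) -> list[int]:
    """Return IDs of functional dependents that have at least one child."""
    by_id = {t.id: t for t in tokens}
    offenders = set()
    for t in tokens:
        parent = by_id.get(t.head)
        if parent is not None and is_functional(parent):
            offenders.add(parent.id)
    return sorted(offenders)


def flatten(tokens: list[Token]) -> list[Token]:
    """Reattach children of functional dependents to the nearest
    non-functional ancestor. Assumes a well-formed tree (no cycles)."""
    by_id = {t.id: t for t in tokens}
    result = []
    for t in tokens:
        head = t.head
        while head in by_id and is_functional(by_id[head]):
            head = by_id[head].head
        result.append(t._replace(head=head))
    return result


# The structured analysis from the issue: hēnd depends on ēstād.
structured = [
    Token(1, "grift", 0, "root"),
    Token(2, "ēstād", 1, "aux"),   # assumed relation label
    Token(3, "hēnd", 2, "cop"),    # assumed relation label
]

flat = flatten(structured)
print(functional_with_children(structured))
print([(t.form, t.head) for t in flat])
```

On the structured analysis the check reports token 2 (ēstād) as a functional dependent with a child; after flattening, both ēstād and hēnd attach to grift, which matches the flat analysis in the first table.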

For comparison, here is how UD_English analyzes perfect progressives ("have been going"). It's clear that there is no such thing as "have going", and that the structure is really "have been" + "been going". But as part of UD's commitment to lexico-centrism, which promotes cross-linguistic comparability by making lexical predicates the heads, we get:

...
11  and and CCONJ   CC  _   15  cc  15:cc   Discourse=context-background:24->23:0:ref-dem-212-214,227-228;elaboration-additional:24->23:0:0:orp-and-221
12  I   I   PRON    PRP Case=Nom|Number=Sing|Person=1|PronType=Prs  15  nsubj   15:nsubj    Entity=(3-person-giv:act-cf1*-1-ana)
13  have    have    AUX VBP Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin   15  aux 15:aux  _
14  been    be  AUX VBN Tense=Past|VerbForm=Part    15  aux 15:aux  _
15  working work    VERB    VBG Tense=Pres|VerbForm=Part    9   conj    9:conj:and  MSeg=work-ing
16  on  on  ADP IN  _   18  case    18:case _
17  this    this    DET DT  Number=Sing|PronType=Dem    18  det 18:det  Entity=(5-abstract-giv:act-cf2-2-coref
18  line    line    NOUN    NN  Number=Sing 15  obl 15:obl:on   Entity=5)
19  since   since   ADP IN  _   20  case    20:case _
20  2019    2019    NUM CD  NumForm=Digit|NumType=Card  15  obl 15:obl:since    Entity=(18-time-new-cf3-1-sgl)|SpaceAfter=No|XML=<date when:::"2019"></date>

This may be unsatisfactory if you look just at English, but it makes it much easier to have a uniform scheme and comparable argument structure across a wide range of languages. It sounds like the situation for Middle Persian is the same: it's odd if you look just at that language, but I think it makes it easier to align with other Indo-Iranian languages, or with totally unrelated ones.

bulbulistan commented 2 months ago

Thank you @amir-zeldes! There are good reasons to consider Middle Persian different, at least in some aspects, but we will stick to the general rules.