UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Head in Hindi/Urdu light verb constructions #401

Closed ftyers closed 4 years ago

ftyers commented 7 years ago

We are currently having a discussion about the conversion of the Hindi/Urdu treebanks to version 2.0. One of the questions we have is which token in a light verb construction such as (1) should be the head:

(1) "Maine   Mohan  ka  intizar  kiya."
     I-ERG   Mohan  's  wait     did
     'I waited for Mohan.'

VERB head

1   Maine     PRON    5   nsubj
2   Mohan     PROPN   5   obj
3   ka        ADP     2   case
4   intizar   NOUN    5   compound:lvc
5   kiya      VERB    0   root

NOUN head

1   Maine     PRON    4   nsubj
2   Mohan     PROPN   4   obj
3   ka        ADP     2   case
4   intizar   NOUN    0   root
5   kiya      VERB    4   aux / compound:lvc
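To make the relationship between the two analyses concrete, here is a minimal Python sketch that converts the VERB head tree into the NOUN head tree. It is not part of any UD tooling: the `Token` class and the `noun_head_lvc` function are hypothetical names introduced for illustration. It assumes one light verb construction per sentence, with the nominal attached directly to the light verb via compound:lvc, and it uses compound:lvc (rather than aux) for the demoted verb unless told otherwise.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Token:
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def noun_head_lvc(sent: List[Token], verb_rel: str = "compound:lvc") -> List[Token]:
    """Hypothetical helper: turn a verb-headed LVC into a noun-headed one.

    Assumes exactly one token with deprel compound:lvc, attached to its light verb.
    """
    by_id = {t.id: t for t in sent}
    # Locate the nominal host and the light verb it is attached to.
    noun = next(t for t in sent if t.deprel == "compound:lvc")
    verb = by_id[noun.head]
    out = []
    for t in sent:
        if t.id == noun.id:
            # The noun takes over the verb's own attachment (here: root).
            out.append(replace(t, head=verb.head, deprel=verb.deprel))
        elif t.id == verb.id:
            # The light verb now depends on the noun.
            out.append(replace(t, head=noun.id, deprel=verb_rel))
        elif t.head == verb.id:
            # All other dependents of the verb (nsubj, obj, ...) move to the noun.
            out.append(replace(t, head=noun.id))
        else:
            out.append(t)
    return out


verb_head = [
    Token(1, "Maine", "PRON", 5, "nsubj"),
    Token(2, "Mohan", "PROPN", 5, "obj"),
    Token(3, "ka", "ADP", 2, "case"),
    Token(4, "intizar", "NOUN", 5, "compound:lvc"),
    Token(5, "kiya", "VERB", 0, "root"),
]

for t in noun_head_lvc(verb_head):
    print(t.id, t.form, t.upos, t.head, t.deprel, sep="\t")
```

The design choice that matters is the third branch: every dependent of the verb other than the noun is re-attached to the noun. That is what moves nsubj and obj from token 5 to token 4, while ka stays attached to Mohan.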

The light verb construction "intizar kiya" takes Mohan as its second core argument, marked with the genitive (because of the nominal element "intizar"), while all the agreement is on the verb "kiya".

Going by what Joakim said in Osaka, it shouldn't really matter which one we pick, because the combination should be treated as a single unit. However, it would be good to have consistent guidelines with respect to other languages exhibiting this phenomenon (e.g. Persian, Kurdish, Turkic).

#386 and #255 are relevant here.

vinbo8 commented 7 years ago

If both these schemes are roughly equally valid, I'd suggest marking the noun as the head, because the verb often has very little semantic value in light verb constructions in Hindi/Urdu: you can, for instance, have the expression "jhāḍū mārnā", literally "broomstick+NOUN hit+VERB", to mean "sweep" (e.g. the floor).

I'm not sure how this is in Kurdish or Turkic, though @MemduhG claims that similar constructions with semantically iffy verbs exist in Persian.

jnivre commented 7 years ago

I think treating the verb as the head will be better for cross-linguistic parallelism.

dan-zeman commented 7 years ago

I prefer the verb as the head. But I would still make "Mohan ka" a dependent of "intizar".

ftyers commented 7 years ago

@dan-zeman Hmm, what relation would you give it? obj or nmod:poss?

riyazbhat commented 7 years ago

I guess using the nominal host as the head would be more appropriate and would not even violate parallelism. Light verbs function more like auxiliaries: they govern case marking, agreement and TAM, while the host nominal is considered to be the true predicate in such constructions.

The evidence for the host nominal as the head comes from code-mixed conversational data. In Hindi-English code-mixing, Hindi and Urdu speakers usually create new predicates by using English verbs as host nominals. The English verbs are not used directly; rather, they form complex predicates with the appropriate light verbs.

"Maine   Mohan  ka  wait  kiya."
 I-ERG   Mohan  's  wait  did

In this example wait behaves like a nominal and heads the genitive construction Mohan ka wait.

Treating the host nominal as the head would also solve the problem of the object being marked by the genitive. In the original treatment, by contrast, the object gets masked as nmod.

jnivre commented 7 years ago

So what exactly is the analysis you are proposing? It is important that we maintain consistency across languages. Right now, it looks like we will end up with three different analyses of light-verb constructions: "obj" in English (and many other languages), "compound:lvc" with the verb as head in Persian, and "compound:lvc" with the noun as head in Hindi. Are these distinctions really motivated, or are we going back to "annotating the same thing in different ways" across languages?

riyazbhat commented 7 years ago

If I am not mistaken, Persian complex predicates show similar behavior. In both languages, we can use the nominal host as the head. An English-based analysis of complex predicates, i.e. using obj, is not at all valid for Hindi and Urdu, or for most Indian languages. Objecthood is rather used as a diagnostic for distinguishing nouns that can form complex predicates from those that can't.

This analysis would help cross-lingual parsing as well. A lexicalized parser trained on English data (with cross-lingual word embeddings) would almost always choose the nominal host over the light verb as head in Hindi-Urdu.

jnivre commented 7 years ago

I am still not convinced that the noun-as-head analysis is superior. Even if we accept that N+V is a complex predicate, it is nevertheless a verbal predicate. As far as I know, there are no examples of languages that drop the light verb in the same way that some languages drop the copula in nominal clauses. Hence, the predicate is not the noun but the combination of verb and noun. And since the whole expression behaves like a verb, it is more natural from a syntactic point of view to keep the verb as the head. Remember: UD is syntactic annotation, not semantic role labeling. In addition, there is the practical consideration that keeping the verb as head saves us the work of reannotating the Persian treebank.

dan-zeman commented 7 years ago

@ftyers: Definitely not obj. Intizar is not a transitive verb. And I also do not think that the ka postposition is the way of encoding the grammatical function P in Hindi.

nmod would seem appropriate to me. Whether or not with the :poss extension depends on how this extension is defined in the Hindi documentation :-) Syntactically it is the same as possessives, although semantically it is nowhere near expressions like Mohan ka ghar.
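Dan's proposal (verb as head, "Mohan ka" attached to "intizar") can be written out as a CoNLL-U fragment. The sketch below is hypothetical Python: the helper names `is_tree` and `to_conllu` are made up for illustration and are not part of the UD validator. It keeps kiya as the root, attaches Mohan to intizar with plain nmod (the :poss subtype is left open in the discussion), and checks that the result is still a well-formed tree with exactly one root and no cycles. LEMMA, XPOS, FEATS, DEPS and MISC are filled with underscores.

```python
# Verb as head; "Mohan ka" attached to "intizar" with nmod (":poss" left open).
rows = [
    (1, "Maine", "PRON", 5, "nsubj"),
    (2, "Mohan", "PROPN", 4, "nmod"),
    (3, "ka", "ADP", 2, "case"),
    (4, "intizar", "NOUN", 5, "compound:lvc"),
    (5, "kiya", "VERB", 0, "root"),
]


def is_tree(rows):
    """Hypothetical check: one root, every token reaches 0 without a cycle."""
    heads = {i: h for i, _, _, h, _ in rows}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            # A revisited node means a cycle; an unknown id means a dangling head.
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True


def to_conllu(rows):
    """Emit the 10 CoNLL-U columns, using '_' for the ones not discussed here."""
    lines = []
    for i, form, upos, head, rel in rows:
        cols = [str(i), form, "_", upos, "_", "_", str(head), rel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines)


print(is_tree(rows))
print(to_conllu(rows))
```

Compared with the VERB head analysis at the top of the thread, only token 2 changes: its head moves from kiya (5) to intizar (4) and its relation from obj to nmod.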

danielrruf commented 7 years ago

Same feeling here (replying by email to Joakim Nivre's comment of 2017-01-18: "I think treating the verb as the head will be better for cross-linguistic parallelism.")

mojgan-seraji commented 7 years ago

As far as Persian and perhaps similar languages are concerned, I definitely prefer verb-as-head, for a number of reasons:

  1. There are many main verbs in Persian that function as light verbs with a very abstract semantic interpretation (Seraji, 2015). For instance, the main verb "eat" can function as a light verb in "to-eat ground" (to fall down), or in "to-eat eye" (to lose fortune or be put under a spell by negative energy generated by envy/the evil eye; an expression that refers to a traditional belief in Iran and perhaps many other parts of the world), and so forth.

It would seem really odd to make the verb "eat" a dependent of "ground" or "eye", even though the verb has weak semantic content of its own. The light verb "eat" in these examples has a different meaning and can be interpreted as "meet/face/get": "to-face ground" and "to-get negative energy by envy/evil eye". In other words, the verb "eat" still functions as the main part of the compound, as noted above, in an abstract interpretation.

  2. The light verb constructions "to-eat ground" and "to-eat eye" are intransitive, and as soon as the concepts turn into transitive constructions, the light verb "hit" is used instead of "eat", e.g. "to-hit ground" (to hit something/someone to the ground) and "to-hit eye" (to give somebody the evil eye) (Seraji, 2015).

  3. The light verb inflects for person and number.

  4. @jnivre wrote: "I think treating the verb as the head will be better for cross-linguistic parallelism."

I cannot agree more!

riyazbhat commented 7 years ago

I was just looking at some CV patterns in the code-mixed data that we have been annotating in UD for some time at IIITH. It seems that in cases of ellipsis, the light verb is dropped rather than the nominal host. So, if we treat the light verb as the head, the arguments should be attached as orphan; however, that's not the case if the host is treated as the head.

mujhe  nahi  pata.
I-DAT  not   know
'I don't know.'

dan-zeman commented 7 years ago

If my assumption is correct that the full version is mujhe nahi pata hai, then I think the verb hai should be treated as copula and pata should be the head anyway.

But it does not solve the more prototypical light verbs of course.

riyazbhat commented 7 years ago

I don't think so. You can't explain the dative case on the first-person pronoun if you treat it as a copular construction. pata hai would be a psych-predicate here, assigning dative case to its internal argument.

vinbo8 commented 7 years ago

hai would never be the head of the clause anyway, it would be a cop.

vinbo8 commented 7 years ago

Oh no wait, sorry, you're right - these kinds of verbs are ones I've been having issues with in Marathi (#486) too.

riyazbhat commented 7 years ago

We will treat host nominals as the head in elided constructions, and light verbs as the head in normal ones, until we reach a cross-lingual consensus.

dan-zeman commented 7 years ago

Isn't the pronoun an argument of pata and isn't the dative required by pata?

riyazbhat commented 7 years ago

Since pata is the true predicate like other host nominals in complex predicates, it controls the argument structure and assigns case. So yes, the pronoun is an argument of pata and will receive case from it.
