UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Light verbs with copulas #486

Closed vinbo8 closed 5 years ago

vinbo8 commented 7 years ago

A light verb construction is typically a noun-verb or particle-verb pair that functions as a verb; these are fairly widespread in UD-Persian. UD-Persian marks the verb in an N-V pair as the head of the light verb construction. This is logical, but it causes problems in UD-Marathi, where a few light verb pairs have an auxiliary verb as the head. For instance:

malā māhit āhe
me.DAT know.PART be
"I know"

Following the Persian system, an appropriate analysis would be (truncated for brevity):

1    malā    PRON   3   nsubj
2    māhit   PART   3   compound:lvc
3    āhe     AUX    0   root

This isn't really valid, though: it would result in an auxiliary root, which, I gather, is not a good idea. As far as I can see, there are a few ways out of this situation:

  1. Reverse the compound:lvc relation: make the noun/particle the head and attach the copula to it as compound:lvc.
  2. Make the noun/particle the head and attach the copula to it as cop. This is what UD_Hindi does, though I don't know that that's a very good idea - it creates an artificial separation of light verbs into two classes, some copular and some non-copular. Note that UD_Hindi does not use compound:lvc, despite also having light verbs.
  3. Re-gloss the auxiliary verb as a content verb in these contexts (seems dodgy?).
  4. Keep things the way they are.

I'm not too pleased with any of these solutions. Any suggestions?
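
For readers following along, here is a minimal sketch of the four options as truncated dependency rows. It assumes the dative pronoun reattaches to whichever token becomes the root; the option labels, the `root_upos` and `aux_rooted` helpers, and the attachments in options 1 to 3 are illustrative, not an agreed annotation. It only checks which analyses end up with an AUX root:

```python
# Rows are (ID, FORM, UPOS, HEAD, DEPREL) for "malā māhit āhe".
# option4 is the analysis posted above; options 1-3 are one possible
# reading of the list (assumption), not an agreed UD annotation.
ANALYSES = {
    "option1": [  # reversed compound:lvc, noun/particle as head
        (1, "malā", "PRON", 2, "nsubj"),
        (2, "māhit", "PART", 0, "root"),
        (3, "āhe", "AUX", 2, "compound:lvc"),
    ],
    "option2": [  # noun/particle as head, copula attached as cop
        (1, "malā", "PRON", 2, "nsubj"),
        (2, "māhit", "PART", 0, "root"),
        (3, "āhe", "AUX", 2, "cop"),
    ],
    "option3": [  # auxiliary retagged as a content verb
        (1, "malā", "PRON", 3, "nsubj"),
        (2, "māhit", "PART", 3, "compound:lvc"),
        (3, "āhe", "VERB", 0, "root"),
    ],
    "option4": [  # unchanged
        (1, "malā", "PRON", 3, "nsubj"),
        (2, "māhit", "PART", 3, "compound:lvc"),
        (3, "āhe", "AUX", 0, "root"),
    ],
}


def root_upos(rows):
    """Return the UPOS of the single token whose HEAD is 0."""
    roots = [upos for (_id, _form, upos, head, _deprel) in rows if head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def aux_rooted(analyses):
    """Return the names of the analyses whose root token is tagged AUX."""
    return [name for name, rows in analyses.items() if root_upos(rows) == "AUX"]


flagged = aux_rooted(ANALYSES)
for name, rows in ANALYSES.items():
    print(f"{name}: root UPOS = {root_upos(rows)}")
```

Under these assumptions only the unchanged analysis keeps an auxiliary as root; options 1 and 2 instead put a PART at the root, which is a separate question.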

jnivre commented 7 years ago

What is dodgy about 3? Many languages have verbs that do double duty as auxiliaries and main verbs, like "have" in English, for example. It is treated as AUX in (1) but VERB in (2-3), where (3) is a kind of light-verb construction.

(1) She has left.
(2) She has a problem.
(3) She has a smoke.

vinbo8 commented 7 years ago

Sorry, I meant that it seems a bit dodgy in this context. I think the copula here is similar to an existential copula - this sentence would be something like "on me, information exists". Does it make sense to treat it as a main verb anyway?

jnivre commented 7 years ago

I see. Existential verbs are sometimes treated as main verbs, even if they also have copula uses. I am not sure whether it makes sense here.

dan-zeman commented 7 years ago

A side note: is it a good idea to tag māhit as PART?

vinbo8 commented 7 years ago

I don't know whether māhit as PART is a good idea, but it never occurs outside this specific construction, so I don't really have any other way to look at it. Perhaps ADV, seeing as it functions a bit adverbially?

riyazbhat commented 7 years ago

Yes, it does look similar to the Hindi one. I don't think these are copular. You have to treat these as psych-predicates to explain the dative case.

dan-zeman commented 7 years ago

@vinit-ivar: Would it be too far off to tag māhit as a special subtype of ADJ, perhaps with a deficient paradigm? I have found a dictionary that classifies it as an adjective: http://www.shabdkosh.com/mr/translate/%E0%A4%AE%E0%A4%BE%E0%A4%B9%E0%A5%80%E0%A4%A4/%E0%A4%AE%E0%A4%BE%E0%A4%B9%E0%A5%80%E0%A4%A4-meaning-in-English-Marathi

It seems more reasonable to me than PART.

vinbo8 commented 7 years ago

ADJ definitely sounds a bit off to me. It can't really be used adjectivally except with the copula, as in māhit aslel(ā), where the entire construction functions as an adjective. māhit by itself always occurs with the verb asṇe or the negative verb nāhī. What justification would there be for having it marked ADJ?

dan-zeman commented 7 years ago

Analogy with other languages where words used in similar constructions are adjectives or participles. (Or nouns, so NOUN might be another candidate.)

Assigning same labels to categories in different languages is largely about analogies anyway.

I suppose that māhit is a representative of a larger class of words, right? We don't have a dedicated UPOS for this class and even if we decide that the class is important and distinctive enough to add a UPOS tag in UD v3, it is not going to happen soon. So the closest match has to be identified, although it may not be as close as we would wish. The disadvantage of PART is twofold: first, using it is strongly discouraged in UD v2; and second, if PART is used, it is supposed to be a closed class of words that are enumerated in the documentation (and the usual candidates are function words rather than content words). It would be quite unusual to have a particle acting as a predicate.

vinbo8 commented 7 years ago

I'm not entirely sure that it is representative of a class - off the top of my head, no other word functions this way. NOUN definitely sounds better to me than ADJ; I initially had it glossed as NOUN because it seemed like a non-standard form of māhitī "information".

I changed that when I realised there was no gender agreement there:

malā te māhit āhe
me-DAT that-NT māhit COP
"I know that"

vs.

malā tī māhitī āhe
me-DAT that-F information.F COP
"I have that information"

Of course, the determiner in the first clause refers to something entirely different. Is NOUN still justifiable? Why not ADV?

dan-zeman commented 7 years ago

Hmm, I thought that light-verb constructions were widespread in Indo-Aryan languages; that's why I supposed there would be a larger class of which māhit would be a representative.

One possibility seems to be to say that māhit is a special form of the noun māhitī, reserved for predicative use. Another possibility is to say that it is an adjective (or adverb, yes) derived from that noun and roughly corresponding to English “known” (although here it would be literally “informed”).

Lack of gender agreement does not necessarily bother me. When I look at other languages, e.g. Czech, there would be an agreement in determinative context (“that knowledge” = ta informace, both ta and informace are feminine) but not in predicative context (“that is a knowledge” = to je informace, to is neuter). Furthermore, in Czech you would have to je mi známo = “that is known to me” and it is somewhat parallel to the Marathi malā te māhit āhe: mi is dative, to is neuter subject pronoun, je is 3rd person copula. However, známo is different from Marathi in that it is clearly an adjective and will agree in gender with the subject: ten muž je mi znám = “that man is known to me”. Maybe German is even more parallel because here bekannt = “known” agrees in gender in attributive contexts (ein bekannter Mann = “a known man”) but takes a genderless form in predicative contexts (der Mann ist mir bekannt = “the man is known to me”).

vinbo8 commented 7 years ago

Light-verb constructions are widespread, but most of the "nouns" in LVCs can exist outside the constructions and are very clearly nouns. Adverb sounds like a better alternative to me; that'd let me neatly partition light verbs into nominal and adverbial ones.

Fair enough about the lack of gender agreement, that probably wasn't a good enough justification.