Closed by vinbo8 5 years ago
What is dodgy about 3? Many languages have verbs that do double duty as auxiliaries and main verbs, like "have" in English, for example. It is treated as AUX in (1) but VERB in (2-3), where (3) is a kind of light-verb construction.
(1) She has left. (2) She has a problem. (3) She has a smoke.
Sorry, I meant that it seems a bit dodgy in this context. I think the copula here is similar to an existential copula - this sentence would be something like "on me, information exists". Does it make sense to treat it as a main verb anyway?
I see. Existential verbs are sometimes treated as main verbs, even if they also have copula uses. I am not sure whether it makes sense here.
A side note: is it a good idea to tag māhit as PART?
I don't know whether māhit as PART is a good idea, but it never occurs outside this specific construct, so I don't really have any other way to look at it. Perhaps ADV, seeing as it functions a bit adverbially?
Yes, it does look similar to the Hindi one. I don't think these are copular. You have to treat these as psych-predicates to explain the dative case.
@vinit-ivar: Would it be too off to tag māhit as a special subtype of ADJ, perhaps with a deficient paradigm? I have found a dictionary that classifies it as an adjective: http://www.shabdkosh.com/mr/translate/%E0%A4%AE%E0%A4%BE%E0%A4%B9%E0%A5%80%E0%A4%A4/%E0%A4%AE%E0%A4%BE%E0%A4%B9%E0%A5%80%E0%A4%A4-meaning-in-English-Marathi
It seems more reasonable to me than PART.
ADJ definitely sounds a bit off to me. It can't really be used adjectivally except with the copula, like māhit aslel(ā), where the entire construction functions as an adjective. māhit by itself always occurs with the verb asṇe or the negative verb nāhī. What justification would there be for having it marked ADJ?
Analogy with other languages where words used in similar constructions are adjectives or participles. (Or nouns, so NOUN might be another candidate.)
Assigning same labels to categories in different languages is largely about analogies anyway.
I suppose that māhit is a representative of a larger class of words, right? We don't have a dedicated UPOS for this class and even if we decide that the class is important and distinctive enough to add a UPOS tag in UD v3, it is not going to happen soon. So the closest match has to be identified, although it may not be as close as we would wish. The disadvantage of PART is twofold: first, using it is strongly discouraged in UD v2; and second, if PART is used, it is supposed to be a closed class of words that are enumerated in the documentation (and the usual candidates are function words rather than content words). It would be quite unusual to have a particle acting as a predicate.
I'm not entirely sure that it is representative of a class: off the top of my head, no other word functions this way. NOUN definitely sounds better to me than ADJ; I had it initially glossed as NOUN, because it seemed like a non-standard form of māhitī "information".
I changed that when I realised there was no gender agreement there:
malā te māhit āhe
me-DAT that-NT māhit COP
"I know that"
vs.
malā tī māhitī āhe
me-DAT that-F information.F COP
"I have that information".
Of course, the determiner in the first clause refers to something entirely different. Is NOUN still justifiable? Why not ADV?
Hmm, I thought that light-verb constructions are widespread in Indo-Aryan languages; that's why I supposed there would be a larger class of which māhit would be a representative.
One possibility seems to be to say that māhit is a special form of the noun māhitī, reserved for predicative use. Another possibility is to say that it is an adjective (or adverb, yes) derived from that noun and roughly corresponding to English “known” (although here it would be literally “informed”).
Lack of gender agreement does not necessarily bother me. When I look at other languages, e.g. Czech, there would be an agreement in determinative context (“that knowledge” = ta informace, both ta and informace are feminine) but not in predicative context (“that is a knowledge” = to je informace, to is neuter). Furthermore, in Czech you would have to je mi známo = “that is known to me” and it is somewhat parallel to the Marathi malā te māhit āhe: mi is dative, to is neuter subject pronoun, je is 3rd person copula. However, známo is different from Marathi in that it is clearly an adjective and will agree in gender with the subject: ten muž je mi znám = “that man is known to me”. Maybe German is even more parallel because here bekannt = “known” agrees in gender in attributive contexts (ein bekannter Mann = “a known man”) but takes a genderless form in predicative contexts (der Mann ist mir bekannt = “the man is known to me”).
Light-verb constructions are widespread, but most of the "nouns" in LVCs can exist outside the constructions and are very clearly nouns. Adverb sounds like a better alternative to me, that'd let me neatly partition light verbs into nominal and adverbial ones.
Fair enough about the lack of gender agreement, that probably wasn't a good enough justification.
A light verb construction is typically a noun-verb or a particle-verb pair that functions as a verb; these are fairly widespread in UD-Persian. UD-Persian marks the verb in an N-V pair as the head of the light verb construction. This is logical, but it causes problems in UD-Marathi, where a few light verb pairs would have an auxiliary verb as the head. For instance:
malā māhit āhe
me.DAT know.PART be
"I know". Following the Persian system, an appropriate analysis would be (truncated for brevity):

This isn't really valid, though: it would result in an auxiliary root, which, I gather, is not a good idea. As far as I can see, there are a few ways out of this situation:
- A compound:lvc relation with the noun/particle as the head and a compound:lvc to the copula.
- A cop to the copula. This is what UD_Hindi does, though I don't know that that's a very good idea: it creates an artificial separation of light verbs into two classes, some copular and some non-copular. Note that UD_Hindi does not use compound:lvc, despite also having light verbs.

I'm not too pleased with any of these solutions. Any suggestions?
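To make the alternatives easier to compare, here is a rough Python sketch that encodes the three analyses discussed above for malā māhit āhe and reports the UPOS of the root in each. Everything in it is an assumption for illustration only: the Token tuple and helper names are made up, māhit is tagged PART simply because the gloss above uses know.PART, and malā is attached with the generic dep relation because the thread does not settle how the dative argument should be labelled.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int      # 0 marks the root, as in CoNLL-U
    deprel: str


# Illustrative tags only; the UPOS of "māhit" is exactly what is under debate.
UPOS = {"malā": "PRON", "māhit": "PART", "āhe": "AUX"}
FORMS = ["malā", "māhit", "āhe"]


def sentence(root_form, pair_deprel):
    """Build the clause with root_form as root.

    The other member of the light-verb pair attaches to the root with
    pair_deprel; "malā" attaches with the placeholder relation "dep".
    """
    root_id = FORMS.index(root_form) + 1
    tokens = []
    for i, form in enumerate(FORMS, start=1):
        if i == root_id:
            tokens.append(Token(i, form, UPOS[form], 0, "root"))
        elif form == "malā":
            tokens.append(Token(i, form, UPOS[form], root_id, "dep"))
        else:
            tokens.append(Token(i, form, UPOS[form], root_id, pair_deprel))
    return tokens


def root_upos(tokens):
    """Return the UPOS of the single root token."""
    roots = [t for t in tokens if t.head == 0]
    assert len(roots) == 1, "expected exactly one root"
    return roots[0].upos


def to_conllu(tokens):
    """Render tokens as 10-column CoNLL-U lines, unused columns as '_'."""
    return "\n".join(
        f"{t.id}\t{t.form}\t_\t{t.upos}\t_\t_\t{t.head}\t{t.deprel}\t_\t_"
        for t in tokens
    )


analyses = {
    "persian_style": sentence("āhe", "compound:lvc"),   # verb as head
    "reversed_lvc": sentence("māhit", "compound:lvc"),  # noun/particle as head
    "copular": sentence("māhit", "cop"),                # UD_Hindi-like
}

for name, toks in analyses.items():
    print(name, "-> root UPOS:", root_upos(toks))
```

Only the Persian-style analysis ends up with an AUX root here; the other two move the root to māhit and differ only in the label on āhe.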