Infinitives vs. participles

vinbo8 commented 7 years ago

Whilst the docs do provide language-specific examples of each, it isn't very clear to me how one might draw a line between the two cross-linguistically. Both appear to be usable in periphrastic verb forms, and both are non-finite. It seems like the most "rigid" way to define an infinitive is its appearance as the citation form, which doesn't hold for Marathi: the citation form is a gerund, although (certain) traditional grammars (Masica, Navalkar) gloss one particular non-finite form (i), which combines with the modal verb ṣakṇe (equivalent to "can", as in ability), as the infinitive. Other grammars (Wali, Dhongade) disagree and gloss the gerund as the infinitive, and form (i) as a participle. Is there a decent cross-linguistically informed way to determine which is which?

Edit: in case I was being vague, I'm not asking about gerunds vs. infinitives - that bit is clear, Marathi gerunds are very obviously gerunds.

jonorthwash commented 7 years ago

@vinit-ivar It would help to have an example of what you're asking about. And are you asking for morphological labels or dependency labels?

jonorthwash commented 7 years ago

On a related note, "infinitives" in Turkic languages are almost always verbal nouns of one sort or another; the term basically just refers to whichever verbal noun is used as the preferred citation form. In some languages, however (e.g., Tatar), there are other uses: I think the "infinitive" in Tatar is used in auxiliary constructions and as a verbal adverb, making it a traditional "converb" (though someone else might want to clarify).

jonorthwash commented 7 years ago

As I see it, there are four primary types of non-finite verb forms / four primary ways that non-finite verbs can be used:

verbal noun (taking csubj, ccomp, etc. as appropriate)
verbal adjective (taking acl)
verbal adverb (taking advcl)
head of an auxiliary construction (taking whatever the main role is, with an aux dependent)

From what I'm seeing of your examples on IRC, this "infinitive in -u" is primarily used as the last of those.

(In the Turkology literature, the first two are often called "participles" and the last two are often called "converbs"—and the morphology groups together that way too in many Turkic languages, though not always so nicely as some sources may lead one to believe.)
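For reference, the four-way classification above can be written down as a small Python lookup. This is only a sketch: the key names and the function name are hypothetical labels chosen for illustration, while the relation names are the UD relations listed in the comment (the "etc." in the first item is not expanded here).

```python
# Hypothetical keys; the relation names are the UD relations listed above.
NONFINITE_USES = {
    "verbal_noun": ("csubj", "ccomp"),   # "etc. as appropriate" in the comment
    "verbal_adjective": ("acl",),
    "verbal_adverb": ("advcl",),
    # The head of an auxiliary construction bears whatever the main role is
    # and takes an `aux` dependent, so no fixed relation is listed.
    "aux_head": (),
}

def turkology_label(use: str) -> str:
    """Group the four uses the way the Turkology literature often does."""
    if use not in NONFINITE_USES:
        raise KeyError(use)
    if use in ("verbal_noun", "verbal_adjective"):
        return "participle"
    return "converb"
```

The grouping is the idealised one; as noted, the morphology does not always line up this neatly.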

vinbo8 commented 7 years ago

Ah fair enough, my bad.

mī baghū ṣakto
I see-??? can-PERF
"I can see"

is an example of the "infinitive". It'd have an aux relation with 'see' as the head - but would the morphology have VerbForm=Inf or VerbForm=Part?
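To make the question concrete, here is a rough Python sketch that holds the sentence as CoNLL-U-style rows and looks up the head of the aux relation. The UPOS tags and the nsubj/root relations are assumptions made for illustration; only the aux relation with 'see' as head comes from the description above. The FEATS column is deliberately left as "_", since its value is exactly what is being asked.

```python
# Tab-separated CoNLL-U-style rows. UPOS and nsubj/root are assumptions;
# FEATS is left empty ("_") because its value is the open question.
ROWS = """\
1\tmī\t_\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tbaghū\t_\tVERB\t_\t_\t0\troot\t_\t_
3\tṣakto\t_\tAUX\t_\t_\t2\taux\t_\t_
"""

FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

def parse(rows: str) -> list:
    """Turn tab-separated rows into one dict per token."""
    return [dict(zip(FIELDS, line.split("\t")))
            for line in rows.splitlines() if line]

tokens = parse(ROWS)
aux = next(t for t in tokens if t["deprel"] == "aux")
head = next(t for t in tokens if t["id"] == aux["head"])
print(head["form"], head["feats"])  # the token whose VerbForm is undecided
```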

jonorthwash commented 7 years ago

How is each of those terms defined, and how does each line up with the four types of non-finite verbs I listed?

sylvainkahane commented 7 years ago

Jespersen, Tesnière, and Mel'čuk distinguish four basic forms of the verb:

finite form when it is the head of a main sentence
infinitive when it commutes with nouns
participle when it commutes with adjectives
gerund when it commutes with adverbs

When the verb co-occurs with an auxiliary, it's generally not a finite form. For instance in English it is a participle with the copula (is sleeping) and an infinitive with modals (can sleep).

@vinit-ivar In your example I guess that baghū is an infinitive form, but to confirm that we must check whether it can commute with nouns or adjectives in other syntactic positions.

dan-zeman commented 7 years ago

As with POS, the names of the categories are used cross-linguistically but their definition is language-specific, and their properties overlap only partially across languages. AFAIK, in Indo-Aryan languages infinitives and verbal nouns are more or less the same category while e.g. in Slavic languages they are clearly distinct. As Sylvain says, participles are verbal adjectives; in languages like Marathi this IMHO means that they can inflect for gender (while the gender of infinitives, if needed for agreement, would be predetermined – neuter I guess?)

Our first goal should be that similar forms are called similarly in closely related languages, i.e. Marathi should be harmonized with Hindi and Urdu (and vice versa; I am not necessarily saying that Hindi is correct and Marathi must obey). In the literature I've met, the Marathi forms ending in -णे (-ṇe) and Hindi ending in -ना (-nā) are called infinitives.

vinbo8 commented 7 years ago

@jonorthwash - the third amongst those seems like VerbForm=Conv to me, which is how they are labelled in UD_Russian, AFAIK. On the other hand, Russian grammars refer to the same verb form as a gerund. The docs do say that Ger ought to be avoided, so unless I'm mistaken, the fourth class of verbs that @sylvainkahane mentioned (the gerund) would be converbs in UD. Your fourth type is a bit murkier - VerbForm=Inf is, according to the docs, what ought to have an aux dependant, though UD_English, UD_German and UD_Czech all appear to have VerbForm=Part with aux dependants.

@dan-zeman - I had initially glossed the -णे (-ṇe) form as the infinitive, though Masica and Navalkar both disagree (without any justification that I could find mentioned), and @sylvainkahane's definition of an infinitive appears to fit well here. I can't really refer to UD_Hindi here because the treebank has an empty VerbForm for the form used in similar constructions (the verb stem in Hindi).

dan-zeman commented 7 years ago

Yes, the fourth form of @sylvainkahane should be labeled VerbForm=Conv.
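Put as a lookup, the correspondence between the four-form scheme quoted earlier in the thread and UD VerbForm values would look roughly like the Python sketch below. The dictionary name and keys are hypothetical, and since the definitions are language-specific, this is a starting point and not a rule.

```python
# What the verb form commutes with -> UD VerbForm value.
# The dictionary name and keys are hypothetical, for illustration only.
COMMUTES_WITH_TO_VERBFORM = {
    "main-clause head": "Fin",
    "noun": "Inf",
    "adjective": "Part",
    "adverb": "Conv",  # the traditional "gerund"; the docs discourage Ger
}
```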

VerbForm=Inf is not required to have an aux dependant. In Czech (and AFAIK all Slavic UD treebanks) modal verbs are not considered auxiliary (unlike in English), but they do subcategorize for an infinitive, i.e. the infinitive is their child node (xcomp) but it does not take any auxiliary. There are other verbs that are not even modal but they subcategorize for an infinitive complement. On the other hand, the infinitive is also used to construct the future tense (in a subset of Slavic languages) and here it needs an auxiliary child. Moreover, participles indeed can and frequently do occur with auxiliaries.

As for Hindi: I think all verbs should have a non-empty VerbForm, so perhaps we should look into this. Many of the unlabeled verbs that I see in the data are what I know as perfective participles; but the bare stems, that are needed in some periphrastic constructions, are there as well. I would label the -ना forms either VerbForm=Inf or VerbForm=Vnoun (with a preference towards Inf). I am not sure about the bare stems. We might use a language-specific VerbForm=Stem, or use Inf too, and distinguish it by a language-specific feature InfForm=Stem. Something similar is used in Finnish where they have multiple infinitive types in the grammar.
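The practical difference between the two options for bare stems shows up when querying the FEATS column. The Python sketch below assumes the usual `Name=Value|Name=Value` FEATS syntax; `VerbForm=Stem` and `InfForm=Stem` are the language-specific proposals from this comment, not existing universal values, and the helper names are made up for illustration.

```python
def parse_feats(feats: str) -> dict:
    """Parse a FEATS string such as 'A=x|B=y' into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def is_infinitive(feats: dict) -> bool:
    """True if the token would be matched by a search for VerbForm=Inf."""
    return feats.get("VerbForm") == "Inf"

# Option A: a separate, language-specific verb form for the bare stem.
option_a = parse_feats("VerbForm=Stem")
# Option B: treat the stem as an infinitive, marked by an extra feature.
option_b = parse_feats("InfForm=Stem|VerbForm=Inf")
```

Under option B a search for VerbForm=Inf returns both the -ना forms and the bare stems, with InfForm telling them apart; under option A the two sets stay disjoint.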

This query shows that there is a certain level of chaos in the verb form labels in Hindi: http://hdl.handle.net/11346/PMLTQ-NXRW. However, the -ना forms are (often, if not always) labeled VerbForm=Inf: http://hdl.handle.net/11346/PMLTQ-JA0K.

vinbo8 commented 7 years ago

I'd also prefer labeling the -ना/णे forms VerbForm=Inf. For reference, this is all Masica has to say about it:

The Marathi Verbal Noun in -ṇe-, Obl. -ṇyā- (technically not an infinitive, in M. a separate form in -u-, despite its similarity to the Hindi form) has no such identity problems: its functions are purely nominal.

For now, I'll gloss both of them as infinitives. I can't, off the top of my head, think of anything that differentiates the Marathi -u (in this construct) from the Hindi stem, yet no grammar I've seen calls the Hindi stem an infinitive. Of course, it's entirely possible that the justification for calling -u an infinitive is from another construction where Hindi does use the infinitive:

M. mī bas-ū lāglo
   I sit-INF attach-1MSG
vs.
H. mai baiṭh-ne lagā
   I sit-INF.OBL attach-1MSG
"I began to sit"

"attach" is intransitive in both these contexts.

I'm not particularly happy with the morphological annotation in UD_Hindi - and yeah, the missing VerbForms appear to be the perfective participle and the stem. While I haven't done a massive survey, a Hindi grammar that I have (Koul) doesn't appear to refer to the stem as anything but the stem (not even an infinitive), which seems to me like less of a functional label and more of a descriptive one.

UniversalDependencies / docs

Infinitives vs. participles #494