UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Transitive vs intransitive verb features? #1042

Open AngledLuffa opened 1 month ago

AngledLuffa commented 1 month ago

Are there morphological features for verbs that have an intransitive vs transitive morphology?

For example, this comes up a lot in Hungarian:

https://en.wikipedia.org/wiki/Hungarian_verbs

It may also be a useful feature in the Sindhi dataset we've been working on with Isra (@muteeurahman), which is why I bring this up.

Stormur commented 1 month ago

I think this is mostly seen as derivational, and thus not considered.

I remember somewhere there is/was a feature for Transitivity, but it makes little sense, as this is a syntactic notion.

I can see something like this used only if there really is a very regular and predictable morphological marking of transitivity. But then, transitivity is still a syntactic issue, and you detect it from the presence or absence of objects or object markers, so I do not know how much sense it would make in the end. Do you have some examples?

dan-zeman commented 1 week ago

There is Subcat, a language-specific feature, used in a bunch of languages. I don't know how it works in these languages. If it simply marks verbs that take zero, one or two objects, then I agree that it is not very useful. But I have heard that verbs in some languages can be both transitive and intransitive, and change the morphology accordingly. (The language I heard it about was Chukchi, but it does not seem to use the feature in UD.)

rueter commented 1 week ago

Hi, @AngledLuffa and @Stormur ! In the Hungarian UD, it appears that the Feature «Definite» is used for marking not only the definite article but also verbs.

In verbs, the feature has three values:
2 (the marking on the verb «lak/lek» indicates a second person object, which may be present or absent in the sentence as a separate pronoun.)
Def (the marking on the verb indicates a definite object, which may be present or absent in the sentence as a separate noun.)
Ind (the marking on the verb is equivalent to that found on an intransitive verb or on a verb whose object is NOT definite.)

The Erzya, Moksha and Apurinã treebanks also encode object marking in the features. All three languages have object marking for three persons and two numbers. Hence the features include:

Number[obj]=Sing
Number[obj]=Plur
Number[obj]=Plur,Sing
Person[obj]=1
Person[obj]=2
Person[obj]=3

The presence of an independent noun or pronoun is not required, and therefore it is the verbal morphology that provides information indicating transitivity to some extent. In Erzya and Moksha, the object marking tends to be limited to definite objects, present or absent, whereas the lack of object marking in verbs indicates that the verb may be either intransitive or transitive with an indefinite object or objects. In this way, it is the «obj» dependency, and not a transitivity feature, that indicates transitivity.
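
To make the notation concrete, here is a minimal Python sketch of reading such a FEATS string and checking whether the verb form carries object marking. The helper names are made up for illustration and are not part of any UD tool, and the two feature strings are constructed from the values listed above rather than copied from a treebank. The only format assumption is the usual CoNLL-U one: features separated by `|`, multiple values separated by `,`.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into {feature name: set of values}."""
    if feats in ("", "_"):
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        parsed[name] = set(values.split(","))
    return parsed


def has_object_marking(feats):
    """True if the form carries any layered [obj] feature."""
    return any(name.endswith("[obj]") for name in parse_feats(feats))


# Illustrative strings only (hypothetical, built from the values above).
with_object = "Number[obj]=Plur,Sing|Number[subj]=Sing|Person[obj]=3|Person[subj]=1"
without_object = "Number[subj]=Sing|Person[subj]=1"

print(has_object_marking(with_object))     # True
print(has_object_marking(without_object))  # False
```

Note that a False result here does not mean the verb is intransitive: as described above, an unmarked verb may still have an indefinite object in the tree.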

Abkhaz seems to use a Trans feature with values Yes and No. Perhaps @paulmeurer can tell us about it.

Stormur commented 1 week ago

Hi, @rueter !

Thanks for the overview! I am really familiar only with the Hungarian "definite" conjugation, and I have some on-the-fly considerations about it related to the present issue.

This annotation seems rather problematic to me: yet another case where traditional denomination interferes with UD labels. There is no way a value Def can allow typological comparison, as what happens here is a 3rd person object marking on the verb, so one would need Person[obj]=3. I think that the "definite" denomination just comes from the fact that, within this system, this marking makes it obvious which person (and maybe number) a non-expressed object is, and it also takes place only with objects that are in some way definite, but this is a property of the object, not of the verb. Also, why is the -lek/-lak not tagged with the 1st/2nd persons and numbers involved for nsubj/obj? Further, since there is actually no marking for non-definite objects, the Ind value looks like one of those "negative values" which should not be annotated. It is in fact extremely redundant, being the default case of all other forms.

In sum, the current annotation is totally idiosyncratic to Hungarian.

As you notice, in the end, it all boils down to observing either explicit object arguments, or markers thereof on the predicate. Transitivity or not of a clause then follows, but no "transitivity" marker is involved. Sticking with Hungarian, a place where one could actually look for "transitivity" markers is the affixes in pairs like tanítani 'to teach' / tanulni 'to study'.

paulmeurer commented 1 week ago

In the Abkhaz treebank, the feature Trans (Yes/No) is used to mark the transitivity of the underlying base verb. That is, an active verb not in the potential form is marked Trans=Yes if it has a direct object marker (which is not always overt) and ergative morphology, and the dependency relation to the object is accordingly «obj». Trans=Yes is also kept in the potential form, where the direct object is morphologically marked as (intransitive) subject (with relation «nsubj»), and the subject is demoted to a type of indirect object, with a special marker in the verb (relation «obj:po»). E.g.,

bəzdərʒom Dyn=Yes|Gender[po]=2|Number[po]=Sing|Person[po]=2|Person[subj]=3|Polarity=Neg|Reln=Pot|Tense=Pres|Trans=Yes|VerbForm=Fin
“you.fem don’t know it” (lit. “you cannot know it”)

Stative passives of transitive verbs also keep the feature Trans=Yes.

Transitivity of the clause can easily be deduced from the presence of the Person[obj] feature, e.g. (same example in non-potential form; the object marker is missing in the verb, but the feature is present anyway):

bdərueiṭ Dyn=Yes|Gender[subj]=2|Number[subj]=Sing|Person[obj]=3|Person[subj]=2|Tense=Pres|Trans=Yes|VerbForm=Fin
“you know it”

I hope this makes sense.
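
The difference between the lexical Trans feature and the transitivity of the clause could be sketched in Python roughly as follows. This is only an illustration: it assumes that clause transitivity is read off the presence of Person[obj], as described above, the function names are hypothetical, and the two feature strings are abbreviated versions of the analyses in this comment.

```python
def feats_to_dict(feats):
    """Split a FEATS string such as 'A=x|B=y' into a dict."""
    return dict(pair.split("=", 1) for pair in feats.split("|") if pair)


def lexically_transitive(feats):
    """Trans=Yes: the underlying base verb is transitive."""
    return feats_to_dict(feats).get("Trans") == "Yes"


def clause_transitive(feats):
    """Assumed rule: the clause is transitive if Person[obj] is present."""
    return "Person[obj]" in feats_to_dict(feats)


# Abbreviated feature strings (potential form vs. non-potential form).
potential = "Person[po]=2|Person[subj]=3|Polarity=Neg|Reln=Pot|Trans=Yes"
plain = "Person[obj]=3|Person[subj]=2|Tense=Pres|Trans=Yes"

for feats in (potential, plain):
    print(lexically_transitive(feats), clause_transitive(feats))
```

Under these assumptions the potential form comes out as lexically transitive but not clause-transitive, while the non-potential form is both.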

rueter commented 1 week ago

Hi @Stormur

In the Hungarian examples you give: (1) tanítani 'to teach' / (2) tanulni 'to study', both can take an object or not take an object.

What other features would you like?

Indefinite :  tanulok   tanul  VERB  _  Number[subj]=Sing|Person[subj]=1
Definite :    tanulom   tanul  VERB  _  Number[obj]=Plur,Sing|Number[subj]=Sing|Person[obj]=3|Person[subj]=1

On the one hand, we could look for dependents and assume that the indefinite conjugation without a dependent is intransitive by default; on the other hand, the definite conjugation is always transitive, regardless of whether a dependent is present.

@ftyers what do you think?

Stormur commented 1 week ago

Hi!

It was a more general consideration. These suffixes seem to relate to valency, so indirectly to transitivity. You have tan-ul- which represents an "intransitive" process of learning, and then the causative tan-ít- 'to make learn = to teach'.

Then, the fact that a verb like study can take or develop a direct object with time seems again to stress that what is relevant for transitivity is the argument structure of the clause and that it is moot to mark something aprioristically on the verb. If the marking of transitivity mechanically depends on the detection of dependents, it is not useful (it is contextual annotation).

I mean, I do not think it is possible to speak of "intransitive" or "transitive" conjugations. There might be a correlation, though.

> Definite : tanulom tanul VERB _ Number[obj]=Plur,Sing|Number[subj]=Sing|Person[obj]=3|Person[subj]=1

If the Number of the object is not inferrable, it should not be annotated: it is simply not expressed.
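
A small Python sketch of what that clean-up could look like, assuming that a feature listing every value available in the language (here Number with Sing and Plur) expresses nothing and can be dropped. The inventory ALL_VALUES and the function name are illustrative assumptions, not an existing UD script.

```python
# Assumed inventory of possible values per base feature (hypothetical).
ALL_VALUES = {"Number": {"Sing", "Plur"}}


def drop_uninformative(feats):
    """Remove features whose value list covers every possible value."""
    kept = []
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        base = name.split("[", 1)[0]  # Number[obj] -> Number
        if set(values.split(",")) == ALL_VALUES.get(base):
            continue  # e.g. Number[obj]=Plur,Sing says nothing
        kept.append(pair)
    return "|".join(kept) or "_"


feats = "Number[obj]=Plur,Sing|Number[subj]=Sing|Person[obj]=3|Person[subj]=1"
print(drop_uninformative(feats))
# Number[subj]=Sing|Person[obj]=3|Person[subj]=1
```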

Stormur commented 1 week ago

I have to wrap my head around this, but does it mean that the Trans marker is always tied to some object (or else) marker? Or is it expressed in an identifiable, independent morphological way?

rueter commented 1 week ago

Thank you, @Stormur !

My problem with the Hungarian was that both tanít 'teach' and tanul 'study' can be understood as intransitives, i.e., they could answer the question «What does Jo do nowadays?»: (1a) Jo tanít. 'Jo teaches' or (1b) Jo tanul. 'Jo studies'. Hence, both can represent "intransitive" processes.

(2a) Jo tanít egy idegen nyelvet. 'Jo teaches a foreign language' or (2b) Jo tanul egy idegen nyelvet. 'Jo studies a foreign language'. In both situations Jo is doing something indefinite, as it were. The object is present and this would be Transitive=Yes.

(3a) Jo tanítja ezt a nyelvet. 'Jo teaches this language' or (3b) Jo tanulja ezt a nyelvet. 'Jo studies this language'. In both situations Jo is doing something definite. The object is present and this would be Transitive=Yes.

In (1a-1b) we have an intransitive sentence. In (2a-2b) we have a transitive sentence with an indefinite object. In (3a-3b) we have a transitive sentence with a definite object.

My question then is whether we should mark the sentences (4a-4b) as Transitive=Yes: (4a) Jo tanítja. 'Jo teaches it' or (4b) Jo tanulja. 'Jo studies it'.

The explicit object has been elided, but the existence of that object is retained in the verbal conjugation. Ellipsis. The object can be extracted from the preceding context.

I can appreciate the reasoning for removing the content «Number[obj]=Plur,Sing» if we know that number is binary for a given language.

> Definite : tanulom tanul VERB _ Number[obj]=Plur,Sing|Number[subj]=Sing|Person[obj]=3|Person[subj]=1
>
> If the Number of the object is not inferrable, it should not be annotated: it is simply not expressed.

In the UD context this is very good and lightens the otherwise packed features column for UD_Erzya at least.
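
The cases (1)-(4) can be spelled out as a small decision procedure. The Python sketch below assumes the annotation discussed in this thread (the definite conjugation marked as Person[obj] in FEATS) and takes the presence of an obj dependent as already known from the tree. The function name and labels are hypothetical, and the 3rd person feature strings are built by analogy with the tanulok/tanulom rows, not taken from a treebank.

```python
def clause_transitivity(feats, has_obj_dependent):
    """Label a clause from the verb's FEATS and the presence of an obj dependent."""
    # Assumption: the definite conjugation is annotated as Person[obj]=...
    object_marked = any(
        pair.startswith("Person[obj]=") for pair in feats.split("|")
    )
    if has_obj_dependent and object_marked:
        return "transitive, definite object"    # (3a-3b)
    if has_obj_dependent:
        return "transitive, indefinite object"  # (2a-2b)
    if object_marked:
        return "transitive, object elided"      # (4a-4b)
    return "intransitive"                       # (1a-1b)


# Constructed by analogy with the rows above (3rd person singular subject).
INDEFINITE = "Number[subj]=Sing|Person[subj]=3"
DEFINITE = "Number[subj]=Sing|Person[obj]=3|Person[subj]=3"

for sentence, feats, has_obj in [
    ("Jo tanul.", INDEFINITE, False),
    ("Jo tanul egy idegen nyelvet.", INDEFINITE, True),
    ("Jo tanulja ezt a nyelvet.", DEFINITE, True),
    ("Jo tanulja.", DEFINITE, False),
]:
    print(sentence, "->", clause_transitivity(feats, has_obj))
```

Only the last case needs the verbal morphology at all; in the other three the answer already follows from the dependency tree.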

Stormur commented 1 week ago

I would say that a purely, "a priori" semantic annotation of a verb as intransitive or not based on the process it represents is not something we want at the layer of morphosyntactic annotation, certainly not if such intransitivity is not marked in an explicit, regular way (if it ever happens).

I could however conceive of an additional sentence layer where the transitivity of a whole clause is annotated, but to put this as a morphological marker on a verb is misleading. This is because if Transitivity=Yes does not correspond to anything observable, to annotate it on a verb amounts to saying that that verb is always transitive, just like it could e.g. be of a given inflectional class, but as you show this leads to contradictions.

When speaking about "processes", in fact, it seems to me we are actually referring to Aktionsart (or lexical aspect), which can be correlated to transitivity (and then we would be interested in annotating Aktionsart to observe how these two features pattern), but does not imply its presence or absence. In fact I would say that the data tell us that, given two actors, every process can be represented as a transitive clause, depending on different factors. Therefore I feel it is totally moot to annotate transitivity somewhere if in the end we just look at the presence or absence of given arguments/markers.

That said, I see the practical problem of retrieving an obj which can either be in a dependency relation or in a layer of the predicate.

paulmeurer commented 1 week ago

No, the Trans marker is not necessarily tied to some object or other marker.

In most forms in the paradigm of a verb, you can read off the transitivity from the presence of a slot III person/number/gender marker, but, as I pointed out, in stative passives and in potential forms, the argument syntax is changed and isn’t transitive any longer. So Trans in a stative passive form would mean that it is a stative passive of an (active) transitive verb.

In theory you could derive this information from the other features (Person[obj], Reln, Voice) by some non-straightforward rules. But I think it is very convenient to have this information explicitly available as a feature. This makes it much easier to distinguish between homographs that differ in transitivity (and have totally unrelated meanings, which is very common in Abkhaz). In a treebank search setting, it also helps you to search e.g. for all transitive verbs, regardless of whether they occur negated or not (negation often entails potentialis with loss of transitivity in the clause).

In contrast to the Hungarian case («tanul», «tanít» used both transitively and intransitively with virtually equal morphology), an Abkhaz verb is either transitive or intransitive in its non-potential non-passive forms, and they exhibit very different morphology. (There are some exceptions of labile verbs (e.g. «write»), but those are shorthands for pairs of intransitive and transitive verbs that share the masdar (which is the dictionary entry form), and again have very different morphology except for the masdar.)

In practical terms, in the morphological analyzer, this feature is taken from the lexicon entry of the verb. (Transitivity is regarded as a salient feature in the Abkhaz lexicographic tradition and given for each verb entry.)

Stormur commented 1 week ago

> bəzdərʒom Dyn=Yes|Gender[po]=2|Number[po]=Sing|Person[po]=2|Person[subj]=3|Polarity=Neg|Reln=Pot|Tense=Pres|Trans=Yes|VerbForm=Fin
> “you.fem don’t know it” (lit. “you cannot know it”)
> bdərueiṭ Dyn=Yes|Gender[subj]=2|Number[subj]=Sing|Person[obj]=3|Person[subj]=2|Tense=Pres|Trans=Yes|VerbForm=Fin
> “you know it”

I find it really challenging to understand how Caucasian languages work :slightly_smiling_face:

In these examples, Gender[] has value 2, but I can only find Fem, Masc, Neut in the guidelines... is it correct?

Also, from observing the features, is it correct to say that Abkhaz expresses potential by means of a passive or impersonal construction? That is, we are translating it with an English transitive clause "you don't know it", but it is actually "it cannot be known by you"?

Can Trans=No verbs also form stative passives or potentials?

Is homography common also at the level of lemmas?

I think we have to understand what it means to "search for transitive verbs", as in this context this seems to be a very Abkhaz-specific, lexical, semantic classification, and so I fear that an apparently very generic, syntactic label as Trans is misleading.

> an Abkhaz verb is either transitive or intransitive in its non-potential non-passive forms, and they exhibit very different morphology.

Does this mean that we can speak of different inflectional classes? So could one envision using a feature like InflClass with an Abkhaz-specific value? Are there cases of the same root taking both transitive and intransitive morphological patterns?

paulmeurer commented 1 week ago

Sorry, yes, there was obviously an error in the examples; I had manually changed a different example. Here is a freshly generated analysis:

бдыруеит    а-ды́рра    VERB Dyn=Yes|Gender[subj]=Fem|Number[subj]=Sing|Person[obj]=3|Person[subj]=2|Tense=Pres|Trans=Yes|VerbForm=Fin
быздырӡом   а-ды́рра    VERB Dyn=Yes|Gender[po]=Fem|Number[po]=Sing|Person[po]=2|Person[subj]=3|Polarity=Neg|Reln=Pot|Tense=Pres|Trans=Yes|VerbForm=Fin

Yes, you could characterize the potential as an impersonal construction that could be literally translated as «it cannot be known by you».

Intransitive verbs cannot form stative passives (but there are stative verbs that are not derived from transitives). They can however form potentials, e.g. «I cannot stand»:

сызгылом    а-гы́лара VERB Dyn=Yes|Number[subj]=Sing|Person[subj]=1|Polarity=Neg|Reln=Pot|Tense=Pres|Trans=No|VerbForm=Fin

Here, the argument syntax is not altered.
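
As a rough illustration of the kind of search mentioned earlier in the thread (all transitive verbs, negated or not), here is a minimal Python sketch over the three analyses above. The list ROWS and the helper has_feature are made up for the example; a real search would read a CoNLL-U file instead.

```python
# (form, FEATS) pairs copied from the three analyses above.
ROWS = [
    ("бдыруеит", "Dyn=Yes|Gender[subj]=Fem|Number[subj]=Sing|Person[obj]=3|"
                 "Person[subj]=2|Tense=Pres|Trans=Yes|VerbForm=Fin"),
    ("быздырӡом", "Dyn=Yes|Gender[po]=Fem|Number[po]=Sing|Person[po]=2|"
                  "Person[subj]=3|Polarity=Neg|Reln=Pot|Tense=Pres|Trans=Yes|"
                  "VerbForm=Fin"),
    ("сызгылом", "Dyn=Yes|Number[subj]=Sing|Person[subj]=1|Polarity=Neg|"
                 "Reln=Pot|Tense=Pres|Trans=No|VerbForm=Fin"),
]


def has_feature(feats, name, value):
    """True if the exact pair name=value occurs in the FEATS string."""
    return f"{name}={value}" in feats.split("|")


# Lexically transitive verbs, regardless of negation or potential form.
transitive = [form for form, feats in ROWS if has_feature(feats, "Trans", "Yes")]
print(transitive)  # ['бдыруеит', 'быздырӡом']

# The same search via Person[obj] alone would miss the negated potential form.
with_obj = [form for form, feats in ROWS if "Person[obj]=" in feats]
print(with_obj)  # ['бдыруеит']
```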

Homography is very common at the level of lemmas. E.g. “a-gará” can mean «to take» (trans.), «to die (of hunger, thirst etc.)» (intr. with indirect object), «to sound, to be heard» (intr.). The first two are etymologically related (hunger takes me -> reinterpreted as: I am taken by hunger), the last is unrelated. There is much homography across word classes also (often semantically unrelated), and still more homography modulo stress position.

I see your point that Trans might be misleading. I think InflClass would be misleading for other reasons: it points more in the direction of Indo-European inflections, where each class has a different set of endings to encode the same person/number features, irrespective of syntactic behavior. I would prefer VerbClass, which would also fit the Georgian case, where four verb classes can be distinguished, according to the verbs’ argument case syntax. For Abkhaz, one could posit the classes Intr, Trans, and perhaps Inv (for verbs with what is traditionally called inverted syntax, where the subject is marked by the indirect object marker, and the object by the subject marker; mostly verba sentiendi). As I mentioned, there are labile verbs that can be inflected according to both the intransitive (nominative) and transitive (ergative) patterns.

paulmeurer commented 1 week ago

I see now that the Subcat feature that @dan-zeman mentions might be fitting for Abkhaz, at least if it is used similarly to its use in the Georgian tagset, which distinguishes between Intr, Tran and Indir. For Georgian, these feature values don’t cover all subcategorization possibilities (ditransitive verbs do get Tran, applicative arguments are not considered), and for Abkhaz, it would be hopeless to encode all subcategorization possibilities as atomic feature values, but Intr, Tran and perhaps Inv would work well.
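
If one went that way, rewriting the existing annotation could be as simple as the following Python sketch. It is purely hypothetical: the mapping Trans=Yes -> Subcat=Tran and Trans=No -> Subcat=Intr is an assumption, a value like Inv could not be derived from Trans and would have to come from the lexicon, and the function name is made up. The re-sorting reflects the CoNLL-U convention that features are ordered alphabetically by name.

```python
# Assumed mapping from the current Trans values to Subcat values.
TRANS_TO_SUBCAT = {"Yes": "Tran", "No": "Intr"}


def trans_to_subcat(feats):
    """Replace Trans=Yes/No by Subcat=Tran/Intr and re-sort the features."""
    pairs = []
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "Trans" and value in TRANS_TO_SUBCAT:
            pair = "Subcat=" + TRANS_TO_SUBCAT[value]
        pairs.append(pair)
    # CoNLL-U expects features sorted by name, ignoring case.
    pairs.sort(key=lambda p: p.split("=", 1)[0].lower())
    return "|".join(pairs)


feats = ("Dyn=Yes|Number[subj]=Sing|Person[subj]=1|Polarity=Neg|"
         "Reln=Pot|Tense=Pres|Trans=No|VerbForm=Fin")
print(trans_to_subcat(feats))
```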