UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Complex Noun Phrase Inflection #859

Closed rueter closed 2 years ago

rueter commented 2 years ago

In attempts to align the terminology used across languages in UD, a recent development entails preferring one case name over another. In the Komi-Zyrian and Komi-Permyak languages, there are two morphemes, -тӧг and -тӧм, which have the meaning 'without'. They have been presented as two separate cases, Caritive and Abessive, respectively. This assignment of names, as @dan-zeman has pointed out to me privately, contradicts the terminology used by Arja Hamari (2011), "The abessive in the Permic languages", where she assigns the names in just the opposite direction: -тӧг = Abessive and -тӧм = Caritive. In UD, of course, we would like to reduce the number of cases if at all possible, and the validator nowadays only accepts Case=Abe. For those of us using caritive or privative, it is time to correct our documentation to abessive.

Using Hamari's terminology, the "(‹-тӧг› = Abessive) case" is used for modifying a verb, whereas the "(‹-тӧм› = Caritive) derivational suffix" is used for marking an adnominal phrase attribute.

But how should we distinguish what is a case and what is a derivational suffix?

To my thinking, the term Case should have a definition, i.e. it is not a knighthood bestowed by some sublimity; rather, there should be a logical means of distinguishing different varieties of inflection.

Our present definition of UD case is quite flexible: Case is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.

For lack of extensive documentation in Uralic languages, I have chosen to define case as a formative which is typically attached to a complex-noun-phrase head, but which might also occur on other parts of speech. I define a complex-noun-phrase head as the head of a noun phrase with attributes or category marking. Hence, derivation might be assigned to simple noun phrases, i.e. nouns without attributes or extended category marking such as number.

In the Komi language forms, both the abessive -тӧг (adverbial modifier) and the caritive -тӧм (noun phrase modifier) can be attached to complex noun phrases. This would mean that we have two distinct case formatives with the same "absence of" meaning. The literary languages and many dialects share this distinction, but not without exception, as @nikopartanen has pointed out with regard to the Udora dialect of Komi-Zyrian.

First of all there is morphosyntax. (1) The category of number: both ‹-тӧг› and ‹-тӧм› can be added to singular and plural nominative stems, e.g. ‹документтӧг› [Num=Sing] 'without a document' and ‹документтӧм› [Num=Sing] 'without a document', as well as ‹документъястӧг› [Num=Plur] 'without documents' and ‹документъястӧм› [Num=Plur] 'without documents'.

(1.1) Колипкайлӧн чужан лун / Екатерина Макарова // Коми му (2021-12-16) ... Муніс кӧ, эськӧ и сьӧмтӧг, и [документъястӧг] коли. 'If he went, perhaps he was left without money and [without documents]'

(1.2) Овлісны-вывлісны / Шахов Б. Ф. // Войвыв кодзув (1997. №4) ... Кольӧм воясӧ быд во воллыліс, — дорйис Пиля Педот [документъястӧм] вузасьысьӧс. '"He came back every year during the last years," said Pilja Pedot, defending the salesperson [without documents].'

(2) Noun with attribute: nouns inflected with ‹-тӧг› and ‹-тӧм› can take adjectives. Here we are missing a minimal pair, but let it suffice for now: [ыджыдысь-ыджыд терпенньӧтӧг] '[without the greatest of tolerance]'

Тан жӧ колӧ шуны, мый А. Вежевлӧн, А. Микушевлӧн да мукӧд критикъяслӧн паныдасьлӧны на тэрмасьӧмӧн гижӧм, произведениеяссӧ веркӧссянь видлалысь, [джуджыд анализъястӧм] да обобщениеястӧм статьяяс. 'Here it has to be said that A. Vezhev, A. Mikushev and other critics are still found to have hastily written articles [without deep analyses] and generalizations that view the works superficially.'

This said, it might be easy to say that the complex noun phrase requirement has been fulfilled, but we must bear in mind that the Komi language forms also form verbs from complex noun phrases. So, whatever our solutions/resolutions might be, we need to look at the larger picture.

There are a few formatives used for deriving verbs from nouns worth mentioning here: and v/l. Both are attached to the equivalent of the proprietive form of a noun. The word döröm 'shirt' becomes döröma in the proprietive 'having a shirt'. The form döröm-a-ś-ö means 'he/she is putting on a shirt'.

Indeed, the proprietive construction for 'having a big red shirt' is then ïdʒïd görd döröma, and the derived verb 'he/she is putting on a big red shirt' is ïdʒïd görd dörömaśö. @ftyers and @jonorthwash might share my inclination to split the last token dörömaśö into döröma and śö, but maybe that is not a solution that would be shared by other languages.

Reiterating my implicit questions: (a) Is noun phrase complexity a concept we might/should use when distinguishing case from derivation? (b) If the concept of noun phrase complexity is used for defining case, what do we do when complex NPs become verbs?

Do you have some suggestions? @jnivre @nschneid @amir-zeldes @flammie

dan-zeman commented 2 years ago

Thanks for writing this up and moving the discussion from the commit here!

I sympathize with the wish of having one cross-linguistically robust definition of casehood (and, more generally, the inflection-derivation borderline) but I am skeptical about its practical achievability.

I wouldn't say that a result of morphological derivation cannot head a complex noun phrase. After all, one could also derive a noun from a noun. But that does not mean that it cannot be a criterion in a specific group of languages. Maybe it can, maybe not.

I would look at the various rules within the grammar of the language, and compare it with other languages, the related ones first. So one point that strikes me in Hamari's paper is the claim that Komi -тӧм is cognate (and presumably similar in function) with Finnish -ton/-tön. Would you agree with that claim?

If we accept that assumption, then it is desirable to treat both the same way in UD, unless we can show that their behavior in the two languages diverged so substantially that they need to be treated differently. I speak neither Finnish nor Komi, but this query suggests that the corresponding Finnish words are analyzed as adjectives in TDT (@jmnybl). What I do not see is whether the suffix ever attaches to a plural stem in Finnish, like in Komi? On the other hand, the Finnish adjectives are tagged with Case=Nom and they can take another case suffix, leading to a different value of the Case feature (see this query). Would that be possible in Komi? The prerequisite would be that Komi adjectives in general can take case suffixes; this query suggests that it is possible, although perhaps less frequent than in Finnish (remove the lower(form)~"тӧм" part from the query to see more adjectives).

rueter commented 2 years ago

Hi, the cognates Komi -тӧм and Finnish -ton/-tön both appear as modifiers in the noun phrase, both can occur in the predicate, and when the stem noun is lacking attributes of its own, they both can be compared. (So far, I have not investigated the compatibility of comparison with complex noun phrases in either of the languages. Both languages are known to occasionally take comparative marking on nouns.) While Komi-Zyrian can take the -тӧм formative after singular and plural stems, Finnish can only take the -ton/-tön formative after the base stem, i.e. singular. Albeit, there is at least one fossil adverb, silmitön 'blind (as in blind rage)', which seems to take a plural stem (silm-i-tön 'eye-pl-less'), but this is definitely not part of regular inflection. Adjectives in Finnish agree for case with the NP head in nearly all instances. In Komi, on the contrary, case marking is indicative of an NP head, such that adjectives can take case marking in instances of contextual ellipsis where the NP head noun is dropped.

Looking for a case marker that is used both in the adnominal range and the verbal clause takes us to the genitive in -лӧн. This case can take additional case marking in Komi, e.g.

Свадьбаныс [[мукӧдлӧн]ысь] торъя гӧль ни озыр эз жӧ ло. (UD_Komi_Zyrian-Lattice, sent_id = IgnatovMI:Medborja_addzysjlom:VK1980:6:44) 'Their wedding was not specifically poor(er) nor rich(er) than anyone else's.' Here мукӧд-лӧн 'another-gen' is then augmented with an elative: [мукӧдлӧн]-ысь '[another-gen]-ela'.

The Finnish language does not have this contextual ellipsis strategy (aka secondary declension), but it does exist in the Erzya and Moksha languages. In the Mordvin languages, however, there is only one case form -VTOmO used for both noun-phrase and verbal-clause ranges.

rueter commented 2 years ago
[Screenshot: Komi_Zyrian-Lattice_secondary_declension_Gen_Ela]

This shows ellipsis where the NP head case locus shifts to the Pronoun/Determiner. The Determiner has a genitive marker, but, in the absence of a head noun, the elative case is added directly to the genitive base. In this screenshot, a part of speech has not been added, because it is a ZERO. Pronoun is the first candidate to fill X if we apply the same strategy as in the Erzya and Moksha treebanks.

jmnybl commented 2 years ago

I'm by no means an expert here, but I can comment on behalf of the TDT annotation. As Dan mentioned, -ton/-tön is treated as a derivational suffix transforming a noun into an adjective (e.g. kuumeeton henkilö / a person without fever). As adjectives agree in case, they will in many cases have a case inflection of their own (e.g. kuumeettomalla henkilöllä, Case=Ade), and therefore treating -ton/-tön as a case inflection rather than derivation would cause a double case inflection.

dan-zeman commented 2 years ago

OK. So there are some similarities and some differences between Komi and Finnish, but unfortunately the two languages differ in aspects that could help find the solution. The way I'm reading it is that if we want to treat the suffix differently in Finnish and Komi, we can. (Although it does not mean we have to.)

The first question is whether Komi -тӧм is case inflection or derivation. I don't know. Both analyses seem defensible to me. It would help if we could refer to an authoritative source to support it being a case suffix (the two papers I've seen don't list it), but it is not a requirement, as the UD Komi team should have the right to propose and defend such an analysis themselves.

If it is a case, then the second question is whether it must be a new case, or can it be (on semantic grounds) a subtype of the existing abessive case? (Where the other subtype would be the -тӧг morpheme.) Then we could use Case=Abe for both and they could be distinguished, if desirable, by an additional language-specific feature.

rueter commented 2 years ago

Given that the kinship between Balto-Finnic and Permic is no closer than that between Slavic and Germanic, it is not necessary that Finnish and the Komi written languages be compared, although it might serve as a sanity check. A closer relationship might be found in the Mordvin languages, but there the two Komi abessive formatives -тӧг and -тӧм (Bartens 2000 refers to them both as 'caritive', according to Finnish tradition) only have a cognate of the latter, -втомо/-втеме/-томо/-теме/-тэме, which covers the functional range of both Komi formatives. From this we can conclude that there is no problem in calling both formatives by the same name, although a distinction would still have to be made somewhere.

As the first question is whether -тӧм is case inflection or derivation, it is important that we refer to the definition of case 'вежлӧг' given by the most extensive grammar of Komi-Zyrian morphology to date, 'Modern Komi Language' 2000, where the authors define it as follows: Вежлӧг -- эмакывлӧн грамматика категория, коді петкӧдлӧ ӧти грамматика эмторлысь мӧд грамматика эмтор дорӧ урчитӧм йитӧд. Эмакывлӧн вежлӧг категория -- медся озыр, тӧдчана, сӧвмӧм, форма да вежӧртас боксяньыс водзӧ сӧвмысь категория. 'Case is a grammatical category that indicates a relation from one grammatical entity to another. The category of case in nouns from an aspect of form and meaning is the richest, most familiar, and most developing category.' (Fediuniova et al., 2000: 59)

There is mention that new cases are developing, but the number of cases, 23, which is already five more than the 18 presented in previous grammars, does not include the -тӧм formative. Since part of this exercise is to help define the concept "dependent case", which might be associated with morphosyntactic features, we must find a definition to fit the status quo if at all possible.

A case in Komi-Zyrian might be defined according to the following: (1) it is a formative with its locus on the noun-phrase head; (2) the noun phrase is complex, i.e., in addition to the head, adnominal attributes are also attested; (3) the noun-phrase head may take marking for the category of number; (4) the type of relations indicated by the case, must include at least one clause-level relation, but can also entail other relations.

Thus, it is only this fourth constraint (not mentioned anywhere but serving as an ad hoc constraint for denying -тӧм casehood) that would provide a feasible break between case inflection and derivation.

Certainly, -тӧм can be used in a copula construction indicating the absence of an entity, and thus be aligned analogously with the copula-type construction for 'belong to', as demonstrated with the Komi genitive -лӧн. One distinction might occur to us, and that is that a nominative (ZERO) form is regularly followed by a plural predicate marker -ӧсь, and this is exactly what happens with NPs ending in the formative -тӧм. Unfortunately, this is not distinctive, as there are also instances where the genitive, inessive and elative take the plural predicate formative -ӧсь, which is distinguished from the NP plural formative -яс.

Hence, a reformulation of the fourth criterion is in order. We must change the wording from "...at least one clause-level relation..." to "...at least one verbal clause-level relation...".

(4b) the type of relations indicated by the case, must include at least one verbal clause-level relation, but can also entail other relations.
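To make the criteria easier to check against data, here is a minimal Python sketch of (1)-(3) together with both wordings of the fourth criterion. It is only an illustration: the class, the field names and the truth values entered for the two formatives are my own reading of this thread, not something taken from a grammar or a treebank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formative:
    """Observations about one formative; all field names are hypothetical."""
    form: str
    locus_on_np_head: bool        # criterion (1)
    head_takes_attributes: bool   # criterion (2)
    head_takes_number: bool       # criterion (3)
    clause_relation: bool         # criterion (4): any clause-level relation
    verbal_clause_relation: bool  # criterion (4b): a verbal clause-level relation


def is_case(f: Formative, verbal_only: bool = True) -> bool:
    """Apply criteria (1)-(3) plus (4b) by default, or the original (4)."""
    fourth = f.verbal_clause_relation if verbal_only else f.clause_relation
    return (f.locus_on_np_head and f.head_takes_attributes
            and f.head_takes_number and fourth)


# Values below are assumptions based on the discussion in this thread.
TOG = Formative("-тӧг", True, True, True, True, True)
TOM = Formative("-тӧм", True, True, True, True, False)

for formative in (TOG, TOM):
    print(formative.form, is_case(formative, verbal_only=False), is_case(formative))
```

Under these assumed values, -тӧм passes with the original wording of (4) and only fails once the criterion is narrowed to verbal clause-level relations, which is the point of the reformulation.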

In this way we can maintain the status quo of 23 cases in Komi-Zyrian (2000), while drawing attention to the seemingly ad hoc tradition of case.

Finally, the abessive formative -тӧг is limited in use for indicating relations between elements within the verbal clause, while the formative -тӧм is used both in adnominal attribute functions and non-verbal predication.

What would a language-specific feature look like for distinguishing the two abessive formatives?

Essentially, then, there would only be one difference in the dependencies, correct?

As a case: [джуджыд анализъястӧм] да обобщениеястӧм статьяяс. amod(анализъястӧм, джуджыд) nmod(статьяяс, анализъястӧм) conj(анализъястӧм, обобщениеястӧм)

As a derivation: amod(анализъястӧм, джуджыд) amod(статьяяс, анализъястӧм) conj(анализъястӧм, обобщениеястӧм)
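As a quick check of that claim, the two analyses can be written down as sets of relations and compared. This is just a Python sketch; the triples are copied from the two analyses above, and the helper name is made up for illustration.

```python
# Each relation is (deprel, head, dependent), copied from the two analyses.
AS_CASE = {
    ("amod", "анализъястӧм", "джуджыд"),
    ("nmod", "статьяяс", "анализъястӧм"),
    ("conj", "анализъястӧм", "обобщениеястӧм"),
}
AS_DERIVATION = {
    ("amod", "анализъястӧм", "джуджыд"),
    ("amod", "статьяяс", "анализъястӧм"),
    ("conj", "анализъястӧм", "обобщениеястӧм"),
}


def differing_labels(a, b):
    """Map each (head, dependent) pair found in both analyses to its two labels, if they differ."""
    labels_a = {(head, dep): rel for rel, head, dep in a}
    labels_b = {(head, dep): rel for rel, head, dep in b}
    shared = labels_a.keys() & labels_b.keys()
    return {pair: (labels_a[pair], labels_b[pair])
            for pair in shared if labels_a[pair] != labels_b[pair]}


print(differing_labels(AS_CASE, AS_DERIVATION))
```

The only pair reported is (статьяяс, анализъястӧм), labelled nmod under the case analysis and amod under the derivation analysis.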

rueter commented 2 years ago

In feedback from Evgeni Tsypanov at the Institute of Komi Language, Literature and History to a question concerning extensive and regular complex-noun-phrase-head markers including -тӧм, Doctor Tsypanov says the following:

Традиция серти, тайӧ суффиксъяс отсӧгӧн артмӧны кывбердъяс, но В. Таули ӧти уджын –а суффикснас торйӧдӧ вӧлі торйӧд мӧд комитатив. Г.Некрасова дорйис колян во докторлысь удж, но сійӧ оз жӧ торйӧд выль вежлӧгъяс. Дерт, тіянлы позьӧ дескриптива могӧн торйӧдны –тӧм кыдзи мӧд абессив, -са кыдзи ина генитив, -ся кыдзи када генитив. Суоми ёртъяс тшӧтш мӧвпалӧны традиция серти.

«Traditionally, these suffixes [-тӧм, -а, -са, -ся] are used for deriving adjectives, but Valter Tauli in one treatise [1956. “The Origin of Affixes”. Finnisch-Ugrische Forschungen 1956 (32):170–225] has distinguished the suffix -a as a second comitive. G[alina A.] Nekrasova defended her doctor's degree last year [2021], but she does not distinguish new cases. Of course, you, for descriptive purposes, may distinguish -тӧм as a second abessive, -са as a spatial genitive and -ся as a temporal genitive. Finnish colleagues also think according to tradition.»

So, the final question before closing this issue is: «Do we want a descriptive presentation of Komi here in UD, or do we revert to tradition?»

@dan-zeman @jnivre

My vote is that we go ahead and call -тӧм abessive as well, but then we will need a language-specific feature for their distinction, hopefully sooner rather than later.

dan-zeman commented 2 years ago

You are right that in the dependency relations, the (almost) only distinction will be whether анализъястӧм “analyses-less” is attached as nmod or amod (which depends on whether it is tagged NOUN or ADJ). There could be some further consequences if the word is modified by a nominal. I understand that джуджыд “deep” is an adjective, and amod connecting two adjectives is probably OK; but if the modifier were a nominal, then it would be nmod when modifying a noun, but obl when modifying an adjective.
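The label choice described above can be summarized in a small Python sketch. It covers only the configurations mentioned in this comment (adjectival or nominal modifiers of a noun or an adjective); the function name and the set of nominal tags are illustrative assumptions, not part of any UD tool.

```python
# UPOS tags treated as nominal in this sketch (an assumption for illustration).
NOMINAL = {"NOUN", "PROPN", "PRON"}


def modifier_deprel(head_upos: str, dep_upos: str) -> str:
    """Pick the relation for a modifier, following the reasoning in this comment."""
    if dep_upos == "ADJ":
        # An adjectival modifier is amod, also when its head is an adjective.
        return "amod"
    if dep_upos in NOMINAL:
        # A nominal modifier is nmod under a nominal head, obl under an adjective.
        if head_upos in NOMINAL:
            return "nmod"
        if head_upos == "ADJ":
            return "obl"
    raise ValueError(f"configuration not covered: {head_upos} <- {dep_upos}")


# анализъястӧм attached to статьяяс (NOUN), under the two taggings:
print(modifier_deprel("NOUN", "NOUN"))  # tagged NOUN (case analysis)
print(modifier_deprel("NOUN", "ADJ"))   # tagged ADJ (derivation analysis)
```

So tagging анализъястӧм as ADJ would not only change its own attachment to amod, it would also turn any nominal modifier it takes from nmod into obl.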

The question whether we want a descriptive or traditional presentation of Komi in UD is a difficult one. We want both :-) (that's part of Manning's law). We try to honor the tradition when possible but we admit that often it is not possible if we want to promote cross-linguistic parallelism.

My impression from this whole thread is that there is nothing clearly wrong about treating -тӧм as a case. Especially if Erzya uses the cognate -втомо in both situations where Komi would use either -тӧм or -тӧг. But I, too, vote for calling both the suffixes abessive.

The language-specific feature distinguishing the two suffixes could be Variant, which has been used in several languages. The name of the feature is very general but the values have to be specific to the language and phenomenon they capture. You could define, e.g., Variant=Clause and Variant=Nomin to refer to the different constituents within which the word can act as a modifier. Or you could simply define Variant=Tog and Variant=Tom; I've seen features like this as well (although I think I'd prefer the former, as they try to describe the function rather than just copying the morpheme).
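For illustration, here is a rough Python sketch of what the FEATS column could look like with the first pair of values. The mapping of -тӧг to Variant=Clause and -тӧм to Variant=Nomin follows the functional description in this thread; the helper names are hypothetical, and the inclusion of the standard UD Number feature is an assumption made only to show the ordering of features.

```python
# Proposed values, assuming -тӧг modifies within the clause and -тӧм within the nominal.
VARIANT_BY_SUFFIX = {"-тӧг": "Clause", "-тӧм": "Nomin"}


def feats_string(feats: dict[str, str]) -> str:
    """Serialize features as in the CoNLL-U FEATS column: Name=Value pairs sorted by name."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


def abessive_feats(suffix: str, number: str) -> str:
    """Build a FEATS value for a noun carrying one of the two abessive suffixes."""
    return feats_string({
        "Case": "Abe",
        "Number": number,
        "Variant": VARIANT_BY_SUFFIX[suffix],
    })


print(abessive_feats("-тӧг", "Plur"))  # e.g. for документъястӧг
print(abessive_feats("-тӧм", "Plur"))  # e.g. for документъястӧм
```

With this layout both forms share Case=Abe, so cross-linguistic queries on the abessive find both, while Variant keeps the two suffixes apart within Komi.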

rueter commented 2 years ago

Thank you for your feedback, Dan. It occurred to me that perhaps Variant=Vclause and Variant=Nomin would be even more descriptive, but I will start with your suggestion of Variant=Clause and Variant=Nomin.

dan-zeman commented 2 years ago

Feel free to use Variant=Vclause if it fits better. There does not seem to be anything similar so far among the values people have used for Variant (otherwise we might want to use a string that has been already used).

rueter commented 2 years ago

Thanks, this was a helpful exercise.