Open Stormur opened 3 years ago
The noun classes in Niger-Congo languages, as I understand them, are something other than inflectional classes in Indo-European languages. While they do correspond to nominal inflection, at least in Bantu, they are also a property of nouns that can be cross-referenced by other words, most notably verbs. That makes them important in the language system beyond morphology.
BTW we have copied the values Bantu1-Bantu23 from the specification of UniMorph but they have not been used in a UD treebank yet (we still don't have Bantu languages). In contrast, there are also 12 classes for Wolof (Wol1-Wol12), and these have been applied to real data.
OK. This is somewhat unfortunate, because class is really the right generic term for inflectional paradigms.
Do you have other naming suggestions? What do you think of:
- Declension for nouns, of course keeping the proposed Indo-European values + Ind (or maybe a variant of Uninflecting to avoid overlaps; but this label is intended for those words that, contrary to the prototypical behaviour of their class, do not show any morphological variations)
- Conjugation for verbs
Neither exists yet at the moment.
Why not simply InflClass?
But I'm a bit skeptical on maintaining one set of values across a family. Perhaps it would be enough to keep the name of the feature cross-linguistically, while the values would always be language-specific. Anyways, we have yet to see how many other people will actually want to use it. For instance, the feature would be perfectly relevant for Czech but I don't have the data in the original annotation and I'm not going to try to obtain it.
> Why not simply InflClass?
There should be a distinction between verbal and nominal inflections.
> But I'm a bit skeptical on maintaining one set of values across a family. Perhaps it would be enough to keep the name of the feature cross-linguistically, while the values would always be language-specific. Anyways, we have yet to see how many other people will actually want to use it. For instance, the feature would be perfectly relevant for Czech but I don't have the data in the original annotation and I'm not going to try to obtain it.
I think it is better to open it to the widest possible applications! :slightly_smiling_face: We have this difficulty for some Latin treebanks, too, but I feel that at least the potential of having such a feature is a good thing.
PS: I think the universal label is still valid!
> > Why not simply InflClass?
> There should be a distinction between verbal and nominal inflections.
Why? Do you intend to combine both on one word? And what would you do with inflections that are neither nominal nor verbal? If the distinction is necessary, it is also possible to start values of nominal inflection with "N" and verbal with "V".
Yes, as I explain in the first post: inflectional classes might stack! In Latin, this happens for participles:
VerbClass=LatA -> amatus NounClass=IndEurO (amati, amatorum...)
VerbClass=LatI -> audiens NounClass=IndEurI (audientis, audientium...)
The fact is that both pieces of information are relevant from an inflectional point of view.
I can imagine that other kinds of stacking can happen in ways that I cannot fathom, of course not limited to verbal conjugation + nominal declension. Probably, multiple nominal inflectional paradigms can also apply at the same time. This was my original motivation for having two features. But it has occurred to me that we may solve this, while allowing as much flexibility as possible, in the following way, starting from your proposals:
InflClass;
InflClass[verb]=LatA|InflClass[noun]=IndEurO (where noun is used as a label for nominal in general).
NIndEurO for "nominal declensions with o theme", VLatA for "verbal Latin conjugation with a theme"), because the combination of UPOS and InflClass is already enough.
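To make the layered variant concrete, here is a minimal Python sketch of how such a FEATS value could be read and written. It is not part of any UD tool: the function names `parse_feats` and `format_feats` are made up for illustration, and `InflClass` with its `verb`/`noun` layers is only the proposal discussed above, not an existing feature. The one thing taken from the existing guidelines is the `Feature[layer]=Value` syntax and the convention that features are listed in alphabetical order.

```python
import re

# Feature name with an optional layer in square brackets, e.g. "InflClass[verb]".
FEAT_NAME = re.compile(r"^(?P<name>[A-Za-z0-9]+)(?:\[(?P<layer>[a-z0-9]+)\])?$")


def parse_feats(feats):
    """Parse a FEATS string into {(name, layer): value}; layer is None if absent."""
    if feats == "_":  # "_" marks an empty FEATS column in CoNLL-U
        return {}
    parsed = {}
    for pair in feats.split("|"):
        key, value = pair.split("=", 1)
        match = FEAT_NAME.match(key)
        if match is None:
            raise ValueError(f"malformed feature name: {key!r}")
        parsed[(match["name"], match["layer"])] = value
    return parsed


def format_feats(parsed):
    """Serialize back to a FEATS string, sorted case-insensitively by feature name."""
    if not parsed:
        return "_"
    items = []
    for (name, layer), value in parsed.items():
        key = f"{name}[{layer}]" if layer else name
        items.append((key, value))
    items.sort(key=lambda item: item[0].lower())
    return "|".join(f"{key}={value}" for key, value in items)


# Hypothetical annotation of the participle "amatus" under the layered proposal.
amatus = parse_feats("InflClass[verb]=LatA|InflClass[noun]=IndEurO")
print(amatus[("InflClass", "verb")])  # conjugation of the underlying verb
print(amatus[("InflClass", "noun")])  # declension of the participle itself
print(format_feats(amatus))           # layers come out in alphabetical order
```

With this representation both classes stay retrievable on the same word without prefixing the values, since the layer carries the verbal/nominal distinction.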
I'm probably preceding @dan-zeman in the quest for standardising features, and hope not to duplicate older issues (I couldn't find others by using the inflection keyword).
So, I was wondering which (language-specific) features could be used to mark inflectional classes of words, and whether any exist already.
It is true that such information may not be purely morphological, but rather lexical, as it is often arbitrary from a synchronic point of view: it is not necessarily dependent on a given part of speech or gender, even if, at the same time, there exist more or less strong correlations. It is this orthogonality that would motivate such a feature, and, not least, the fact that we often find this information in the Latin treebanks we are taking care of, and it would be a pity to lose it during conversion into the UD standard.
We were experimenting with the already existing NounClass, envisioning a corresponding VerbClass (not yet attested), in parallel to Variant or Form, both already existing, mainly for Czech and Irish.
IndEurO value, applied e.g. to Latin lupus (lupi, lupo, luporum...) 'wolf' might as well apply to Greek άνθρωπος, Lithuanian miškas (please, IEists correct me if I am wrong), and so on. This would be very nice and desirable for interlinguistic comparisons! So we would have IndEurA (rosa), IndEurE (spes), IndEurI (possibilis), IndEurO (lupus), IndEurU (domus), IndEurX (i.e. athematic, rex = reg + s), alongside the universal "non-value" Ind for truly indeclinable (fas, tot).
Variant=Greek for this, but maybe also Form=Greek. The Foreign feature does not accomplish the same thing here, since these are words which are fully settled in Latin, and we want to highlight a specific paradigmatic variance.
LatA for amo, amare 'to love', and so on, very much as for nouns.
IndEurO & LatA.
One thing that brought us towards the for now Bantu-only NounClass is that, from its description: 1) we are in the same realm of lexical properties expressed through morphology; 2) a language like Latin already has the independent Gender feature; 3) there is a synchronic unpredictability (as explained for Wolof).
What sets Latin classes apart from Bantu ones is that they are not related to concordance phenomena... I don't know how relevant this is. I have to admit, I am still a little bit confused about the distinction between NounClass and Gender, but if they are indeed considered to be separate phenomena, our first concern was to reuse something that was already there. On the contrary, if they aren't, how feasible would it be to let Gender become just a subtype of NounClass?
Other considered features: NounType and VerbType are already taken and are meant for different things; there is a mysterious Uninflect for uk (Ukrainian?).
I am curious to know what the UD community thinks about this, if there are any suggestions or proposals, and if some treebank has already dealt with such issues! :slightly_smiling_face: