UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

independent possessives cross-linguistically #1009

Open jonorthwash opened 9 months ago

jonorthwash commented 9 months ago

I'm a little bit confused about (1) what the best way is and (2) what current guidelines are for how to treat independent possessives cross-linguistically.

In English, independent possessive pronouns currently appear to be treated as forms of possessive pronouns; e.g. the lemma of mine is my, and the POS tag is PRON.

In a sentence like They cleaned their windows, but didn't clean mine., partially annotated below using what I understand to be current guidelines, the independent possessive further has an information structure problem.

1   They    _   PRON    _   _   2   nsubj   2:nsubj _
2   cleaned _   VERB    _   _   0   root    0:root  _
3   their   _   PRON    _   _   4   det 4:det   _
4   windows window  NOUN    _   Number=Plur 2   obj 2:obj   _
5   ,   _   PUNCT   _   _   2   punct   2:punct _
6   but _   CCONJ   _   _   2   cc  2:cc    _
7-8 didn't  _   _   _   _   _   _   _   _
7   did _   AUX _   _   9   aux 9:aux   _
8   n't _   ADV _   _   7   advmod  7:advmod    _
9   clean   _   VERB    _   _   2   conj    2:conj  _
10  mine    my  PRON    _   Number=Sing|Person=1|Poss=Yes|PronType=Prs  9   obj 9:obj   _
11  .   _   PUNCT   _   _   9   punct   9:punct _

Specifically, it gives the appearance of a first person singular object of the verb, as opposed to a third person plural object (that happens to have a first person singular possessor). This seems really bad for tasks like information extraction, but perhaps that's not considered a priority here.
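To make the concern concrete, here is a minimal sketch in plain Python (no UD library assumed) of the kind of naive extraction that would be misled. The helper names `parse_feats` and `naive_object_persons` are made up for illustration, and the two rows are tokens 9 and 10 from the annotation above with the ten columns tab-separated.

```python
# Tokens 9 and 10 of the example sentence, as tab-separated CoNLL-U rows.
ROWS = [
    "9\tclean\t_\tVERB\t_\t_\t2\tconj\t2:conj\t_",
    "10\tmine\tmy\tPRON\t_\tNumber=Sing|Person=1|Poss=Yes|PronType=Prs\t9\tobj\t9:obj\t_",
]


def parse_feats(feats):
    """Turn a FEATS string such as 'Number=Sing|Person=1' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def naive_object_persons(rows):
    """Collect (form, Person) for every obj dependent, ignoring all context."""
    found = []
    for row in rows:
        cols = row.split("\t")
        form, feats, deprel = cols[1], parse_feats(cols[5]), cols[7]
        if deprel == "obj" and "Person" in feats:
            found.append((form, feats["Person"]))
    return found


print(naive_object_persons(ROWS))  # [('mine', '1')]
```

The query reports a first person object of clean, although the thing that was not cleaned is a set of windows.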

I'm also curious how this is dealt with regarding nouns, e.g. in They cleaned their windows, but didn't clean Sam's. (I wasn't able to find any examples in a cursory search of English-EWT, but it was an admittedly quick search.) Here I would imagine that whatever is currently being recommended is going to make it look like Sam is the object of cleaning.

Specific questions:

  1. Is my understanding of current best practice for words like mine correct?
  2. Is this not a problem in the ways I mentioned?
  3. How are independent possessives of nouns dealt with (in English or any language)?

For the sake of transparency, I'm working with the standing UD Turkic group on a very similar issue in Turkic.

Stormur commented 9 months ago

At a first glance it does seem that these Number and Person should be put at the [psor] layer, as usual for possessives. I agree that this current situation produces wrong information.

In a language like English this might appear less necessary at first, but in Latin we forcibly need to distinguish the Number of the possessor because the determiner regularly inflects for Case/Number/Gender (I say "forcibly", but I am also convinced this is the correct way to go). Then the logical consequence is that the Person belongs to the same layer. By the way, Latin does not have independent, i.e. pronominal only possessives (nor does Italian, for example), but I think this does not make a difference at the feature level.

They cleaned their windows, but didn't clean Sam's

This is one of those cases which makes me advocate for the presence of an ellipsis in the relation... something along the lines of obj:ellipsis or even obj[ellipsis] (i.e. a layer for dependency relations).

nschneid commented 9 months ago

I'm also curious how this is dealt with regarding nouns, e.g. in They cleaned their windows, but didn't clean Sam's.

This one is easy: per English tokenization guidelines, the genitive ending 's would be a separate token attaching as case. I.e. it is treated as an analytic rather than morphological genitive/possessive. examples

The independent possessives in English are treated under these guidelines. Per this policy, the role of morphological features is to identify a slot in the paradigm. Mine with a singular vs. plural possessum antecedent (e.g. That book is mine vs. Those books are mine) are not distinguished in the English pronoun system; the form depends only on the number associated with the possessor.

The implicit number associated with the possessum may be relevant to semantic interpretation, and even to agreement (Mine is vs. Mine are). So it is arguably a limitation that we don't use "deeper" features here (IMO even a stronger case than you, which English-GUM actually marks as singular or plural based on entity annotation of the antecedent, but English-EWT leaves as unspecified for number; the semantic number is irrelevant to subject-verb agreement). But Mine are can be analogized to Some are, where we tag "Some" as DET and do not assign it any number feature.

In short, we use morphological features fairly minimally in English, as a way to group together word forms into paradigms but not attempting to capture all of the understood distinctions that could have morphosyntactic relevance. A fuller account may call for phrase-level features (e.g. to mark an entire noun phrase as definite or genitive). I don't necessarily see other languages as bound by the way English has done things, though.

dan-zeman commented 9 months ago

The general recommendation is that layered features are used only if several layers of the same-named feature occur within the same word category. So we typically do not need layered Person for pronominal possessives unless there is a language that has a word meaning something like "my they".

Consequently, features have to be interpreted in the context of the word category they are applied to. If one wants to survey persons of objects of verbs in a corpus, taking blindly the Person value of each obj dependent is not the right way to go (BTW, in many languages nouns don't have this feature at all). I can see that it would simplify the task if it could be done, but UD does not prioritize finding the person of objects over other potential tasks :-)

It is not forbidden to use a layered feature even in cases where it is not strictly necessary. But then it has to be defined as a language-specific feature. And obviously, there should be a consensus within the language (or group of related languages) that all treebanks will do it that way. I think that Person[psor] on nouns is an example of such feature that is used in some languages.

sylvainkahane commented 9 months ago

It could be nice to have a more restrictive policy about features if we want UD treebanks to be used for cross-linguistic studies and typology (see also #775). In the case of En. mine, Person=1 should be prohibited. The definition of Person could be revised to avoid this, or at least a recommendation might be added.

Stormur commented 8 months ago

The general recommendation is that layered features are used only if several layers of the same-named feature occur within the same word category. So we typically do not need layered Person for pronominal possessives unless there is a language that has a word meaning something like "my they".

Consequently, features have to be interpreted in the context of the word category they are applied to.

The problem here though is that all these Person-bearing words belong to the PRON category, but the feature seems to refer to different things, and cannot be disambiguated in other ways. It looks like a universal property of possessives: they have a "two-way" reference, i.e. layers in UD.

If one wants to survey persons of objects of verbs in a corpus, taking blindly the Person value of each obj dependent is not the right way to go

Maybe, but then this is truly a fault in the current annotation practices, rather than a lack of awareness on the part of the information extractors. It is also something which can be implemented (and is already in many cases) with minimal effort.

amir-zeldes commented 8 months ago

I agree with @dan-zeman - if nouns don't carry a third person feature in a language, I don't see why 'mine' would either.

In the case of En. mine, Person=1 should be prohibited

I think Person=1 is correct here, since "mine" is pronominal, and belongs to the first person paradigm slot. Semantically it is the possessor who is first person, but which participant exactly a person feature points to is always contextual. For a subject pronoun like "she", it indicates the subject's person; for an object "her", the object's; and for a verb, it indicates the person of a totally different participant, which may or may not be realized via a pronoun, or a noun, or something else, which may or may not carry the person feature.

Even between subject and verb, the most typical agreement pattern for person, we can have mismatches: some languages vary between "it is I", "it's me", "it am I" and maybe even "it am me" (not in English of course), and the interpretation of what is the subject/whether it agrees with the verb varies too. We still mark the person based on the morphological category on each word, not based on the semantics. The person feature, like other features, just refers to the existence of a paradigm, but the exact semantic interpretation varies.

sylvainkahane commented 8 months ago

But we annotate syntax, not semantics: mine triggers third person agreement. The first person feature is a semantic feature associated with the reference. I agree that Person=3 is not required if it is not on nouns, but Person=1 is misleading if I analyze the treebank from a syntactic point of view and should be replaced by Person[psor]=1. In the same way, it is strange to have Number=Sing on En. my. If I study English syntax from UD treebanks, I don't want to learn that some English determiners are marked for number. (I agree that my is not analyzed as a determiner in English treebanks, which at least avoids a total mess. But the parallel with other Germanic languages, where such words are analyzed as DET with a Number[psor] feature, is completely lost.)
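As a rough illustration of what this relabeling would amount to, the sketch below moves Person and Number to the [psor] layer on any word marked Poss=Yes. It is only a sketch of the proposal made in this comment, not current English practice; the function names and the choice to move both features are assumptions for the example.

```python
def layer_possessor_features(feats, names=("Person", "Number")):
    """Return a copy of feats with the named features moved to the [psor]
    layer, but only for words that carry Poss=Yes."""
    if feats.get("Poss") != "Yes":
        return dict(feats)
    moved = {}
    for name, value in feats.items():
        key = f"{name}[psor]" if name in names else name
        moved[key] = value
    return moved


def format_feats(feats):
    """Serialize a feature dict in CoNLL-U style, sorted case-insensitively."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


mine = {"Number": "Sing", "Person": "1", "Poss": "Yes", "PronType": "Prs"}
print(format_feats(layer_possessor_features(mine)))
# Number[psor]=Sing|Person[psor]=1|Poss=Yes|PronType=Prs
```

Words without Poss=Yes, such as me, pass through unchanged, so the unlayered Person would then always describe the word's own referent.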

amir-zeldes commented 8 months ago

But we annotate syntax, not semantics

In the case of Person, I think we annotate morphology, not syntax or semantics. Many of the FEATS features have no syntactic reflexes at all, such as PronType, or NumType, which are lexical features. Subject-verb agreement is a relational category, which Person is not. Otherwise, we would not have Person annotations on English object pronouns, since they do not trigger agreement. Saying that "my" is first person just tells us which slot in the pronominal paradigm of English possessives it fills.

As you pointed out, there is no syntactic relation difference between the possessives and other determiners in English (they are interchangeable with determiners, compatible with predeterminers, etc. - "my/the books", "all my/the books"...), and agreement plays no role here. So if we were only annotating syntactic phenomena, "my" should not have Person at all.

nschneid commented 8 months ago

The morphology guidelines overview says

Features are additional pieces of information about the word, its part of speech and morphosyntactic properties. Every feature has the form Name=Value and every word can have any number of features, separated by the vertical bar, as in Gender=Masc|Number=Sing.

Analogically to part-of-speech tags, features describe the word form but not necessarily its exact function in the given sentence. Most of the features are for locating the form in a slot of a morphological paradigm, and are canonical labels for the slot.
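For readers less familiar with the format, the following sketch shows how a FEATS string in the Name=Value|Name=Value form quoted above can be read, including layered names such as Person[psor]. The regular expression is a simplified assumption, not the official validator pattern, and `parse_layered_feats` is a name invented for the example.

```python
import re

# Simplified pattern: a name, an optional [layer], and a value.
FEATURE = re.compile(
    r"(?P<name>[A-Z][A-Za-z0-9]*)(?:\[(?P<layer>[a-z0-9]+)\])?=(?P<value>.+)"
)


def parse_layered_feats(feats):
    """Map (name, layer) to value; layer is None for unlayered features."""
    if feats == "_":
        return {}
    parsed = {}
    for item in feats.split("|"):
        match = FEATURE.fullmatch(item)
        if match is None:
            raise ValueError(f"not a Name=Value feature: {item!r}")
        parsed[(match["name"], match["layer"])] = match["value"]
    return parsed


feats = parse_layered_feats("Gender=Masc|Number=Sing|Person[psor]=1")
print(feats[("Number", None)])    # Sing
print(feats[("Person", "psor")])  # 1
```

Keeping the layer as a separate part of the key makes it explicit that Person and Person[psor] are two different slots on the same word.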

The term "morphosyntactic" suggests to me that relational effects of morphology such as agreement would be fair game to be encoded in features—after all, agreement is a prime example of how features such as person, gender, and number are used in linguistic theories like LFG. But the second part of the quote suggests a narrower interpretation (just locating a paradigm slot associated with the word form). Are treebanks consistent in following the narrower interpretation? Is it worth expanding the explanation?

BTW, just noticed there is an out-of-date bit in the guidelines about Voice not being used in English. After much discussion we decided to adopt it: see https://universaldependencies.org/en/feat/Voice.html and UniversalDependencies/UD_English-EWT#290. @dan-zeman can you think of a different example where a feature used in Czech is not used in English because it does not affect the form of the word (though it might apply at the phrase level)?

Stormur commented 8 months ago

I do not think that agreement has any role here.

As far as I understand, the original question is not so much about putting a third person to mine, but about not putting a first person to it, at least not how it is currently done.

Saying that "my" is first person just tells us which slot in the pronominal paradigm of English possessives it fills.

The problem is that while this contextual disambiguation might work (but only to some extent, consider all other morphologically expressed features in other languages overlapping between agreement and possessor) between DET and PRON, it is simply invisible between PRONs. But there is a substantial difference between mine and me! In one case the Person is "external", in the other "internal". It is an error not to annotate this difference, in my opinion, or at least: it leads to an insufficient representation of what is happening.

dan-zeman commented 8 months ago

BTW, just noticed there is an out-of-date bit in the guidelines about Voice not being used in English. After much discussion we decided to adopt it: see https://universaldependencies.org/en/feat/Voice.html and UniversalDependencies/UD_English-EWT#290. @dan-zeman can you think of a different example where a feature used in Czech is not used in English because it does not affect the form of the word (though it might apply at the phrase level)?

I like the Voice example. Instead of looking for a different feature, I picked a different language :-) see https://github.com/UniversalDependencies/docs/commit/a28d9e6e9e1ac0b0c9ab8353bf6b63fa4618265b.

amir-zeldes commented 8 months ago

As far as I understand, the original question is not so much about putting a third person to mine, but about not putting a first person to it, at least not how it is currently done.

From @jonorthwash saying it gives the appearance of a first person singular object of the verb, as opposed to a third person plural object I understood the idea was to use Person=3. I think "mine" is no more third person than any lexical NP like "the table", so it should not be annotated as 3rd person, but that's fine if we agree that is "off the table" :)

The problem is that while this contextual disambiguation might work (but only to some extent, consider all other morphologically expressed features in other languages overlapping between agreement and possessor) between DET and PRON, it is simply invisible between PRONs

This is also true of other POS tags, for example verbs with a person feature could stand in for a pro-drop subject or they could just be marking agreement with a subject. In many languages, that agreement is not 1:1 - for example in (Modern Standard) Arabic, a verb agrees with its subject in person but not number if it is placed before the subject, and still FEATS should express the overt morphology. Words like "mine" are 1st person because as a pronominal paradigm, the substitutive possessive does express person, and that is the only difference to "yours" - the distinction is exactly 1st vs. 2nd person.

It is an error not to annotate this difference, in my opinion, or at least: it leads to an insufficient representation of what is happening.

I see your point, and I wouldn't object to annotating it somewhere, but I think the distinction is between attributive and substitutive possessive, not one of person. The German xpos tag set (STTS) makes this distinction, where "my" is tagged PPOSAT (pronoun, possessive, attributive) and "mine" is tagged PPOSS (pronoun, possessive, substitutive). Maybe this could be added to PronType or an additional value of Poss (which is currently just "Yes"), or some other feature. It would be easy to add something at least in English, since the form coupled with other annotations unambiguously identifies this paradigm.
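A sketch of how such a distinction could be derived automatically for English is given below. Everything in it is an assumption for illustration: the labels "attributive" and "substitutive" follow the STTS terminology mentioned above rather than any existing UD feature value, and the set of relations treated as attributive is a guess that would need checking against the treebank.

```python
# Forms that only occur as independent (substitutive) possessives.
SUBSTITUTIVE_ONLY = {"mine", "yours", "hers", "ours", "theirs"}
# Forms that only occur as dependent (attributive) possessives.
ATTRIBUTIVE_ONLY = {"my", "your", "her", "our", "their"}
# Assumed relations of an attributive possessive to its head noun.
ATTRIBUTIVE_RELATIONS = {"nmod:poss", "det"}


def possessive_use(form, deprel, feats):
    """Classify a Poss=Yes word as attributive or substitutive.

    Returns None for words that are not marked as possessive.
    """
    if feats.get("Poss") != "Yes":
        return None
    word = form.lower()
    if word in SUBSTITUTIVE_ONLY:
        return "substitutive"
    if word in ATTRIBUTIVE_ONLY:
        return "attributive"
    # 'his' and 'its' look the same in both uses, so the relation decides.
    return "attributive" if deprel in ATTRIBUTIVE_RELATIONS else "substitutive"


print(possessive_use("mine", "obj", {"Poss": "Yes"}))       # substitutive
print(possessive_use("his", "nmod:poss", {"Poss": "Yes"}))  # attributive
```

The form alone settles most cases; only the syncretic forms need the dependency relation.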

Stormur commented 8 months ago

As far as I understand, the original question is not so much about putting a third person to mine, but about not putting a first person to it, at least not how it is currently done.

From @jonorthwash saying it gives the appearance of a first person singular object of the verb, as opposed to a third person plural object I understood the idea was to use Person=3. I think "mine" is no more third person than any lexical NP like "the table", so it should not be annotated as 3rd person, but that's fine if we agree that is "off the table" :)

I would also agree that a Person=3 marking is not appropriate in this case. Actually, this would bring me to consider if Person=3 in general makes sense at all, as it looks very much like a "value by negation", i.e. the person which is neither 1st nor 2nd... but maybe this takes us too far now? :thinking:

It is an error not to annotate this difference, in my opinion, or at least: it leads to an insufficient representation of what is happening.

I see your point, and I wouldn't object to annotating it somewhere, but I think the distinction is between attributive and substitutive possessive, not one of person. The German xpos tag set (STTS) makes this distinction, where "my" is tagged PPOSAT (pronoun, possessive, attributive) and "mine" is tagged PPOSS (pronoun, possessive, substitutive). Maybe this could be added to PronType or an additional value of Poss (which is currently just "Yes"), or some other feature. It would be easy to add something at least in English, since the form coupled with other annotations unambiguously identifies this paradigm.

Don't you think that Poss=Yes already does the job? As hinted by @sylvainkahane , if I understood correctly, the presence of this marker could enforce the request for some features like Person being present only at a [psor] level (with a slight redundancy).

The problem is that while this contextual disambiguation might work (but only to some extent, consider all other morphologically expressed features in other languages overlapping between agreement and possessor) between DET and PRON, it is simply invisible between PRONs

This is also true of other POS tags, for example verbs with a person feature could stand in for a pro-drop subject or they could just be marking agreement with a subject. In many languages, that agreement is not 1:1 - for example in (Modern Standard) Arabic, a verb agrees with its subject in person but not number if it is placed before the subject, and still FEATS should express the overt morphology. Words like "mine" are 1st person because as a pronominal paradigm, the substitutive possessive does express person, and that is the only difference to "yours" - the distinction is exactly 1st vs. 2nd person.

Hm, the issue is subtle here. Honestly, I fail to see the difference between standing in for a "pro-drop" (for the n-th time I express my skepticism about this terminology :grimacing:) and expression of agreement. I would say that person marking is always a reference to a subject, however this is expressed, and each language marks what it deems (strictly) necessary or sometimes semantically motivated. I do not see a problem in Person here, because I think we are dealing with the same phenomenon, and the referent is always the same. There is no feature polysemy.

If the paradigm of mine is that of my (as implied by the annotation in the first post), then surely Person is a parameter in it, but it is already at the possessor level because of the lexical nature of these words. This is probably the motivation not to put all these elements under I, correct? Then one should better motivate how and how much Person is transversal to me and my.

jonorthwash commented 8 months ago

I'm also curious how this is dealt with regarding nouns, e.g. in They cleaned their windows, but didn't clean Sam's.

This one is easy: per English tokenization guidelines, the genitive ending 's would be a separate token attaching as case. I.e. it is treated as an analytic rather than morphological genitive/possessive.

This doesn't address my question. I'm not concerned about how 's is treated, but how the form Sam's is dealt with more generally. How do information extraction tasks know that Sam isn't the object of clean (that happens to be marked with case via 's)? How can we clarify that there are two participants there, one of which isn't overtly expressed (the windows)? These are the sorts of questions I was hoping could be addressed.

Person=1 is misleading if I analyze the treebank from a syntactic point of view and should be replaced by Person[psor]=1.

This is an excellent suggestion.

Words like "mine" are 1st person because as a pronominal paradigm, the substitutive possessive does express person, and that is the only difference to "yours" - the distinction is exactly 1st vs. 2nd person.

No one is trying to push the view that "mine" isn't first person. The debate at this point is how to indicate that part of what it's doing. @amir-zeldes, what's wrong with @sylvainkahane's suggestion to use Person[psor] to indicate the person of such forms?

Are treebanks consistent in following the narrower interpretation? Is it worth expanding the explanation?

Yes, @nschneid, I think the quoted documentation is inconsistent enough that it should be fixed. I had assumed (and had assumed that everyone else assumes) that relational effects of morphology are one of the main reasons to be annotating morphological features.

For example, grammatical gender in many Romance, Germanic, Slavic, etc. languages is important because of how adjectives, determiners, numbers, etc. have to agree with nouns (and you annotate both with the same feature, even though it's a lexical property of nouns and a grammatical form of adjectives and crew).

I guess tense is a counterexample: it's more about the paradigm block and less about something that has some relation to other parts of a sentence (except perhaps the lemma of a given time adverb).

To set the record straight, I'm not advocating to annotate all nouns as Person=3. I can see why it might make sense, but it also seems unnecessary—in languages that care about person in the morphology, almost all nouns act like 3rd person. But when we're talking about pronouns (which I think we all agree that mine is, even if we seem to not see eye-to-eye about how it works), I think it's important to think about what person the referent is.

So we typically do not need layered Person for pronominal possessives unless there is a language that has a word meaning something like "my they".

I don't think it's particularly relevant to what I'm asking, but Kyrgyz does have forms like this—e.g., аным "that one of mine", or more literally "my it". I'm not sure the plural works in Kyrgyz, but I've encountered a good handful of convincing examples of it in Kazakh (оларым "those of mine", or literally "my them"). I don't know that layers are the right solution here, since there's only one participant here; you can simply annotate it as Person=3 and Person[psor]=1, just like you would with a possessed noun.

With mine, it would be the same: Person=3 and Person[psor]=1, except then having any reasonable lemma (I, my) would get confusing—these are first-person lemmas (lexically specified as first person), with a third-person referent.

This is where I put my cards on the table. I think the sanest solution for this is an extra token (with empty form and lemma—although the "ne" could be used to fill the form maybe). Something like this, and similar for Sam's (except as nmod:poss instead of det etc.):

10-11   mine    _   _   _   _   _   _   _   _
10  mine    my  DET _   Number=Sing|Person=1    11  det _   _
11  _   _   NOUN    _   _   9   obj _   _
12  .   _   PUNCT   _   _   9   punct   _   _

I know this isn't going to happen, but it makes more sense (to me) for many downstream tasks, not to mention linguistically, than anything else I've seen so far.

dan-zeman commented 8 months ago

I don't think it's particularly relevant to what I'm asking, but Kyrgyz does have forms like this—e.g., аным "that one of mine", or more literally "my it". I'm not sure the plural works in Kyrgyz, but I've encountered a good handful of convincing examples of it in Kazakh (оларым "those of mine", or literally "my them"). I don't know that layers are the right solution here, since there's only one participant here; you can simply annotate it as Person=3 and Person[psor]=1, just like you would with a possessed noun.

With mine, it would be the same: Person=3 and Person[psor]=1, except then having any reasonable lemma (I, my) would get confusing—these are first-person lemmas (lexically specified as first person), with a third-person referent.

Yes, this was my line of thinking when I hypothesized "my they". As you suggest, I would use Person=3 and Person[psor]=1 in the Kazakh / Kyrgyz examples above.

This is where I put my cards on the table. I think the sanest solution for this is an extra token (with empty form and lemma—although the "ne" could be used to fill the form maybe). Something like this, and similar for Sam's (except as nmod:poss instead of det etc.):

10-11 mine    _   _   _   _   _   _   _   _
10    mine    my  DET _   Number=Sing|Person=1    11  det _   _
11    _   _   NOUN    _   _   9   obj _   _
12    .   _   PUNCT   _   _   9   punct   _   _

I know this isn't going to happen, but it makes more sense (to me) for many downstream tasks, not to mention linguistically, than anything else I've seen so far.

Yep, this is surely not allowed in basic UD (I can imagine it in enhanced UD but even there it is not part of the current guidelines and would have to be proposed as a new extension). But if this is the underlying structure, then the standard UD treatment of ellipsis will promote mine to the position of the missing object, give it the obj relation while not touching its morphological features. So you essentially end up with what the English UD has now. (The same thing happens with Sam's.)

I agree that basic UD treatment of ellipsis is not particularly helpful for information extraction. It never tried to be.

amir-zeldes commented 8 months ago

This seems really bad for tasks like information extraction, but perhaps that's not considered a priority here.

Indeed, and as @dan-zeman confirmed, this is not the guiding criterion for UD, and in any case, it would not be possible to do justice to this and keep the principle that, for example, names should have compositional analyses internally, because an English genitive 's can either be part of the denotation's referent or not. Two examples from GUM:

2   we  we  PRON    PRP Case=Nom|Number=Plur|Person=1|PronType=Prs  3   nsubj   3:nsubj Entity=(8-person-giv:inact-cf2-1-ana)
3   pass    pass    VERB    VBP Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin   9   advcl   9:advcl:when    _
4-5 Spencer’s   _   _   _   _   _   _   _   _
4   Spencer Spencer PROPN   NNP Number=Sing 3   obj 3:obj   Entity=(13-organization-giv:inact-cf3-1-coref-Spencer_Gifts|MSeg=Spenc-er
5   ’s  's  PART    POS _   4   case    4:case  Entity=13)

Here Spencer's is the name of a store, and synchronically not something belonging to someone called Spencer, which is also indicated in the Entity annotation in MISC (organization, encompassing nodes 4-5). Similarly:

1-2 She’s   _   _   _   _   _   _   _   _
1   She she PRON    PRP Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs   3   nsubj   3:nsubj Discourse=context-background:21->19:1:ref-prs-130-131,142|Entity=(3-person-giv:act-cf1*-1-ana)
2   ’s  have    AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3   aux 3:aux   _
3   got get VERB    VBN Tense=Past|VerbForm=Part    0   root    0:root  _
4-5 Alzheimer’s _   _   _   _   _   _   _   SpaceAfter=No
4   Alzheimer   Alzheimer   PROPN   NNP Number=Sing 3   obj 3:obj   Entity=(26-abstract-new-cf2-1-sgl-Alzheimer's_disease
5   ’s  's  PART    POS _   4   case    4:case  Entity=26)

Again, Alzheimer's is an entity and the object of the matrix verb. I think what you are looking for in order to figure out the exact participants might be something like PropBank annotations, which can also be merged with UD (see Universal PropBank), or if you use Entity annotations like the ones above you can get at the more semantic level. But from the syntactic perspective, as Dan said, promotion applies here, so the possessor head represents the entire phrase, and this somewhat obfuscates argument structure (this is true of other types of promotion as well).

@amir-zeldes, what's wrong with @sylvainkahane's suggestion to use Person[psor] to indicate the person of such forms?

I wouldn't say there's anything wrong with it, though if it were done in English, it should also apply to "my" etc. I suspect the reason this isn't done in English, as in most UD languages, is that this kind of possessive pronoun has only one Person value, so using layered features was considered unnecessary. In languages where possessives have paradigms expressing both the possessor and possessed person it makes more sense to have two different keys for those properties. But if there is momentum to change possessives to always use Person[psor] in Indo-European languages, we could absolutely do that. It's just that I wouldn't want to do it only for English then, I think that would do more harm than good for cross-linguistic comparability in UD.

Stormur commented 8 months ago

The small step that I am convinced could be done and would help enormously (relieving from most of the "problems" discussed above) is to signal somehow that there is an ellipsis. We do not need enhanced dependencies for this: just knowing that there is one helps a lot when sifting through data. I am also constantly encountering problems with elliptical constructions at any level and of any kind when I perform data extraction.

As for the layer [psor], I do not see how it cannot be mandatory for the kinds of phenomena discussed here. It is just a matter of complete annotation and we have all the evidence to define that well.

jonorthwash commented 8 months ago

As for the layer [psor], I do not see how it cannot be mandatory for the kinds of phenomena discussed here. It is just a matter of complete annotation and we have all the evidence to define that well.

My point was that labelling mine as Person=1 is misleading. It makes it seem like e.g. an object is 1st person, when in fact it's 3rd person, but possessed by 1st person.

From the discussion, I believe I understand where UD guidelines currently fall on this issue. I also believe I might be able to do what I want with enhanced dependencies, which might make everyone happy. Something like this (but for a non-IE language, so maybe Case=Gen or something instead of Poss=Yes):

10  mine    my  PRON    _   Number=Sing|Person=1|Poss=Yes|PronType=Prs  9   obj 10.1:det    _
10.1    _   _   NOUN    _   _   _   _   9:obj   _

The problem is that now we have a PRON functioning as if it were a DET. I see a couple ways to deal with that, but don't like any of them. With a proper noun, this problem goes away.
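A consumer of such an enhanced graph might read it roughly as follows. This is a minimal sketch assuming only the ID conventions of CoNLL-U (plain integers for words, ranges like 7-8 for multiword tokens, decimals like 10.1 for empty nodes) and the head:relation format of the DEPS column; the helper names and the reduced (id, form, deps) tuples are invented for the example.

```python
def id_kind(token_id):
    """Classify a CoNLL-U ID as a word, a multiword range, or an empty node."""
    if "-" in token_id:
        return "multiword range"
    if "." in token_id:
        return "empty node"
    return "word"


def parse_deps(deps):
    """Split a DEPS value like '9:obj|2:conj' into (head, relation) pairs."""
    if deps == "_":
        return []
    return [tuple(item.split(":", 1)) for item in deps.split("|")]


# (ID, FORM, DEPS) for the relevant nodes of the proposal above.
NODES = [
    ("9", "clean", "2:conj"),
    ("10", "mine", "10.1:det"),
    ("10.1", "_", "9:obj"),
]


def enhanced_objects(nodes):
    """List (governor form, dependent ID, kind of dependent) for obj edges."""
    forms = {node_id: form for node_id, form, _ in nodes}
    found = []
    for node_id, _, deps in nodes:
        for head, rel in parse_deps(deps):
            if rel == "obj":
                found.append((forms[head], node_id, id_kind(node_id)))
    return found


print(enhanced_objects(NODES))  # [('clean', '10.1', 'empty node')]
```

In the enhanced graph the object of clean is the empty noun, and mine only attaches to that noun, which is the separation of participants asked for earlier in the thread.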

Stormur commented 8 months ago

The problem is that now we have a PRON functioning as if it were a DET

This might not be a problem, it does happen and it is even part of the guidelines!

ftyers commented 7 months ago

My point was that labelling mine as Person=1 is misleading. It makes it seem like e.g. an object is 1st person, when in fact it's 3rd person, but possessed by 1st person.

I don't think it is necessarily misleading, just that the encoding in English and in Turkic would be different, as is appropriate given their different structures. English doesn't have possessive affixes, so the Person agreement needs to be interpreted with reference to the POS and the other features (e.g. Poss=Yes).

In languages that have possessive affixes, generally Person doesn't refer to possession, but to agreement. In Turkish the equivalent of Poss=Yes for marking this switch appears to be PronType=Prs|Case=Gen:

4   benim   ben PRON    Pers    Case=Gen|Number=Sing|Person=1|PronType=Prs  6   nmod:poss   _   _

Although there are some interesting issues here (this one looks like a mistake):

14  benimle ben PRON    Pers    Case=Ins|Number=Sing|Number[psor]=Sing|Person=3|Person[psor]=1  17  obl _   _

I guess this is because of how copula agreement is expressed in Turkish, e.g. in "you are our teacher":

9   hocamızsınız    hoca    NOUN    _   Case=Nom|Number=Plur|Number[psor]=Plur|Person=2|Person[psor]=1  2   conj    _   _

There are additional issues, in that the demonstrative pronouns (ol, ал etc.) can take possessives, but the personal pronouns cannot (except maybe the 3rd person ones, which in any case could be considered demonstratives).

So in that case the Person in benim would refer to the possessive by virtue of being PronType=Prs and Case=Gen because personal pronouns cannot take possessive suffixes (e.g. beniniz "your.PL me" does not work).
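
The interpretation rule described above can be written down as a small Python sketch. The function name `person_role` and the returned labels are invented for illustration, and the two conditions (`Poss=Yes` for English, `PronType=Prs` plus `Case=Gen` for Turkish) are simply the ones mentioned in this thread, not an official UD rule.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Gen|Person=1' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def person_role(upos, feats):
    """Guess what the plain Person feature describes on this word.

    Hypothetical rule of thumb taken from the discussion above:
    - PRON with Poss=Yes (English 'mine'): Person is the possessor's.
    - PRON with PronType=Prs and Case=Gen (Turkish 'benim'): same.
    - anything else: Person is not about possession (it is the word's
      own person or an agreement marker); a possessor, if any, would be
      in the layered Person[psor] feature instead.
    Returns None when the word has no plain Person feature.
    """
    f = parse_feats(feats)
    if "Person" not in f:
        return None
    if upos == "PRON" and f.get("Poss") == "Yes":
        return "possessor"
    if upos == "PRON" and f.get("PronType") == "Prs" and f.get("Case") == "Gen":
        return "possessor"
    return "other"


# English 'mine'
print(person_role("PRON", "Number=Sing|Person=1|Poss=Yes|PronType=Prs"))
# Turkish 'benim'
print(person_role("PRON", "Case=Gen|Number=Sing|Person=1|PronType=Prs"))
# Turkish 'hocamızsınız'
print(person_role(
    "NOUN",
    "Case=Nom|Number=Plur|Number[psor]=Plur|Person=2|Person[psor]=1",
))
```

The first two calls print `possessor` and the third prints `other`, which shows the special-casing of personal pronouns that the rule relies on.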

What I think is currently the case:

This does not seem too bad. There is inconsistency in that there are special rules for personal pronouns, but let's consider some other options:

There are also things like benimkinden "from my ones, from the ones of mine", seninkini "your ones, the ones of yours" etc. (and maybe seninkiyim(?) "I am your one")

ftyers commented 7 months ago

So the issue with seninkiyim is that there are three persons in one word¹:

  • sen = Person=2
  • -in = Case=Gen
  • -ki- = Person=3
  • -yim = Person=1

This isn't strictly to do with possession, it's more about subject agreement. But in any case, I think we already split off -ki- because of double Case, e.g. you can have seninkinden where you would have both Case=Gen and Case=Abl.

If the idea is that you could improve the analysis by making the Person=2 instead be Person[psor]=2, I think I'd disagree: the problem would still remain. I think the solution is to either have three tokens, or suppress the Person=3 on -ki and just make it

  • senin = Person=2
  • -kiyim = Person=1

(I think this is maybe a different issue though)

¹ No, not those...

jonorthwash commented 7 months ago

I think the solution is to either have three tokens,

Exactly.

or suppress the Person=3 on -ki and just make it

  • senin = Person=2
  • -kiyim = Person=1

(I think this is maybe a different issue though)

The copula-related issue (the 1st person here) is a separate question, yes. But seninki "yours" is exactly what this issue is about. It's a 3rd person pronoun possessed by a 2nd person possessor.

So based on the parts above that I copied, you might get something like this (assuming token 3 is something like olacağım "I will be" for purposes of token 2's relations):

1-2 seninki _   _   _   _   _   _   _   _
1   senin   sen PRON    _   Number=Sing|Person=2|Case=Gen   2   obj _   _
2   ki  ki  PRON    _   Number=Sing|Person=3|Case=Nom   3   xcomp   _   _

If UD is okay with this, then I'm okay with it. But it's nothing like how this is handled in other languages.
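
As a rough illustration of what the split buys you, the Python sketch below reads the three lines above and reports the `Person` value of each syntactic word together with the surface token it belongs to. The function name `person_per_word` is hypothetical, and the columns are split on whitespace instead of tabs for brevity.

```python
LINES = """\
1-2 seninki _ _ _ _ _ _ _ _
1 senin sen PRON _ Number=Sing|Person=2|Case=Gen 2 obj _ _
2 ki ki PRON _ Number=Sing|Person=3|Case=Nom 3 xcomp _ _
"""


def person_per_word(text):
    """Return (surface token, syntactic word, Person) triples."""
    surface = {}  # word ID -> form of the multiword token covering it
    result = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tok_id, form, feats = fields[0], fields[1], fields[5]
        if "-" in tok_id:
            # Multiword token line: remember which word IDs it spans.
            start, end = (int(x) for x in tok_id.split("-"))
            for i in range(start, end + 1):
                surface[i] = form
            continue
        person = None
        if feats != "_":
            for pair in feats.split("|"):
                name, _, value = pair.partition("=")
                if name == "Person":
                    person = value
        result.append((surface.get(int(tok_id), form), form, person))
    return result


for triple in person_per_word(LINES):
    print(triple)
```

With the split, each `Person` value sits on its own word (2 on senin, 3 on ki), so nothing on the possessed pronoun looks like a 2nd person argument.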

ftyers commented 7 months ago

Well, I don't get what is going on with obj there, but maybe this will be clearer:

But there is an interesting alternation here, e.g. arkadaşınızı but -kini (i.e. there is no possessive agreement on the -ki-), so it's not exactly the same.

If you have a copula (you could also switch out cop for xcomp if you want the "be/become" verb to be the root, but the xcomp would be xcomp(olacağım, ki))

Of course, having a morph be the root might be distasteful for some...

ftyers commented 7 months ago

There is another option, which I don't favour, but include for completeness, which would be something like:

This would basically lexicalise any word with -ki- into DET/PRON pairs, and would make it similar to how it is done in English.

Stormur commented 7 months ago

So the issue with seninkiyim is that there are three persons in one word¹:

* sen = `Person=2`

* -in = `Case=Gen`

* -ki- = `Person=3`

* -yim = `Person=1`

May I ask what the -ki- suffix is and why it is assigned a third person in your analysis? Is it a "nominaliser", like (if I remember well) in Mongolian?