UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Agreeing noun and "pronoun" verb dependents #448

Closed jonorthwash closed 5 years ago

jonorthwash commented 7 years ago

In some languages, "pronouns" are mandatory as verbal dependents even when an overt noun is used. One example is Khasi (data shared with me by @nfeldbaum), where nominal subjects agree in person/number/gender with a mandatory "pronoun":

U   Blei u    la thaw ia     ka  bneng.
DET God  DET  he PAST create ACC heavens

In this example "U Blei" can be removed and it remains grammatical, but "u la" may not be removed.

A similar pattern exists in Spanish with indirect objects:

Le     dio  el  libro a Juan.
to.him gave the book  to John

In the Spanish UD corpus, these are both marked with an iobj dependency on the verb.

Potentially even more parallel are French subjects:

Mon père   il lit   un livre.
My  father he reads a  book

This is only common in colloquial French, though, and not so much in the literary language (which has a pattern more like English), so I had trouble finding examples in a French UD corpus. (Does anyone have a query that might help my search?) I assume it's not with double nsubj relations (which would be parallel to the Spanish example), since having two subjects goes against UD guidelines.

I understand this set of processes not as apposition or as multiple dependents of the same type, but as a type of agreement where the "morphology" is represented as a separate "word". Since there isn't an explicit dependency relation for person/number/gender verbal specification words, @ftyers suggested that perhaps the relation for TAMVE verbal specification words, aux, might serve us well here—potentially for all three examples above.

If this is not preferred, does anyone have any other recommendation?

TL;DR: What is the dependency relation for a word that specifies person/number/gender of a verbal subject, but acts more like mandatory agreement morphology than like a pronoun?

jnivre commented 7 years ago

For the French case, the guidelines say that "dislocated" should be used for the non-pronominal phrase. (There is an object example in the French guidelines for "dislocated").
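A rough way to search a treebank for this pattern is sketched below in Python. It scans CoNLL-U text for a `dislocated` dependent whose head also has a pronominal `nsubj`. The helper names `read_sentences` and `find_doubled_subjects` are made up for illustration, and the sample sentence is annotated by hand according to the guideline above rather than copied from a released French treebank.

```python
# Hand-annotated illustration (an assumption, not taken from a released treebank):
# "père" is attached as dislocated and "il" as nsubj, as the guideline suggests.
SAMPLE = """\
# text = Mon père il lit un livre.
1\tMon\tmon\tDET\t_\t_\t2\tdet\t_\t_
2\tpère\tpère\tNOUN\t_\t_\t4\tdislocated\t_\t_
3\til\til\tPRON\t_\t_\t4\tnsubj\t_\t_
4\tlit\tlire\tVERB\t_\t_\t0\troot\t_\t_
5\tun\tun\tDET\t_\t_\t6\tdet\t_\t_
6\tlivre\tlivre\tNOUN\t_\t_\t4\tobj\t_\t_
7\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_
"""


def read_sentences(conllu_text):
    """Yield each sentence as a list of (id, form, upos, head, deprel) tuples."""
    sentence = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2"), empty nodes ("3.1"),
        # and rows whose HEAD column is not filled in.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sentence.append((int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def find_doubled_subjects(sentence):
    """Return (head, dislocated phrase head, pronoun) form triples."""
    forms = {tok[0]: tok[1] for tok in sentence}
    hits = []
    for _, form, _, head, deprel in sentence:
        # Compare only the universal part of the label, ignoring subtypes.
        if deprel.split(":")[0] != "dislocated":
            continue
        for _, form2, upos2, head2, deprel2 in sentence:
            if head2 == head and upos2 == "PRON" and deprel2.split(":")[0] == "nsubj":
                hits.append((forms.get(head, "ROOT"), form, form2))
    return hits


for sent in read_sentences(SAMPLE):
    for hit in find_doubled_subjects(sent):
        print(hit)
```

To run it over a real treebank, pass the contents of a `.conllu` file to `read_sentences` in place of `SAMPLE`. Changing the second test to look for another `nsubj` instead of a pronoun would surface double-subject analyses, if any exist.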

For words that act more like mandatory agreement morphology than like a pronoun, there are really no guidelines yet. I think we need a group who can work out guidelines for this whole complex of phenomena, which also includes clitics and pro-drop.

olesar commented 7 years ago

In my view, topicalization/fronting, focus marking etc. are typologically noteworthy phenomena that UD has to take into account. Should we reformulate the definition of "dislocated" then? The current wording is "... elements that do not fulfill the usual core grammatical relations of a sentence ..." (http://universaldependencies.org/u/dep/dislocated.html). Perhaps mention information structure functions here? "Doubling" of both core and non-core grammatical relations? (Actually, the more I think about it, the more I feel that the very term "dislocated" is misleading.)

+1 @jnivre on mandatory agreement morphology.

amir-zeldes commented 7 years ago

We have the same situation very often in Coptic and we use dislocated. I agree the name evokes a 'movement' account, so maybe that's not ideal, but a rose by any other name... In any case +1 for making this explicit in the definition!

jonorthwash commented 7 years ago

The difference between written French and Khasi, Spanish, and colloquial French is that in written French (and many other languages, including English), the "pronoun" is optional. This phenomenon in the other languages is emphatically not fronting or topicalisation or anything else of that ilk—it is simply the way agreement of one sort or other works.

If the proposal is that the definition of dislocated be expanded to include referents of agreement particles that are specified as normal verbal dependents, then that's fine, but I want it to be clear that this is a distinct phenomenon from what the relation is currently described as covering.

jnivre commented 7 years ago

@jonorthwash What do you mean by "referents of agreement particles"? Are these the full phrases that are in some sense co-referential with the pronouns/agreement particles?

The "dislocated" relation, like many other UD relations, is clearly used to cover many different relations (which suggests that subtyping may be called for). It is used for topic-comment structures in Japanese (where there is no co-referential pronoun or similar) and it is used for topicalisation constructions like "John, I don't like him". The general strategy has been to use the core argument relation for the element most closely linked to the predicate, which is typically the pronoun or pronoun-like element, and to use "dislocated" for the more distant expression (typically a full noun phrase).

However, I maintain that we need to form a subgroup that can study all of the relevant constructions together and come up with a consistent and typologically well-motivated strategy for dealing with them. I intend to initiate such discussions as soon as the dust has settled from the shared task and workshop that is consuming most of the time for some of us right now.

amir-zeldes commented 7 years ago

@jonorthwash thanks for raising this point! I've thought about this a lot too. I agree that linguistically the 100% optional and 100% obligatory situations are very different, but there is something of a continuum between them that makes things murky sometimes. Coptic, like other Afro-Asiatic languages of North Africa, has a strong tendency to prefer this construction (or a second one with right dislocation), but it is not yet obligatory. It's a gradual process, that's less sweeping in earlier texts and progresses later on.

In a language like Hausa (Chadic), it is already obligatory, and you actually have to specify a 'second' strong topicalized pronoun if you want an emphatic pronominal subject (shi, ya ... meaning HE, he did...). This is a clear indication that the historical dislocated reading is gone, and we are now dealing with an agreement marker. Coptic can do this too (ntof de af ... but HE, he did...), but it's still possible (yet dispreferred) to insert a nominal subject with no pronoun at all. So even though the presence of 'double' pronouns is a strong indicator, it does not overlap completely with obligatoriness.

jnivre commented 7 years ago

One problem with all annotation is that we have to apply sharp boundaries even in cases where there is a continuum, or a gradually ongoing process of grammaticalization. Ideally, the guidelines should be sensitive to this and specify how to annotate the different stages of development. Sort of like: If completely optional, use annotation A. If completely obligatory, use annotation B. If somewhere in-between, make an informed decision about which of A and B is least misleading (or use an intermediate annotation C).

jonorthwash commented 7 years ago

> What do you mean by "referents of agreement particles"? Are these the full phrases that are in some sense co-referential with the pronouns/agreement particles?

Yes.

> The "dislocated" relation, like many other UD relations, is clearly used to cover many different relations

None of the possibilities you listed are the same as what I'm talking about, though some are a little reminiscent. It sounds like you understand what I'm talking about generally, but the fact that the conversation seems to be centered around dislocated—which to my knowledge isn't currently used for what I'm talking about in any language—makes me wonder if something's being lost in the conversation.

jonorthwash commented 7 years ago

> we need to form a subgroup that can study all of the relevant constructions together and come up with a consistent and typologically well-motivated strategy for dealing with them

+1. Perhaps @nfeldbaum would like to be involved in this. I presume the discussion will be raised here again when it's time to organise that.

jnivre commented 7 years ago

Thanks! I might send a message to the ud list also.

jnivre commented 7 years ago

It is always hard to tell whether something is being lost in a conversation that you are engaged in yourself, but I hope not. I think I understand the range of phenomena, and I was only using "dislocated" as an example of how UD relations in general have to cover somewhat heterogeneous phenomena because they are coarse-grained. The other relation that is used for some of these phenomena is "expl", which is used in the analysis of clitic doubling as well as of inherent reflexives, which again show some affinities with the phenomena being discussed. The question is whether these relations (and others that are available) are sufficient to cover all the phenomena or whether we need additional relations, and this is what the group should try to find out by going through all phenomena in a systematic fashion.

dan-zeman commented 6 years ago

Maybe AUX + aux is not a bad solution in Khasi if it is the way of providing the Person and Number features (because they are not marked directly on the verb). It resembles some constructions in European languages where we have a non-finite main verb, combined with a finite auxiliary verb; the auxiliary provides the Person and Number features.

lingdoc commented 6 years ago

It sounds to me like the AUX + aux suggestion handles the case for now, though I suppose this is motivated by wanting to distinguish between markers required by the verb and those that are not? Is there a way to define this in terms of phrase structure rules within the particular language instead? Forgive my ignorance of how this all works within UD - I'm still getting my feet wet here.

In the case of Khasi, this is not actually a verbal auxiliary, but a referential pronoun with fixed syntactic placement, which is why I would recommend a language-specific solution.

For a bit more explanation on Khasi -- comparing with related languages (i.e. Pnar, War) clarifies that the preverbal pronoun in Khasi is a syntactic slot marker that identifies the Subject referent (via gender/number/person). The other languages in this group are largely Verb-initial (VSO), with immediately postverbal pronouns or full nouns that correspond to the Subject relation. Word order is the primary means of marking these relations for Khasian languages, and Standard Khasi is somewhat unusual within the group in requiring a preverbal subject (and marker) in transitive clauses. Below is the Khasi construction from the original post (corrected: la is the PAST marker [or possibly REALIS mood], not a pronoun) and its equivalents in the related languages for which I have data. The ACC marker is optional in both of the latter varieties, as it marks a semantic recipient rather than a strict case relation.

Khasi:

U   Blei u   la   thaw   ia  ka  bneng.
DET God  DET PAST create ACC DET heavens

Pnar:

da   thoo   u=blai  (ya) i=bneiñ
REAL create DET=God ACC  DET=heaven

War:

e    thia   u=pra   (ia) u=phlieng
REAL create DET=God ACC  DET=heaven

You might notice that the gender markers in Pnar/War are clitics, while in Khasi they seem to stand as separate words - my personal feeling is that they are clitics in Khasi as well (prosodically unstressed), but the orthographic tradition is to write them as separate words.

The particular grammaticalization in Khasi of the subject pronoun preceding the verb complex, from what I can tell, is quite closely related to 'topicalization/fronting', since you can get a similar construction in Pnar/War with the order of the pronoun and verb complex reversed from that of Khasi. In Pnar/War this is a marked order and is used for focus/topicalization. Pnar:

u=blai  da   thoo   u   (ya) i=bneiñ
DET=God REAL create DET ACC  DET=heaven

But the pronoun placement here (for u "3sg.Masc") is more a situation of reserving a particular place in the sentence (in the case of Khasi immediately before the verbal complex) for a grammatical argument (or its gender/number referent) understood as the Subject, which over time might eventually coalesce with the verbal complex but hasn't quite yet...

dan-zeman commented 6 years ago

So it seems that la indeed is AUX (it marks the past tense) and the question is what is the second u, right? If it is DET in u Blei, then it probably stays DET here. Consequently, I am no longer convinced that it should be attached as aux or a subtype thereof. An analysis with dislocated or expl is probably more in line with the rest of UD.

A language-specific solution is possible as long as the POS tag is taken from the universal set, and the relation is a subtype of a universal relation.
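The constraint just stated can be checked mechanically. The Python sketch below tests whether a tag and relation pair stays within it, using the 17 universal POS tags and 37 universal relations of UD v2. The function name `is_valid_language_specific` and the subtype label `expl:agr` are hypothetical, chosen only to illustrate the check. Nobody in this thread has proposed that label.

```python
# Universal POS tags of UD v2.
UNIVERSAL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Universal dependency relations of UD v2.
UNIVERSAL_DEPRELS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}


def is_valid_language_specific(upos, deprel):
    """True if upos is universal and deprel is a universal relation or a subtype of one."""
    # A subtype is written "universal:subtype"; only the part before
    # the colon has to come from the universal inventory.
    base = deprel.split(":", 1)[0]
    return upos in UNIVERSAL_UPOS and base in UNIVERSAL_DEPRELS


# "expl:agr" is a hypothetical subtype used only for this illustration.
print(is_valid_language_specific("DET", "expl:agr"))   # subtype of a universal relation
print(is_valid_language_specific("DET", "agr"))        # brand-new relation
print(is_valid_language_specific("CLITIC", "expl"))    # non-universal POS tag
```

The check only looks at label inventories. Whether a given subtype is linguistically appropriate for the Khasi preverbal marker is a separate question.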

lingdoc commented 6 years ago

Yes, that's how I'd analyze it. The second u in Khasi is required by the verb, though, whether or not there's a dislocated (preverbal) topic, which I think is what the OP was about - whether the immediately preverbal u should be marked as a verbal dependent. I'm fine with leaving all pronouns as DET and then subclassifying the Khasi preverbal subject pronouns on a language-specific basis, maybe as a sort of VP or clausal clitic.