UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/

Coordination in Japanese #356

Open kanayamah opened 7 years ago

kanayamah commented 7 years ago

As discussed in #236, the current left-head principle for coordination structures causes a lot of confusion in head-final languages such as Japanese and Turkish, and there are in fact a lot of inconsistencies in Japanese-KTC. As a much better "technical solution", we are considering changing from

    1   A   NOUN    0   root
    2   および CONJ    1   cc
    3   B   NOUN    1   conj
    4   が   ADP 1   case

to

    1   A   NOUN    3   conj
    2   および CONJ    3   cc
    3   B   NOUN    0   root
    4   が   ADP 3   case

if the head direction can be defined in each language.
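
The conversion between the two trees above can be sketched in Python. This is only an illustration under simplifying assumptions, not part of any UD tool: the function name `to_right_headed` and the tuple layout (id, form, upos, head, deprel) are made up here, the coordination is assumed to be flat (no nested conj), and `cc` plus any dependent that follows the last conjunct are assumed to be shared and therefore move to the new head.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: (id, form, upos, head, deprel)
Row = Tuple[int, str, str, int, str]


def to_right_headed(rows: List[Row]) -> List[Row]:
    """Re-head flat left-headed coordination on its last conjunct."""
    heads: Dict[int, int] = {r[0]: r[3] for r in rows}
    rels: Dict[int, str] = {r[0]: r[4] for r in rows}

    # Group the conj dependents under their (left) head.
    groups: Dict[int, List[int]] = {}
    for tid, _, _, head, rel in rows:
        if rel == "conj":
            groups.setdefault(head, []).append(tid)

    for first, others in groups.items():
        last = max(others)
        # The last conjunct inherits the outside attachment of the first one.
        heads[last], rels[last] = heads[first], rels[first]
        # Every other conjunct now depends on the last conjunct.
        for tid in [first] + others:
            if tid != last:
                heads[tid], rels[tid] = last, "conj"
        # Assumption: cc and trailing dependents are shared, so they move too.
        for tid, _, _, head, rel in rows:
            if head == first and tid not in others and (rel == "cc" or tid > last):
                heads[tid] = last

    return [(tid, form, upos, heads[tid], rels[tid]) for tid, form, upos, _, _ in rows]


left_headed = [
    (1, "A", "NOUN", 0, "root"),
    (2, "および", "CONJ", 1, "cc"),
    (3, "B", "NOUN", 1, "conj"),
    (4, "が", "ADP", 1, "case"),
]
right_headed = to_right_headed(left_headed)
```

Dependents that precede the last conjunct and are not cc stay on the first conjunct in this sketch, which may or may not be what a real conversion would want.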

kanayamah commented 7 years ago

We also want to change the PoS tag for the conjunctive marker "と". So far it is

    1   A   NOUN    0   root
    2   と   CONJ    1   cc
    3   B   NOUN    1   conj
    4   が   ADP 1   case

but syntactically "と" behaves much like an ADP, as other case markers do, so it might be better to use

    1   A   NOUN    3   conj
    2   と   ADP 1   case
    3   B   NOUN    0   root
    4   が   ADP 3   case

if conj is allowed to appear without a corresponding cc.

dan-zeman commented 7 years ago

Coordination "heads" have been heavily discussed during preparation of the upcoming version 2 UD guidelines. It has been proposed (http://universaldependencies.org/v2/coordination.html) that a language can opt for right-headed coordination. However, the subsequent discussion in https://github.com/UniversalDependencies/UD_v2/issues/3 inclines (although by no means unanimously) back to the all-left-headed rule.

dan-zeman commented 7 years ago

@kanayamah : Is the construction with と still coordination (paratactic)? Isn't it then more hypotactic, i.e. more like John with Mary than like John and Mary? Because if it is, then the analysis should be

    1   Mary    PROPN   3   nmod
    2   と/with  ADP 1   case
    3   John    PROPN   0   root
    4   が   ADP 3   case

kanayamah commented 7 years ago

@dan-zeman thank you for your comment. In most cases of "A と B", A and B are interchangeable (unlike "A に B"), so I think it is coordination that maps to "A and B" in English. But considering that noun-noun coordination can be regarded as a special case of the nmod relation, using nmod is a reasonable option: it would reduce intra-language inconsistency and the need for further discussion of coordination structure, at the cost of inter-language correspondence.

spyysalo commented 7 years ago

I think it would be very unfortunate to lose the inter-language correspondence for basic noun coordination. A と B is the "direct translation" of A and B (for nouns A and B) and we should seek to capture this in UD annotation.

Is the construction with と still coordination (paratactic)? Isn't it then more hypotactic, i.e. more like John with Mary than like John and Mary?

As @kanayamah writes, it is coordination (paratactic). If I were to encounter your (@dan-zeman) suggested UD analysis of Mary と John in a corpus, I would assume it to be an annotation error (with all due respect :-)).

I think creating UD Japanese annotation that suggests that Japanese lacks basic noun-noun coordination is a mistake. If no other solution is suitable, even a clearly documented deviation from the general UD guidelines stating that UD Japanese has right-headed coordination would be preferable.

jnivre commented 7 years ago

Thanks @spyysalo for bringing this issue to my attention. I would also strongly advise against not using "conj" for Japanese. It seems that we would be losing a very important and real parallelism with other languages if we do not use "conj". I also think the whole left- vs. right-headed issue rests on a misconception about the nature of coordination (which is not a dependency relation in the narrow sense). I would be happy to discuss this in depth in order to make my position clear. Maybe we can find some time to do this during COLING next week.

spyysalo commented 7 years ago

@jnivre : thanks! I think some more discussion of how coordination is understood in UD at http://universaldependencies.org/u/dep/conj.html might help, as this and related issues have been raised multiple times. Perhaps some materials from http://universaldependencies.org/v2/coordination.html could be used?

(I would be happy to join a discussion of the topic at Coling.)

jnivre commented 7 years ago

I think the current versions of the doc pages you point to are not very helpful in this respect. I hope I can find the time to improve them soon (but definitely not before COLING).

kanayamah commented 7 years ago

@spyysalo @jnivre Thank you for your comments. We are certainly aware of the importance of annotating conj for coordination structures. The current version does not use conj, in order to reduce confusion in structures, but the conversion to obtain conj is much more straightforward and the outcome is intuitive if right-headed coordination is allowed, for both nominal and verbal coordination. I will join the discussion that Prof. Matsumoto arranged just after @jnivre's invited talk on Dec 13. See you then!

dan-zeman commented 7 years ago

Agreed that we don't want to lose coordination in Japanese and looking forward to the discussion next week. (Don't get me wrong—I was not really proposing a different annotation of coordination. Not knowing Japanese, I was just asking whether this construction is coordination :-))

jnivre commented 7 years ago

Okay. It seems we all agree that Japanese has "conj" structures, which is good. Let us try to sort out the details when we meet. One thing to keep in mind for everyone is that UD sometimes forces you to do things that you are not used to in the interest of cross-linguistic comparability. At the same time, we of course do not want to impose an incorrect analysis on a language. That would be going too far in maximising parallelism. And exactly where to draw the line is often difficult.

I can recommend the following survey by Martin Haspelmath on coordination. There you will see, for example, that English and Japanese allow exactly the same cases of ellipsis in coordination, despite English being SVO and Japanese being SOV (Section 7).

http://email.eva.mpg.de/~haspelmt/coord.pdf

kanayamah commented 7 years ago

@jnivre, thank you for the good material. As illustrated by the Korean example (57b) in Martin Haspelmath's survey, it is difficult to draw a line between verb-verb conjunction and subordination. I think this supports the feasibility of a right-headed structure for conjunction, which would make the annotation more consistent.

@dan-zeman, of course, I understand your intention.

spyysalo commented 7 years ago

Bump. I think it would be great to extend the documentation on coordination. As @jnivre writes:

I also think the whole left- vs. right-headed issue rests on a misconception about the nature of coordination (which is not a dependency relation in the narrow sense).

and

the current versions of the doc pages you point to are not very helpful in this respect.

I'd volunteer to draft something on this, but I don't think I'd be able to write on this as clearly as the topic deserves.

dan-zeman commented 6 years ago

@spyysalo : Feel free to draft! Others can add their thoughts and we can have a holistic documentation of coordination at the end. You may want to start a working group for coordination in the working groups section.

murawaki commented 5 years ago

I'm late to the party. I have read http://universaldependencies.org/docs/2015-08-23-uppsala/coordination.html, http://universaldependencies.org/udw18/PDFs/10_Paper.pdf, issues #189 and #236, and this issue. I also checked Haspelmath's paper mentioned above, but I think his newer paper is more comprehensive: https://www.researchgate.net/publication/40851992_Coordinating_constructions_an_overview Sorry if I missed something.

I am a supporter of the introduction of a language-specific directionality parameter for coordination. I would like to present several rationales for it, but for now I will focus on nominal coordination. What is relevant here is Section 5 of Haspelmath's 2004 paper (also Section 6.1 of Haspelmath (2000)). Japanese is a WITH-language, where the same marker 'と' is used to express both the conjunctive and the comitative.

John と Mary が 行く 。
John =and Mary =NOM go .
John and Mary go.

John が Mary と 行く 。
John =NOM Mary =with go .
John goes with Mary.

John が Mary と Anna と 行く 。
John =NOM Mary =and Anna =with go .
John goes with Mary and Anna.

Haspelmath (2000) suggests that WITH-languages have a wider geographical distribution than AND-languages like English.

Diachronically speaking, the comitative-to-conjunctive syntactic/semantic extension is a common path of grammaticalization. Synchronically, this marker may or may not be seen as having two different functions. In either case, it would be safer to drop the assumption that there is always a clear boundary between a normal dependency relation and coordination.

And this has practical implications for dependency parsing. A parser must disambiguate the syntactic function of the marker. As a comitative marker, it forms a usual head-final dependency structure. But when it is interpreted as a conjunctive coordinator, the all-left-headed rule leads to an entirely different structure and thus has a risk of catastrophic error propagation. We can mitigate the problem if we set right-headedness for the proposed directionality parameter.
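
To make the size of this difference concrete, the following Python sketch compares head indices for the four-token pattern "A と B が" using the three analyses that appear earlier in this thread: the hypotactic (nmod) analysis, the current left-headed coordination, and the proposed right-headed coordination. The helper name `differing_heads` is hypothetical, and only heads (not labels) are compared.

```python
from typing import List


def differing_heads(a: List[int], b: List[int]) -> List[int]:
    """Return the 1-based token ids whose head differs between two analyses."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Head columns copied from the fragments earlier in this thread.
hypotactic = [3, 1, 0, 3]    # A nmod -> B, と case -> A, B root, が case -> B
left_headed = [0, 1, 1, 1]   # A root, と cc -> A, B conj -> A, が case -> A
right_headed = [3, 1, 0, 3]  # A conj -> B, と case -> A, B root, が case -> B

diff_left = differing_heads(hypotactic, left_headed)
diff_right = differing_heads(hypotactic, right_headed)
print(diff_left)   # [1, 3, 4]
print(diff_right)  # []
```

With these inputs, the left-headed analysis differs from the head-final one in the heads of tokens 1, 3 and 4, while the right-headed analysis differs from it only in labels.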

Inputs from other WITH-languages would be helpful.

murawaki commented 5 years ago

The dependency label 'cc' is defined as "the relation between a conjunct and a preceding coordinating conjunction" (http://universaldependencies.org/u/dep/cc.html). However, the Japanese marker 'と', regardless of whether it expresses a conjunctive or a comitative meaning, is an enclitic clearly attached to the host on its left, not on its right (i.e., ((A と) B), not (A (と B))). We can also say:

John と Mary と が 行く 。
John =and Mary =and =NOM go .
John and Mary go.

See also Section 2 of Haspelmath (2004).

dan-zeman commented 5 years ago

The coordinating conjunction used to be attached to the first conjunct in UD v1. The rule was changed in UD v2 and the conjunction is now attached to the immediately following conjunct, on the grounds that there is some evidence that they form a constituent. I think that this bit was decided with primarily European languages in mind (although the bigger question of the directionality of conj was extensively discussed for head-final languages).

I think it should be possible to say that a particular coordinator in a particular language behaves differently and forms a constituent with the immediately preceding conjunct. Then it should be attached to the preceding conjunct via the cc relation. It should be possible to make this decision independently of the question whether coordination has a head (although there seems to be a correlation between postpositional coordinators and languages where people argue for right-headed “coordination”).
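
The two decisions can be kept apart in a small Python sketch. The function `coordination_edges` and its parameter names `cc_side` and `conj_head` are hypothetical, introduced only to illustrate that the attachment of the coordinator and the direction of conj are independent choices; nothing here reflects an actual guideline or tool.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (head, dependent, relation)


def coordination_edges(first: str, coordinator: str, second: str,
                       cc_side: str, conj_head: str) -> List[Edge]:
    """Build the edges for the linear sequence 'first coordinator second'.

    cc_side:   'preceding' or 'following', the conjunct the coordinator attaches to.
    conj_head: 'left' or 'right', which conjunct heads the conj relation.
    """
    if cc_side not in ("preceding", "following"):
        raise ValueError("cc_side must be 'preceding' or 'following'")
    if conj_head not in ("left", "right"):
        raise ValueError("conj_head must be 'left' or 'right'")

    # Choice 1: which conjunct hosts the coordinator.
    cc_host = first if cc_side == "preceding" else second
    # Choice 2: which conjunct is the technical head of conj.
    head, dep = (first, second) if conj_head == "left" else (second, first)
    return [(head, dep, "conj"), (cc_host, coordinator, "cc")]


# Coordinator attached to the following conjunct, left-headed conj.
english_like = coordination_edges("A", "and", "B", "following", "left")
# Coordinator attached to the preceding conjunct, conj still left-headed.
mixed = coordination_edges("A", "と", "B", "preceding", "left")
```

All four combinations of the two parameters are expressible, which is the sense in which the cc decision does not presuppose an answer to the headedness question.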

sylvainkahane commented 5 years ago

@murawaki Thanks for a very clear presentation of the data. @dan-zeman I agree with your answer and I think we can go further. When analyzing A & B in a given language, the first question should be: do we have a phrase (A &) B or A (& B)? And then, considering that the CCONJ is on the second conjunct, we can decide which direction conj must take.

So if we have (A &) B as in Japanese, we'll have:

conj(B,A)
cc(A,&)

And if we have A (& B) as in English we'll have:

conj(A,B)
cc(B,&)

murawaki commented 5 years ago

I have no objection to the decoupling proposal, though I thought the rule for the coordinating conjunction was part of the perceived cross-lingual parallelism, which, in my opinion, goes against linguistic reality.

I move on to the next topic. The VP "coordination" in Japanese is interesting because it is where the limits of coordination blur again. The question is whether the VP "coordination" is a genuine coordination in the first place.

Kanayama et al. (2018) illustrated in Figure 6 what the dependency tree would look like if the verbal suffix -te (allomorph: -de) were interpreted as a marker of coordinating conjunction. Kanayama et al. (2018) and Noh et al. (2018) treated the Korean verbal suffix -ko in a similar manner. However, the conjunct headed by a -te verb can be seen as syntactically subordinate. In fact, traditional Japanese NLP maintains that the -te construction is an ordinary dependency relation.

Haspelmath (2004) is ambivalent about the coordinate/subordinate distinction (Sec. 11). But in his earlier paper on the converb (1995), he used the -te clause as an example where "its subordinate status is beyond doubt" (Sec. 3.4.1):

John wa booshi o nui-de, Mary ni aisatsu shi-ta.
John TOP hat ACC take.off-CONV Mary DAT greet do-PAST
John took off his hat and greeted Mary.

I changed the romanization. In this example, the NP Mary ni can be moved to the left, making the VP Mary ni aisatsu shi-ta discontinuous.

John wa Mary ni booshi o nui-de aisatsu shi-ta.
John TOP Mary DAT hat ACC take.off-CONV greet do-PAST
John took off his hat and greeted Mary.

According to Haspelmath, the NP John wa "probably belongs to the superordinate clause" (Mary ni aisatsu shi-ta) while the subordinate clause (booshi o nui-de) has an implicit subject controlled by John (right-headed!). That's exactly how traditional Japanese NLP treats this construction.

kanayamah commented 5 years ago

Since the milestone of this issue has been set to v2.4, I created a right-headed version of the UD_Japanese-GSD data and put it on the dev_conj branch: https://github.com/UniversalDependencies/UD_Japanese-GSD/tree/dev_conj (please see the files with _conj)

Following the discussion above (by @murawaki @sylvainkahane @dan-zeman), conj are added with the strategies below:

At the UDW2018 workshop many people agreed with this idea, and the Japanese corpus should have coordination, as @spyysalo wished. Korean and other languages can also follow this strategy.

@dan-zeman, if you allow right-headed conjunction in the coming release, could you modify the validator? The validator could have language-dependent options.
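
A language-dependent option of this kind might look like the following Python sketch. It is not the actual UD validator: the table `CONJ_DIRECTION`, its contents, and the function `conj_direction_errors` are assumptions made up for illustration, and marking `ja` and `ko` as right-headed reflects the proposal in this thread, not an accepted guideline.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language setting; languages not listed default to "left".
CONJ_DIRECTION: Dict[str, str] = {"ja": "right", "ko": "right"}

# Hypothetical row layout: (id, head, deprel)
Row = Tuple[int, int, str]


def conj_direction_errors(rows: List[Row], lang: str) -> List[int]:
    """Return ids of conj dependents that violate the language's direction."""
    direction = CONJ_DIRECTION.get(lang, "left")
    errors = []
    for tid, head, deprel in rows:
        if deprel != "conj":
            continue
        head_is_left = head < tid
        # Left-headed languages need the head before the dependent,
        # right-headed languages need it after.
        if head_is_left != (direction == "left"):
            errors.append(tid)
    return errors


left_tree = [(1, 0, "root"), (2, 1, "cc"), (3, 1, "conj"), (4, 1, "case")]
right_tree = [(1, 3, "conj"), (2, 3, "cc"), (3, 0, "root"), (4, 3, "case")]

print(conj_direction_errors(left_tree, "en"))   # []
print(conj_direction_errors(left_tree, "ja"))   # [3]
print(conj_direction_errors(right_tree, "ja"))  # []
print(conj_direction_errors(right_tree, "en"))  # [1]
```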

dan-zeman commented 5 years ago

@kanayamah, I am sorry to disappoint you but I cannot change the guidelines on my own, and there cannot be major guideline changes between two 2.* releases. If that happens, it will have to be in UD v3 guidelines.

I set the milestone to 2.4 because I thought that maybe the issue of the attachment of the cc node (post from 2 Nov 2018) could qualify as a small amendment to the v2 guidelines, elaborating on something that had not been sufficiently specified in the documentation. I need some input from the core group on this. Once it is settled, I intend to re-set the milestone of this issue for "later", meaning that there is something to be considered for UD v3.

dseddah commented 5 years ago

Hi Dan and all, since the last UD workshop, it was my understanding that for verb final languages it would be allowed to have right-headed coordinate structures.

I'm not sure it makes sense for people to maintain two versions of their treebanks, one orthodox UD and one with the correct linguistic structures available from their website (Korean and Japanese). Allowing some sort of symmetry depending on the canonical word order of a given language feels right.

Best, Djamé

dan-zeman commented 5 years ago

Hi Djamé, I don't know where this understanding of yours stems from :-) maybe I missed something?

There is also the opposite argument: if the UD guidelines state that something always holds, then the users can rely on it; therefore it should really hold in the data. In any case I wouldn't think it would be a good idea to abruptly change it a week before the data freeze for the next release.

Right-headed coordination could be a "correct linguistic structure" only if conj was a linguistically motivated relation. But I think we have always been emphasizing that it is a technical relation that connects two nodes which are not dependent one on the other. That's what makes irrelevant the discussion whether it is better to attach A to B instead of B to A.

dseddah commented 5 years ago

On 24 Apr 2019, at 15:19, Dan Zeman notifications@github.com wrote:

Hi Djamé, I don't know where this understanding of yours stems from :-) maybe I missed something?

Seriously, it was during the question session for the paper "Coordinate Structures in Universal Dependencies for Head-final Languages". I thought everyone was in agreement that the current situation couldn't/wouldn't last long. Maybe I'm the one who missed the "nope, won't fix it" crowd answer?

There is also the opposite argument: if the UD guidelines state that something always holds, then the users can rely on it; therefore it should really hold in the data. In any case I wouldn't think it would be a good idea to abruptly change it a week before the data freeze for the next release.

The date argument is a solid one, but as I said, I thought the rules had already been changed.

Right-headed coordination could be a "correct linguistic structure" only if conj was a linguistically motivated relation. But I think we have always been emphasizing that it is a technical relation that connects two nodes which are not dependent one on the other. That's what makes irrelevant the discussion whether it is better to attach A to B instead of B to A.

I'm not sure about this argument, Dan. Every decision we make in treebanking is about deciding what to attach where, and many of these decisions are tied to technical challenges, especially when it comes to coordination. Anyway, in this particular head-final language case, I think the authors made their points pretty solidly and we should acknowledge that.

Best, Djamé

jnivre commented 5 years ago

There are good arguments on both sides and there is an ongoing discussion, but there is definitely not any consensus yet. More importantly, as Dan pointed out, major guideline changes do not happen as a result of workshop discussions. They are decided by the universal guidelines group and require a major version change (like the change from v1 to v2). Anything else would lead to complete chaos.

kanayamah commented 5 years ago

@dan-zeman @jnivre Thank you. I understand that it is not a small change, so I will follow your decision, and it is nice to hear that it will surely be discussed in the v3 design.

@dseddah thank you for your support!

murawaki commented 5 years ago

I think we have always been emphasizing that it is a technical relation that connects two nodes which are not dependent one on the other. That's what makes irrelevant the discussion whether it is better to attach A to B instead of B to A.

I am sorry to repeat myself but my proposal is to drop this oversimplified assumption before changing the guidelines. (1) In reality, the boundary between coordination and dependency is not always clear, and (2) coordination constructions (as we identify them) often share still-transparent diachronic sources with dependency constructions. That's why treating coordination constructions harmoniously with dependency constructions has practical benefits.