amir-zeldes / gum

Repository for the Georgetown University Multilayer Corpus (GUM)
https://gucorpling.org/gum/

Coordinated auxes #107

Closed nschneid closed 1 year ago

nschneid commented 2 years ago

In investigating UniversalDependencies/UD_English-EWT#298 I found one sentence lacking proper nested structure:

amir-zeldes commented 2 years ago

Yes, I agree this is not right and should have conj, thanks - but what would you suggest doing with "not" in this case?

nschneid commented 2 years ago

advmod(can, not)

amir-zeldes commented 2 years ago

Yes, I had the same thought BUT:

And at this point my brain went into a maximum recursion depth error :|

Any thoughts?

nschneid commented 2 years ago

AFAICT the rules about promotion are not really designed to deal with function words having dependents (even coordination). I think the most natural solution is to coordinate the AUXes, and then treat the 2nd AUX as head of the negation. The other option (which some of the EWT sentences had before I changed it), treating the main verb as elided in the first instance, was really awkward.

amir-zeldes commented 2 years ago

From a commonsense perspective I agree completely that that's the most sensible solution. However I would like to see some UD guideline formalize this, for example:

Coordinated auxiliaries should be attached from the first to the subsequent ones via conj. If more than one function word participates in a non-initial auxiliary (for example a negation of just the second auxiliary, or a cluster of coordinated auxiliaries), then the syntactically highest auxiliary in the non-initial cluster is taken to represent that entire cluster, is governed by conj, and governs the remaining function words in its cluster using their usual deprels.

Note that this explanation may not make sense for languages with postposed coordination AND postposed auxiliaries, in which the second AUX can very naturally behave this way just by assuming normal promotion.

Pinging @dan-zeman for your opinion on a UD policy for such cases.

dan-zeman commented 2 years ago

I think that the more complex cases should be solved as coordinate clauses with ellipsis, i.e., an auxiliary is promoted. And the more I am thinking about it, the more I am wondering whether we should do the same even with the simple coordination of auxiliaries, such as "she could and should come early".

amir-zeldes commented 2 years ago

> should be solved as coordinate clauses with ellipsis

Do you mean with enhanced dependencies and an empty node? But what would you do with the basic graph? Do you mean orphan(can2, not)?

dan-zeman commented 2 years ago

No, we do not use orphan when we promote an auxiliary to the head of clause. We also do not use empty nodes in such cases.

what can and cannot be done

nsubj(can-2, what) conj(can-2, done) cc(done, and) aux(done, can-4) advmod(done, not) aux(done, be)
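The edge list above can be checked mechanically. The following is a minimal Python sketch, not part of the original comment, that parses the `rel(head, dependent)` notation and confirms that every token has exactly one head and that a single token is left over as root. The helper name `parse_edges` is hypothetical, and unnumbered words are simply used as node names as they appear in the comment.

```python
import re

# Edge list copied from the comment above, in rel(head, dependent) notation.
EDGES = (
    "nsubj(can-2, what) conj(can-2, done) cc(done, and) "
    "aux(done, can-4) advmod(done, not) aux(done, be)"
)


def parse_edges(text):
    """Return a list of (relation, head, dependent) triples."""
    return re.findall(r"([\w:]+)\(([^,()]+),\s*([^()]+)\)", text)


triples = parse_edges(EDGES)

# Map each dependent to its (head, relation); a basic UD tree allows
# only one incoming edge per token.
heads = {}
for rel, head, dep in triples:
    assert dep not in heads, f"{dep} has two heads"
    heads[dep] = (head, rel)

# Any node that never appears as a dependent is a root candidate.
nodes = {name for _, head, dep in triples for name in (head, dep)}
roots = sorted(nodes - set(heads))

print("root candidates:", roots)
print("head of 'not':", heads["not"])
```

In this analysis the first "can" is the only node without a head, which is what promoting the auxiliary to head of the clause amounts to.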

amir-zeldes commented 2 years ago

> nsubj(can-2, what)

OK, it looks like you're analyzing this like a question, right? Isn't this a free relative? If it's a question I would expect nsubj:pass (because we are asking what can be done), but if it's a free relative I guess it would be acl:relcl(what, can-2)?

In the free relative reading I think you are suggesting this:

[image: dependency tree of the suggested analysis]

Is that right? Thoughts on this @nschneid ? I understand the motivation to do this, but I find it dissatisfying that for the first "can" there is no trace of what the predicate actually is, not even in edeps, since no empty node is introduced. On the other hand, if we attach "not" to the second "can", we need to do something similar in saying "the predicate is the same as the conj parent", but that seems a little more intuitive to me (as @nschneid suggested). I would like to throw the orphan option into the mix too though, since we could say it's like this:

1   rules   rule    NOUN    _   _   2   nsubj   2:nsubj _
2   establish   establish   VERB    _   _   0   root    _   _
3   what    what    PRON    _   _   2   obj 2:obj|4.2:nsubj:pass|9:nsubj:pass   _
4   can can AUX _   _   9   aux 4.2:aux _
4.1 be  be  AUX _   _   _   _   4.2:aux:pass    _
4.2 done    do  VERB    _   _   _   _   3:acl:relcl _
5   and and CCONJ   _   _   6   cc  9:cc    _
6   can can AUX _   _   4   conj    9:aux   _
7   not not PART    _   _   6   orphan  9:advmod    _
8   be  be  AUX _   _   9   aux:pass    9:aux:pass  _
9   done    do  VERB    _   _   3   acl:relcl   4.2:conj:and    _

In this analysis, the edeps express the entire expanded argument structure with auxiliaries and negation, but in basic dependencies we would say "not" can't modify an auxiliary, so it's really an orphan caused by the ellipsis of the second "done", which is reconstructed in an empty node.
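To make the relation between the basic and enhanced layers in this proposal easier to inspect, here is a small Python sketch (an illustration only, not an official UD tool). It reads the eleven rows above, lists the empty nodes, collects the enhanced edges, and checks that every enhanced head points at an ID that exists. Columns are split on whitespace because the tabs were lost in this rendering; real CoNLL-U files are tab-separated. The names `FIELDS`, `enhanced` and `dangling` are my own.

```python
# The eleven rows from the proposal above, copied with single spaces
# between columns (real CoNLL-U uses tabs).
ROWS = """\
1 rules rule NOUN _ _ 2 nsubj 2:nsubj _
2 establish establish VERB _ _ 0 root _ _
3 what what PRON _ _ 2 obj 2:obj|4.2:nsubj:pass|9:nsubj:pass _
4 can can AUX _ _ 9 aux 4.2:aux _
4.1 be be AUX _ _ _ _ 4.2:aux:pass _
4.2 done do VERB _ _ _ _ 3:acl:relcl _
5 and and CCONJ _ _ 6 cc 9:cc _
6 can can AUX _ _ 4 conj 9:aux _
7 not not PART _ _ 6 orphan 9:advmod _
8 be be AUX _ _ 9 aux:pass 9:aux:pass _
9 done do VERB _ _ 3 acl:relcl 4.2:conj:and _
"""

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

tokens = [dict(zip(FIELDS, line.split()))
          for line in ROWS.splitlines() if line.strip()]
ids = {t["id"] for t in tokens}

# Empty nodes carry decimal IDs and have no basic head or deprel.
empty_nodes = [t["id"] for t in tokens if "." in t["id"]]

# Collect enhanced edges as (head, relation, dependent).
enhanced = []
for t in tokens:
    if t["deps"] == "_":
        continue
    for item in t["deps"].split("|"):
        head, rel = item.split(":", 1)
        enhanced.append((head, rel, t["id"]))

# Enhanced heads that do not correspond to any row would be errors.
dangling = [edge for edge in enhanced if edge[0] not in ids]

# Words that are orphans in the basic tree.
basic_orphans = [t["form"] for t in tokens if t["deprel"] == "orphan"]

print("empty nodes:", empty_nodes)
print("dangling enhanced heads:", dangling)
print("basic orphans:", basic_orphans)
```

The only basic-layer orphan is "not", while in the enhanced layer the same token is an `advmod` of token 9.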

dan-zeman commented 2 years ago

> you're analyzing this like a question, right?

Yes. I deliberately cited only part of the example to get rid of the free relative reading :-) And yes, it should have been nsubj:pass rather than just nsubj.

I am not a priori against using empty nodes for content predicates when auxiliaries are promoted, especially if we end up adding other possibilities for empty nodes. I am just saying it is not what UD does now. (And if that enhancement is introduced, then I think it should include all places where we promote auxiliaries now.)

nschneid commented 2 years ago

We are having the free relative vs. interrogative content clause discussion elsewhere. For this thread let's sidestep that and use a simpler sentence: I can not and should not eat this whole pizza.

Option A is to treat coordinated material (the non-initial conjuncts) as self-contained so that the analysis is just

with a few edges/words added: conj(can, should) and advmod(should, not-6). This seems most natural to me. It basically promotes the second aux to head with respect to the second not. Since auxiliaries are also verbs it feels like not too much of a stretch.

Option B is to say that, really, not-6 is modifying the main verb rather than should, so instead of advmod it should be orphan(should, not-6), but otherwise the same as Option A.

Option C (which I think is what @dan-zeman meant) is to promote the first aux as head of the whole clause, so that both nots are proper advmods. But this has the effect that removing "and should not" changes the whole structure, because auxes are not normally heads.

Option D is to assume the verb is elided in the first instance: I can not <eat3.1> and should not eat7 this whole pizza. Which is fine insofar as the Enhanced layer goes but results in Option C for the Basic dependencies, which is clunky.

Are those the 4 options or am I missing something?
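As a rough illustration of how close Options A and B are, the sketch below encodes only the two edges that the descriptions above spell out for each and compares them. This is my own encoding, not something from the UD guidelines: the token indices can-2 and should-5 are assumed from "not-6" and "eat7" in the text, and Options C and D are left out because no explicit edges are given for them.

```python
# Tokens assumed to be auxiliaries (indices inferred from the thread).
AUXES = {"can-2", "should-5"}

# Edges as (relation, head, dependent); only the edges stated in the
# option descriptions are included.
OPTIONS = {
    "A": [("conj", "can-2", "should-5"), ("advmod", "should-5", "not-6")],
    "B": [("conj", "can-2", "should-5"), ("orphan", "should-5", "not-6")],
}


def label_of(option, dependent):
    """Return the relation that attaches `dependent` under `option`."""
    for rel, _head, dep in OPTIONS[option]:
        if dep == dependent:
            return rel
    return None


dependents = {dep for edges in OPTIONS.values() for _, _, dep in edges}

# Dependents whose incoming label differs between the two options.
differences = sorted(
    dep for dep in dependents if label_of("A", dep) != label_of("B", dep)
)

# Under both options an auxiliary ends up governing other words.
aux_headed = {
    name: [(rel, dep) for rel, head, dep in edges if head in AUXES]
    for name, edges in OPTIONS.items()
}

print("differs between A and B:", differences)
print("dependents of auxiliaries:", aux_headed)
```

The two options share the same attachment and differ only in the label on not-6.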

amir-zeldes commented 2 years ago

I think that's right, thank you both! What you say was exactly my problem: I agree @nschneid's suggestion is sensible, but I also felt it is not what UD currently officially does, as @dan-zeman pointed out.

This is complicated enough that I don't see a clear winner - 'the blanket is too short to cover everything', as we say where I'm from... So we need to choose which part to cover and which one to mistreat. Shall we tackle this in some upcoming meeting?

dan-zeman commented 2 years ago

I would love to see examples from other languages, too. But they may be difficult to search for because we don't know how people currently analyze them (and they are probably quite rare).
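One way to start looking, sketched here in Python under heavy assumptions, is to scan CoNLL-U text for two of the shapes discussed in this thread: an AUX attached by `conj` to another AUX, and an AUX that governs a `conj` dependent of some other category (a promoted auxiliary). The function name `find_coordinated_aux`, the pattern labels and the two sample sentences are hypothetical; the second sample mirrors the edge list given earlier in the thread, and lemmas are simply set equal to forms. Treebanks that use some other analysis would need further patterns.

```python
def row(idx, form, upos, head, deprel):
    """Build one tab-separated CoNLL-U line (lemma set to form for brevity)."""
    return "\t".join(
        [str(idx), form, form, upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Hypothetical sample: "she could and should come", auxiliaries coordinated.
SENT_1 = "\n".join([
    row(1, "she", "PRON", 5, "nsubj"),
    row(2, "could", "AUX", 5, "aux"),
    row(3, "and", "CCONJ", 4, "cc"),
    row(4, "should", "AUX", 2, "conj"),
    row(5, "come", "VERB", 0, "root"),
])

# "what can and can not be done" with the first auxiliary promoted.
SENT_2 = "\n".join([
    row(1, "what", "PRON", 2, "nsubj"),
    row(2, "can", "AUX", 0, "root"),
    row(3, "and", "CCONJ", 7, "cc"),
    row(4, "can", "AUX", 7, "aux"),
    row(5, "not", "PART", 7, "advmod"),
    row(6, "be", "AUX", 7, "aux"),
    row(7, "done", "VERB", 2, "conj"),
])

SAMPLE = SENT_1 + "\n\n" + SENT_2 + "\n"


def find_coordinated_aux(conllu_text):
    """Return (pattern, head form, dependent form) for each match."""
    hits = []
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep plain word lines only: skip ranges (1-2) and empty nodes (1.1).
        words = {r[0]: r for r in rows if r[0].isdigit()}
        for r in words.values():
            head = words.get(r[6])
            if head is None or r[7].split(":")[0] != "conj":
                continue
            if head[3] != "AUX":
                continue
            pattern = "aux-conj-aux" if r[3] == "AUX" else "promoted-aux"
            hits.append((pattern, head[1], r[1]))
    return hits


for hit in find_coordinated_aux(SAMPLE):
    print(hit)
```

A search like this only finds what annotators already encoded in one of these two ways, so it gives a lower bound rather than a full picture.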

amir-zeldes commented 2 years ago

Sure, same here. I could produce translations of this into a couple of languages where things work sufficiently similarly... But either way I would ideally like to have a live discussion, GH issues are a bit too cumbersome for this level of deliberation IMO.

nschneid commented 2 years ago

Yeah we should have a live discussion. One more point is that other kinds of function words can be coordinated: "Are you flying right into or right out of Chicago?" "Is this or that book the correct one?" So maybe we need a general policy.

dan-zeman commented 2 years ago

Kulturelle Ansichten müssen und sollen diskutiert werden. "Cultural views must and should be discussed." [image: Schránka-1]

de as evoluções de «downsizing» e «rightsizing» a que muitos sistemas de informação estiveram ou virão a estar sujeitos. "from the evolutions of «downsizing» and «rightsizing» to which many information systems have been or will be subject." [image: Schránka-2]

maar het is en blijft een race tegen het horloge. "but it is and remains a race against the watch." [image: Schránka-3]

K něčemu takovému bychom nuceni byli a nebyli. "We would be and would not be forced to do such a thing." [image: Schránka-4]

členy Evropské unie nejsme a patrně ani dlouho ještě nebudeme. "we are not members of the European Union and probably will not be for a long time." [image: Schránka-5]

a byla by to bývala (nebo snad i bude) návštěva prvního významného státníka v oblasti "and it would have been (or perhaps will be) the visit of the first important statesman in the area" [image: Schránka-6]