UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

iobj/obj (with ccomp and xcomp) consistency in matrix verb annotations #55

Closed tzshi closed 10 months ago

tzshi commented 6 years ago

Is there some annotation guideline that I missed, or is this indeed an annotation consistency issue about the object of the following cases (ECM/control verbs)?

While most of the sentences annotate the object with obj, a few are annotated with iobj, such as:

  1. 15->16 is annotated with iobj

    # sent_id = email-enronsent30_01-0033
    # text = Juan communicated some numbers to me and when reviewing this request would like to ask you to consider the following:
    ...
    15  ask ask VERB    VB  VerbForm=Inf    13  xcomp   13:xcomp    _
    16  you you PRON    PRP Case=Acc|Person=2|PronType=Prs  15  iobj    15:iobj _
    17  to  to  PART    TO  _   18  mark    18:mark _
    18  consider    consider    VERB    VB  VerbForm=Inf    15  xcomp   15:xcomp    _
    19  the the DET DT  Definite=Def|PronType=Art   18  obj 18:obj  _
    20  following   follow  VERB    VBG VerbForm=Ger    19  amod    19:amod SpaceAfter=No
    21  :   :   PUNCT   :   _   2   punct   2:punct _
  2. 4->5 annotated with iobj

    # sent_id = email-enronsent27_01-0058
    # text = PS Your brother told me he went to 3 bowl games (when I found out that two of them were the galleryfurniture.com bowl and that one in Shreveport (I can't remember the name of it)) I realized he is a very, very sick college football fan.
    ...
    4   told    tell    VERB    VBD Mood=Ind|Tense=Past|VerbForm=Fin    0   root    0:root  _
    5   me  I   PRON    PRP Case=Acc|Number=Sing|Person=1|PronType=Prs  4   iobj    4:iobj  _
    6   he  he  PRON    PRP Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  7   nsubj   7:nsubj _
    7   went    go  VERB    VBD Mood=Ind|Tense=Past|VerbForm=Fin    4   ccomp   4:ccomp _
    8   to  to  ADP IN  _   11  case    11:case _
    9   3   3   NUM CD  NumType=Card    11  nummod  11:nummod   _
    10  bowl    bowl    NOUN    NN  Number=Sing 11  compound    11:compound _
    11  games   game    NOUN    NNS Number=Plur 7   obl 7:obl   _
    ...

At the same time, I can find (more) instances with verbs like tell and ask where the objects are annotated with obj.

There are more cases of this kind; they can be found through Grew using the following pattern:

pattern { 
    N1 [cat="VERB"];
    N1 -[iobj]-> N3;
    N1 -[xcomp]-> N2;
    N1 << N3;
    N3 << N2;
}

iobj X ccomp gives 43 occurrences, obj X ccomp gives 138, iobj X xcomp gives 20, and obj X xcomp gives 903.
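For anyone without Grew at hand, here is a minimal Python sketch of the same tally, assuming a standard tab-separated CoNLL-U file. The function names are made up for illustration, relations are matched exactly (no subtypes), and the ordering constraint mirrors the pattern above (verb before object before clausal complement). It is not guaranteed to reproduce the Grew counts exactly.

```python
from collections import Counter


def read_sentences(conllu_text):
    """Yield sentences as lists of (id, lemma, upos, head, deprel) tuples."""
    sent = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        sent.append((int(cols[0]), cols[2], cols[3], int(cols[6]), cols[7]))
    if sent:
        yield sent


def count_object_clause_pairs(conllu_text):
    """Count (obj|iobj, ccomp|xcomp) sibling pairs under a VERB, in linear order."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        for vid, _, upos, _, _ in sent:
            if upos != "VERB":
                continue
            deps = [(tid, rel) for tid, _, _, head, rel in sent if head == vid]
            for oid, orel in deps:
                if orel not in ("obj", "iobj"):
                    continue
                for cid, crel in deps:
                    # Same ordering as the pattern: verb << object << complement.
                    if crel in ("ccomp", "xcomp") and vid < oid < cid:
                        counts[(orel, crel)] += 1
    return counts
```

Calling `count_object_clause_pairs(open(path, encoding="utf-8").read())` on a treebank file returns a Counter keyed by pairs such as `("iobj", "xcomp")`.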

nschneid commented 6 years ago

My understanding is that the 2 examples you highlight are correct: iobj should be used for the non-case-marked recipient of verbs like ask and tell, even if there's no obj in the sentence (there may be a clausal complement).

Looking at a few obj + xcomp examples, several are verbs like make, let, and keep where the xcomp is a secondary predicate. I believe these are correct.

Looking at a few obj + ccomp examples, several are instances of ask, tell, etc. These indeed look like errors: I believe they should be changed to iobj. But there are some other verbs used in constructions that also match this pattern where obj looks correct.

jnivre commented 6 years ago

I know that the guidelines say that iobj should be used in these cases, but I honestly now think that this was a mistake. All the usual tests indicate that these are direct objects, and their treatment as indirect objects also complicates the analysis of control for the enhanced dependencies. In fact, the original source of the problem is the choice to treat "her" as the indirect object in "he gave her a book". This is essentially a semantic analysis, identifying the indirect object with the recipient role, a type of analysis that UD otherwise rejects. All the syntactic criteria again point to "her" being the direct object and "a book" being what typologists call a secondary object (not an indirect object). UD probably does not want to introduce a special relation for secondary objects, but with hindsight it would have been much better to treat "her" as obj and "a book" as iobj, because it would not have propagated errors to other constructions. I am not sure whether this is something that can be considered as a "correction" of v2 guidelines or if we have to wait for v3.

dan-zeman commented 6 years ago

It is English-specific, therefore I think it would not be a change of the universal v2 guidelines and it can be corrected now.

But is "a book" really "less core" than "her" in "he gave her a book"? Aren't both the objects "equally core"? If they are, shouldn't both be labeled obj?

jnivre commented 6 years ago

In most dialects of English, only "her" can be passivised, for example:

she was given a book
*a book was given her

Compare this to:

a book was given to her
*she was given a book to

So it seems that the two "double object" constructions are really quite similar to alternations like "load the truck with hay" vs. "load hay on the truck", where only one thing can be the direct object at a time.

sylvainkahane commented 6 years ago

I agree with Joakim. Same thing for relative clauses I think:

the girl I gave the book
*the book I gave her

This analysis was defended in a 1981 paper by Bresnan.

amir-zeldes commented 6 years ago

I don't think we can completely take the semantics out of it, or if we do, we might need to revise a lot of things to be more 'surfacy'. For example xcomp for verbs of becoming, which in stronger cased Germanic languages take what looks like a nominative:

In this example, I can't see an overt difference to a normal object, notwithstanding the non-passivizability. If that were to be an argument, we could switch to transitive causative xcomp with 'they made him a teacher' (and 'he was made a teacher'). If we don't use semantics, why should that be xcomp and not double obj?

To be clear, I'm not advocating giving up xcomp here, and I do understand the semantic motivation, as well as the analogy to other Germanic languages:

On the other hand, I think the predicate of 'become' is very much a core argument, but is not the same thing as obj, so we should allow some degree of semantic reasoning in deciding what we consider 'the same'.

But maybe a better argument not to make recipients obj is that we generally have this information in the English treebanks, so deleting it would be a big loss! I think it can also be useful when comparing with languages that have more overt indirect object marking.

sylvainkahane commented 6 years ago

Dative shift is a regular alternation/redistribution. It seems reasonable to use the same strategy as for passive or causative. In "I gave her a book", we want to say that "her" is a true object and that it corresponds to the dative of the alternative construction ("I gave a book to her"). So a label such as obj:datshift could be an option.

sylvainkahane commented 6 years ago

I would like to react to my own proposition. Such a proposition presupposes that we have identified a base construction among the two constructions in alternation and that we can say that one is a redistribution of the other. I'm not certain that the notion of base construction is correct and that a base construction can be identified. For the active and passive voices it is quite clear because one construction is simpler and more frequent than the other, but for the dative shift I have no intuition (maybe because I'm not a native speaker). Of course if we consider that "I gave her a book" is the base construction, the annotation changes. Maybe we should adopt a neutral annotation that does not presuppose any direction in the alternation.

amir-zeldes commented 6 years ago

I'm not sure I like the idea of 'base construction'... A lot of constructionist and psycholinguistic work has shown that that concept is really tied to specific prototypical cases, whereas in other cases the supposedly basic variant is not basic at all. A good example in English is X is based on Y, which looks like a passive of Y bases on X. Although the latter is possible, the former is much more likely, and it would seem to be stored as the 'normal' form of that construction for speakers.

But linguistic evidence aside, I would like to avoid verbs having two things called obj without an ability to distinguish which one corresponds to which argument structure slot. I realize we already do this when there are multiple obl, but those can often be disambiguated by the preposition they govern via case (or in an enhanced representation). For me, iobj currently does the job nicely, so I don't feel a need to change it. But maybe that's just related to the applications I use the data for...

jnivre commented 6 years ago

I don't think you need to assume a base construction to argue that the recipient should be obj in the double object construction. I also want to avoid having two obj, so the proposal would be to swap obj and iobj in the double object construction, because this is more consistent with the overall UD philosophy and makes the right prediction about things like passivisation. It also eliminates the anomaly of having iobj without obj in sentences like "she told him that ...", which otherwise appears to be an ad hoc exception.

amir-zeldes commented 6 years ago

I agree the base construction discussion can be put to the side, but I'm not so happy about having the recipient be the obj, because in a transfer of possession verb without a recipient, the theme would again be the object:

I gave everything! obj(gave,everything)

This means that we can no longer rely on theme=obj, recipient=iobj, which is exactly the distinction that I care about for most applications. I'm sure others may see this differently, but from my point of view, this would be breaking something that doesn't need fixing :|

nschneid commented 6 years ago

To echo @amir-zeldes, I think the current policy is straightforward to apply because you have parallelism in what argument iobj applies to:

jnivre commented 6 years ago

Yes, this is a typical conflict between clauses 1-2 (sound for linguistic analysis and typology) and clauses 5-6 (understandable to non-linguists and useful for downstream NLP) in Manning's Law. I happily admit that I have a bias for the former. :)

jnivre commented 6 years ago

On a more serious note, the problem with assuming a one-to-one mapping between grammatical relations and thematic roles is that it will fail in many other cases. For example:

(1) they loaded the truck with hay
(2) they loaded hay on the truck

In (1) obj maps to goal or location, in (2) it maps to theme.

(3) the window broke
(4) he broke the window

In (3) nsubj maps to theme, in (4) it maps to agent.

If you really want to derive a semantic representation, you have to do linking properly. Assuming a consistent mapping is just wishful thinking, and therefore we might as well do proper syntax instead. :)

gossebouma commented 6 years ago

I was going over the Dutch data for cases with obj/iobj and ccomp/xcomp and it turns out we have quite a few verbs where both annotations occur. The criterion I think is whether the NP argument can become the subject in a passive or not. So in most cases, we should be able to decide between obj and iobj per predicate. However, there are also a few cases where the data goes both ways, the verb 'vragen' (to ask) being a case in point. We find both

Ik (nom) werd gevraagd om te komen
I was asked to come
Mij (non-nom) werd gevraagd te komen
Aan (to) mij werd gevraagd te komen

and similar situations where the nominal argument does or does not agree with the finite auxiliary.

Grammar purists tend to point out that only the non-nominative/ non-agreeing cases are correct here, but clearly actual usage does not always obey this rule.

sylvainkahane commented 6 years ago

@jnivre It is also a conflict between syntax and semantics. As you recalled, a purely syntactic annotation would have decided to encode the complement that can be passivized and extracted as the obj.

But as remarked by @amir-zeldes, if we do that we lose the link between the two possible constructions of give. So the questions are:

• Do we want to keep this link? Is syntax concerned by this link? Or is it essentially semantic and should it be kept at the enhanced level?

• If we want to keep it, how can we proceed?

For the active-passive alternation, the UD scheme has decided to keep the link between the two constructions. The way it is done presupposes that the active construction is the base construction or at least the default construction (which is reasonable for many languages).

dseddah commented 6 years ago

Hi all,

one possible solution could be to encode both the canonical (the "deep" structure) and the final (the surface) realizations, as was proposed, following many other works, in our deep syntax proposals (Candito et al., 2014 [1] for the native scheme; Candito et al., 2017 [2] for the Enhanced-like UD one; see [3] for Marie's depling slides).

@sylvainkahane I don't think the enhanced-* scheme focuses on semantics. From what I understood (and discussions are still ongoing anyway), it's more about having complete syntactic structures, namely all core argument relations being represented by an actual edge. Details vary of course :)

[1] http://www.lrec-conf.org/proceedings/lrec2014/pdf/494_Paper.pdf
[2] http://aclweb.org/anthology/W/W17/W17-6507.pdf
[3] http://www.linguist.univ-paris-diderot.fr/~mcandito/Publications/depling17-slides.pdf

amir-zeldes commented 6 years ago

@jnivre and @sylvainkahane I think it is no coincidence that distinct labels have emerged for both passives and ditransitives in particular. These are precisely the constructions which in English do not have overt adpositional markers, and beyond English, I think this is typologically frequently(ish?) the case as well.

The reason why the spray/load alternations are less worrisome to concerns such as Manning's clauses 5-6 is IMO the fact that the combination of predicate lemma and preposition can nicely disambiguate which argument is which. For true double objects, and especially if you have word order variation (e.g. 'give me it' next to, in some varieties/languages, 'give it me'), the label becomes rather crucial.

Since what is being discussed, at least for English, is giving up a useful existing distinction, I feel obliged to object. On the other hand if we're just thinking of renaming things (e.g. using obj:iobj or something), then that is less crucial of course.

dan-zeman commented 6 years ago

@amir-zeldes Yes, I think it should be about renaming things. I think it would be more appropriate to use a subtype like obj:rcpt, which overtly admits that this is primarily about the semantic role. (In other languages, it will be much clearer that the theme is less core than the recipient, so one may again ask what exactly :iobj means.) I agree with you that useful information should not be lost. This is why we now use obl:arg in Czech: we do not want to lose the useful distinction between arguments and adjuncts, which is not supported at the universal level of UD relations.

amir-zeldes commented 6 years ago

@dan-zeman Sorry for the slow reply, back from NAACL now: yes, I understand, but if it's really just renaming, and we do recommend for languages to make this distinction, I'd just as soon not rename it and stay with iobj. I think we should prioritize stability if possible and only rename things if we really have to.

It also sounds like we need an in person/skype meeting to really work out the oblique/adverbial clause issue. Maybe in conjunction with UDW, if it's not too late for everyone?

dan-zeman commented 6 years ago

Better late than never :-) UDW should work.

nschneid commented 1 year ago

Thanks!

(Note: changes haven't propagated to the Grew-match server yet. Sometimes takes about an hour.)

amir-zeldes commented 1 year ago

changes haven't propagated

OK, note I only changed the split up files so far, so the big files are unchanged

precision errors due to verbs like "(re)assure", "advise", and "inform"

I included those verbs based on the ccomp variant:

Is that wrong?

Do the above changes handle control cases where the object is sister to xcomp

Since recall is based on a parser's predictions, it's easy to believe many such cases would be missed. If we want to include those though, where in practice there is only a clausal motivation for the ditransitive reading, then I think it should definitely apply to inform/advise etc. (and some can appear in both constructions, e.g. I advise you to go)

nschneid commented 1 year ago

Oh you're right—I forgot that we are including verbs that license two objects OR object+ccomp.

How about a Depedit rule: obj(Y=tell/ask/..., X) & xcomp(Y, Z) -> iobj(Y,X) & E:iobj(Y,X) & E:nsubj:xsubj(Z, X)? The list of relevant verbs can be constructed from https://universal.grew.fr/?custom=644aaaed97bac.
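As a rough illustration of what that rule would do (plain Python, not actual Depedit syntax), here is a minimal sketch. The token representation (dicts with `id`, `lemma`, `head`, `deprel`, and a `deps` list of `(head, relation)` pairs) and the two-verb lemma set are assumptions made up for the example; the real verb list would come from the query linked above.

```python
# Hypothetical lemma list for illustration only; the real list would be
# built from the treebank query mentioned above.
DITRANSITIVE_LEMMAS = {"tell", "ask"}


def relabel_obj_to_iobj(tokens, lemmas=DITRANSITIVE_LEMMAS):
    """Relabel obj as iobj when its head is a listed verb that also has an xcomp.

    tokens: list of dicts with keys id, lemma, head, deprel, deps, where
    deps is a list of (head_id, relation) enhanced edges.
    Returns the ids of the tokens that were changed (modified in place).
    """
    by_id = {t["id"]: t for t in tokens}
    changed = []
    for obj in tokens:
        if obj["deprel"] != "obj":
            continue
        verb = by_id.get(obj["head"])
        if verb is None or verb["lemma"] not in lemmas:
            continue
        xcomps = [t for t in tokens
                  if t["head"] == verb["id"] and t["deprel"] == "xcomp"]
        if not xcomps:
            continue
        # Basic tree: obj(Y, X) becomes iobj(Y, X).
        obj["deprel"] = "iobj"
        # Enhanced graph: rename the matching edge to E:iobj(Y, X).
        obj["deps"] = [
            (h, "iobj" if (h == verb["id"] and r == "obj") else r)
            for h, r in obj["deps"]
        ]
        # Enhanced graph: make sure E:nsubj:xsubj(Z, X) is present.
        for x in xcomps:
            edge = (x["id"], "nsubj:xsubj")
            if edge not in obj["deps"]:
                obj["deps"].append(edge)
        obj["deps"].sort()
        changed.append(obj["id"])
    return changed
```

The sketch applies the rule literally, so it does not check whether the verb already has another iobj, and it would need a separate path for subject control verbs.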

(I was going to say that a different enhanced dependency is needed for "promise X to Y" as "promise" is a subject control verb, but I don't see this anywhere in the data!)

amir-zeldes commented 1 year ago

This all sounds good, but I'm confused about the edeps - I already did E:iobj in the commit, and I think E:nsubj:xsubj should already be in the data, since those edges already applied when it was plain obj. Or am I missing something?

nschneid commented 1 year ago

You only did this for objects that the parser labeled as iobj, right? I'm suggesting to apply the rule to catch instances the parser may have missed.

You're right that E:nsubj:xsubj is there for current obj+xcomp ~except for a few cases due to relative clauses~ (edit: these are correct—the obj is the relative pronoun and the xsubj is the antecedent). There are 20 instances with iobj where the E:nsubj:xsubj is missing.
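A minimal sketch of how the missing edges could be listed, assuming tab-separated CoNLL-U lines for one sentence; the function names are hypothetical. Heads are kept as strings so that empty-node ids in the DEPS column do not break parsing.

```python
def parse_deps(deps_field):
    """Parse a CoNLL-U DEPS value such as '15:iobj|18:nsubj:xsubj'."""
    if deps_field == "_":
        return set()
    edges = set()
    for item in deps_field.split("|"):
        # Split on the first colon only; relations may contain colons.
        head, rel = item.split(":", 1)
        edges.add((head, rel))
    return edges


def missing_xsubj(sentence_lines):
    """Return (iobj_id, xcomp_id) pairs lacking an nsubj:xsubj enhanced edge."""
    rows = [line.split("\t") for line in sentence_lines
            if line and not line.startswith("#")]
    # Keep only regular tokens (skip multiword ranges and empty nodes).
    rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
    hits = []
    for obj in rows:
        if obj[7] != "iobj":
            continue
        for comp in rows:
            # The xcomp must be a sister of the iobj (same basic head).
            if comp[6] == obj[6] and comp[7] == "xcomp":
                if (comp[0], "nsubj:xsubj") not in parse_deps(obj[8]):
                    hits.append((obj[0], comp[0]))
    return hits
```

Hits from this check are only candidates: a subject control verb would legitimately lack the edge from the iobj.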

(BTW, build.py will produce the 3 main .conllu files from source docs. I just pushed.)

nschneid commented 1 year ago

Of the verbs that license iobj+ccomp:

TODO: others, like "cost", that license iobj+obj or just iobj.

nschneid commented 1 year ago

I must admit I'm having some qualms about iobj for "allow" and "permit". While the double object construction is possible ("I will allow/permit you 3 cookies") it is rare, and often the verb is sufficiently abstract that the iobj would be a nonvolitional entity where no possession is implied ("These measures will allow the economy to grow at a healthy rate / ?These measures will allow the economy a healthy rate of growth").

nschneid commented 1 year ago

I am guessing raising should not trigger iobj in cases like:

These are just obj+xcomp right (even though the verbs do have a sense with iobj)?

nschneid commented 1 year ago

@amir-zeldes thoughts on "allow" and "permit" (see above)?

amir-zeldes commented 1 year ago

? I could have sworn I commented on allow somewhere but can't see the comment now... Yes, I'm on board with allow etc., because of "allow/permit me the honor of..."

nschneid commented 1 year ago

So TBC, "allow/permit X to Y" should always be iobj? In GUM it's currently obj half the time.
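One way to see which lemmas are split like this is a per-lemma tally of the object label next to an xcomp. A minimal sketch, assuming each sentence is already a list of `(id, lemma, head, deprel)` tuples; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def tally_object_labels(sentences):
    """Count obj vs. iobj per governing lemma, for heads that also have an xcomp."""
    tally = defaultdict(Counter)
    for sent in sentences:
        lemma_of = {tid: lemma for tid, lemma, _, _ in sent}
        # Ids of tokens that govern at least one xcomp.
        has_xcomp = {head for _, _, head, rel in sent if rel == "xcomp"}
        for _, _, head, rel in sent:
            if rel in ("obj", "iobj") and head in has_xcomp:
                tally[lemma_of[head]][rel] += 1
    return tally


def mixed_lemmas(tally):
    """Lemmas attested with both obj and iobj alongside an xcomp."""
    return sorted(lemma for lemma, c in tally.items() if c["obj"] and c["iobj"])
```

A lemma showing up as mixed is only a prompt for manual review, not proof of an error.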

amir-zeldes commented 1 year ago

It definitely shouldn't be half and half... I will fix it one way or the other, but remind me - are we definitely sure we want this for xcomp? At first I thought maybe this construction should have its own status and be left as always obj, because it never alternates with a prepositional dative, and it is not considered a violation of double object to have obj + xcomp.

On the other hand, I see the case for making the same distinction here as with ccomp, and in "ask him to do" we have the same lexical entry for "ask" as in "ask him that he go ...". What's more, in dative-marking languages we do see the case distinction, so cross-linguistically this would be nice and consistent:

So I guess in sum I would say yeah, I would be OK with using iobj for allow/permit in English based on the alternation behavior with non-infinitival complements. What do you think?

nschneid commented 1 year ago

I think the underlying principle is that removing a complement (whether obj, ccomp, or xcomp) should not change the deprel of the remaining object if the meaning doesn't change. The test for iobj is whether the verb could combine it with obj or ccomp. Whether an xcomp could be present is irrelevant to the obj vs. iobj distinction. (Otherwise, we would end up with "ask him/iobj (a question/obj)" but "ask him/obj to leave/xcomp", as you point out, so the deprel for the first object would not be invariant to dropping the xcomp.)

The reason this policy feels like a stretch for "allow" and "permit" is that these USUALLY do not occur with two objects or object+ccomp. They feel like raising verbs: "we allowed him to leave" = allow(we, leave(he)) as opposed to the control interpretation allow(we, he, leave(he)). But they have an infrequent double object usage: "we'll allow him this indulgence" = allow(we, he, indulgence).

I think we need an exception for raising interpretations anyway, due to cases like

  • newly released papers, showing him complicit in the airliner bombing
    • has nothing to do with showing something to him

But I'm not sure that allows ellipsis of the xcomp:

amir-zeldes commented 1 year ago

Hm, OK - I think the raising argument doesn't hold for allow/permit on a formal level (you don't get the actual 'raising' behavior with expletives like with "seem" or "happen"), so I guess our guidelines force our hand here. If you really don't like it on permit/allow, then I think the only way to exempt them is by arguing that the ditransitive variants are archaic, and maintain a list of exempt verbs that these should go on. These are edge cases so I don't care too deeply which side they land on, but we should be consistent, I can implement it either way.

nschneid commented 1 year ago

I wouldn't call them archaic. "I'll allow you three wishes" is perfectly fine.

I suppose we should just go with iobj, even though semantically it can stretch beyond typical characteristics like animacy when coupled with an infinitival xcomp. (It can even be an event: "We'll allow the meeting to be scheduled on Saturday.")

amir-zeldes commented 1 year ago

We'll allow the meeting to be scheduled on Saturday

That's completely fine by me! None of this is meant to capture semantic classes IMO - just the opposite, the corpus can now allow us to find non-person entities which function as indirect objects. There are plenty of examples in the data, also with 'give':

So I'll go with iobj for allow/permit then.

amir-zeldes commented 1 year ago

I guess this applies to cause + xcomp too, so these should be fixed:

https://universal.grew.fr/?custom=644ff8215ebdc

nschneid commented 1 year ago

Although I guess we can't assign all of them iobj because it's a situation like "tell"—the sole object could be either obj or iobj:

nschneid commented 1 year ago

With allow/prevent/cause and one object, should the criterion be that affected entities are iobj, and everything else is obj?

These are not animate, but they are affectees, and could occur in a double object like "cause the dresser damage". So I guess iobj. But in "I'll allow three wishes", "three wishes" is an event.

It could be ambiguous whether an entity is an affectee of causing/allowing or not:

amir-zeldes commented 1 year ago

Although I guess we can't assign all of them iobj because it's a situation like "tell"—the sole object could be either obj or iobj

Absolutely, I'm running a script on GUM based on entity type and confirming changes one by one

should the criterion be that affected entities are iobj

I think you can use that as a first heuristic, but no, I don't think that's the criterion. The real criterion is just 'which slot does it occupy in the ditransitive version':

Conversely "a change in the climate caused clouding to occur in the sky" is obj, under the same interpretation.

It could be ambiguous whether an entity is an affectee of causing/allowing or not

Right, and again I would expect them to show up in the corresponding alternation slots in paraphrases.

nschneid commented 1 year ago

Final EWT queries:

nschneid commented 10 months ago