amrisi / amr-guidelines


Preliminary proposal for MWE framing conventions #152

Open timjogorman opened 9 years ago

timjogorman commented 9 years ago

Hey all! We've been talking at CU in an effort to figure out how we want to frame and annotate these new MWP rolesets, and here's an initial proposal for how we might want to handle them.

Short Version

There are two questions here. The first is whether something like "shed light on" or "take advantage of" should be decomposed into separate bits like "shed" and "light" or be a single atomic unit like "shed-light". The second is whether, assuming we want a single atomic unit, we want it to be "shed-light" or "illuminate-01". I'm assuming that most of us default to having a single roleset for these, and that we just need to address the second issue.

Why we think we should do separate rolesets + synonym links

This is the question of whether to use "get-kicks-out-of-01" or to directly annotate it as something like "enjoy-01". We actually talked about this by email many months ago, and Nianwen Xue had a great comment I'd like to repeat:

I worry about mixing up the AMR business and the WordNet business. If we start to link frames because they are synonymous, we're essentially doing synsets, and we know that is a pretty open-ended enterprise. We also lose some transparency in the alignment between word tokens and AMR concepts. Wrong alignments have been a problem for AMR parsing.

One possible approach is to create frames for the MWEs such as 'take-advantage', and link them later as needed. The linking will be dictated by a different set of considerations.

It's quite possible for us to add a linking field to rolesets (for now, just MWPs), where we express whether (and to what extent) it is synonymous with another sense, like:

type-of-link="synonymy" to="enjoy-01" sense-match="identical" role-match="identical"

One could then swap these MWEs out when doing things like MT, where that mapping is more useful. It would also allow us to quickly pump out MWE frames and slowly map them to synonymous frames as we go, allow modification if we find errors or a better mapping, and appear much more transparent. Starting down the road of adding linkages for synonymy (and potentially other links, like that between blacken-01 and black-04) could give us access to much more generalization in the future, without the dangers inherent in a synset-type approach.
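As a rough illustration of how such a linking field might be consumed downstream, here is a minimal Python sketch. The field names mirror the attributes in the example link above; the `RolesetLink` class, the `LINKS` table, and `swap_for_synonym` are hypothetical names invented for this sketch, and the rule "only swap when both matches are identical" is an assumption rather than anything agreed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolesetLink:
    """One link from an MWE roleset to another roleset."""
    type_of_link: str  # e.g. "synonymy"
    to: str            # target roleset, e.g. "enjoy-01"
    sense_match: str   # e.g. "identical"
    role_match: str    # e.g. "identical"


# Hypothetical link table; the single entry reuses the example from the text.
LINKS = {
    "get-kicks-out-of-01": RolesetLink("synonymy", "enjoy-01", "identical", "identical"),
}


def swap_for_synonym(concept, links=LINKS):
    """Return the linked roleset when the link is an exact synonym, else the concept itself."""
    link = links.get(concept)
    if (
        link is not None
        and link.type_of_link == "synonymy"
        and link.sense_match == "identical"
        and link.role_match == "identical"
    ):
        return link.to
    return concept
```

An MT pipeline could call `swap_for_synonym` on each concept, while parsers that care about token alignment could ignore the table and keep the MWE roleset.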

Modifiers

Consider phrases like:

These all share the characteristic of not only modifying an MWE, but of really modifying a part of the underlying semantic structure. "turn our terrifying swords into plowshares" cannot simply be AMRed as follows:

turn-swords-into-plowshares-01
   :arg0 we
   :arg1-of terrify-01

The thing that's terrifying is not the event, but the weapons or military structure that "turn swords into plowshares" implies we are dismantling.

Consider this phrase "hiring her kills two diversity birds with one stone". What "diversity" is doing here, syntactically, is modifying "birds" and what "birds" really means. To the extent that it means "accomplish two goals with a single action", it's simultaneously modifying "bird" and "goal". In a more direct, AMR-style analysis, one could say that there are two decompositions below, and that "diversity" modified index "B" in both:

literal:

  (a / kill-01
     :arg1 (b / bird :quant 2)
     :arg2 (c / stone :quant 1))

abstract:

  (a / achieve-01
     :arg1 (b / goal :quant 2) 
     :manner (c / act-06 :quant 1))

If there were an easy way for us to do that in AMR and Propbank, we would be pushing for that. The closest approach would be to actually decompose these MWEs, which we assume would be perceived as overkill. However, almost all of what's scary about this problem is that (A) one needs to differentiate these weird cases from instances where it really is OK to modify the entire roleset, and (B) one needs to actually know the kind of mapping shown above. We think we need to differentiate these from normal roles anyway (as these, by definition, are semantically wrong when applied to the roleset without decomposition), which can be done with a simple role:

"Kill two birds with one stone"

(k / kill-two-birds-with-one-stone-01)

"Kill two diversity birds with one stone"

(k / kill-two-birds-with-one-stone-01
   :mwe-inner-structure (b / bird :mod (d / diversity)))

"we turned our swords into plowshares"

(t / turn-swords-into-plowshares-01
    :arg0 (w / we))

"we turned our terrible swords into plowshares"

(t / turn-swords-into-plowshares-01
    :arg0 (w / we)
    :mwe-inner-structure (s / sword :mod (t2 / terrible)))

We also need to define, for Propbank, some core information about the substructure anyway, so that we can find them and frame them, information like:

id="A" token="spill" arg="" pos="VB*" dep=""
id="B" token="the" arg="" pos="DT" dep="det/C"
id="C" token="beans" arg="ARG1" pos="NNS" dep="dobj/A"

(This could let us handle the non-semantic variation we see: we don't want twelve different aliases for beat/turn/hammer swords into/to plowshares/ploughshares)
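To make the slot idea a bit more concrete, here is a small Python sketch that reads a slot description like the one above into a list of records. The function names `parse_slots` and `pos_matches` are hypothetical, and treating `pos="VB*"` as a shell-style wildcard over POS tags is an assumption of this sketch, not a documented Propbank convention.

```python
import re
from fnmatch import fnmatchcase

# The slot description from the text, kept on one line as in the original.
SLOT_LINE = (
    'id="A" token="spill" arg="" pos="VB*" dep="" '
    'id="B" token="the" arg="" pos="DT" dep="det/C" '
    'id="C" token="beans" arg="ARG1" pos="NNS" dep="dobj/A"'
)

ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_slots(line):
    """Split a flat run of key="value" pairs into one dict per slot; each `id` starts a new slot."""
    slots = []
    for key, value in ATTRIBUTE.findall(line):
        if key == "id":
            slots.append({})
        if not slots:
            raise ValueError("found an attribute before the first id")
        slots[-1][key] = value
    return slots


def pos_matches(slot, tag):
    """Assumption: the slot's pos value is a glob pattern, so "VB*" covers VB, VBD, VBZ, ..."""
    return fnmatchcase(tag, slot["pos"])
```

With records like these, one slot definition could cover inflected variants ("spilled the beans", "spilling the beans") without listing each as a separate alias.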

Since we need to add these "slots" anyways to help prep Propbank tasks, we could just add these mappings alongside them, allowing us to have a nice, clear story for how our simple treatment could be combined with the Propbank rolesets for a rich MWE representation.

form='literal'
(A / spill-01
      :ARG1 (C / bean))

form='semantic'
(A / reveal-01
      :ARG1 (C / secret))

id="A" token="spill" arg="" pos="VB*" dep=""
id="B" token="the" arg="" pos="DT" dep="det/C"
id="C" token="beans" arg="1" pos="NNS" dep="dobj/A"

For Propbank we are tentatively planning to actually link words up to the inner structure, but even if we didn't do anything more for AMR, I think this would be a simple, powerful treatment for us to do, without adding much work or complexity for the AMR annotators.

uhermjakob commented 9 years ago

we think we should do separate rolesets + synonym links

Thanks, Tim. I strongly agree. Many such concept pairs are not exact synonyms, often just entailments (e.g. 'pull strings' and 'influence').

we should just have a special role like ":mwe-inner-structure"

I think it's ok to use such a special role, but it's a bit of a crutch, just like our :prep-X roles. Many MWE frames will benefit from semantic arguments beyond the syntactic scope of "kill".

Example:

kill-two-birds-with-one-stone-05: achieve two (or more) goals with a single action
  ARG0: achiever
  ARG1: goals achieved
  ARG2: single action that achieves goals

"Biking to work kills two birds with one stone: it saves money and will help to lose weight."

(k / kill-two-birds-with-one-stone-05
  :ARG1 (a / and
          :op1 (s / save-01
                 :ARG0 b
                 :ARG1 (m / money))
          :op2 (h / help-01
                 :ARG0 b
                 :ARG1 (l / lose-01
                         :ARG1 (w2 / weight))))
  :ARG2 (b / bike-01
          :destination (w / work-01)))

"Hiring her would kill two diversity birds with one stone."

(k / kill-two-birds-with-one-stone-05
  :ARG1 (d / diversity)
  :ARG2 (h / hire-01
          :ARG1 (s / she)))

"John killed three birds with one stone."

(k / kill-two-birds-with-one-stone-05
  :ARG0 (p / person :name (n / name :op1 "John"))
  :ARG1 (t / thing :quant 3))

As Martha pointed out, some of these arguments often only show up in other sentences (such as the :ARG1 of kill-two-birds-with-one-stone-05).

Also, some of these semantic arguments remind me of what UCo frame builders have called 'cognate object' in existing frames. For MWEs such as turn-swords-into-plowshares, there could be several of those 'cognate objects' (e.g. one for 'swords' and one for 'plowshares').

Practically, for many MWEs, some of these potential semantic arguments will be rare and fairly unforeseeable, so as Kevin already suggested, we probably shouldn't worry too much about those cases in advance and just let the annotators use :mwe-inner-structure.

Then, if we later find a real need for additional roleset arguments, based on patterns of real examples, we can always add them later, just as we did for defend-01, for which we eventually upgraded from (defend-01 :prep-against enemy) to (defend-01 :ARG3 enemy).
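A later upgrade of this kind could be applied to existing annotations mechanically. The following Python sketch rewrites an ad hoc role only when it hangs off a specific roleset; `ROLE_UPGRADES` and `upgrade_roles` are hypothetical names, and the tokenizer is a simplification for illustration, not a full PENMAN parser.

```python
import re

# Hypothetical upgrade table: (roleset, old ad hoc role) -> new numbered argument.
ROLE_UPGRADES = {("defend-01", ":prep-against"): ":ARG3"}

# Whitespace runs, single parentheses, quoted strings, or any other run of characters.
TOKEN = re.compile(r'\s+|[()]|"[^"]*"|[^\s()]+')


def upgrade_roles(amr, upgrades=ROLE_UPGRADES):
    """Rewrite roles listed in `upgrades`, leaving all other text and whitespace untouched."""
    out = []
    stack = []              # concept of each currently open node
    expect_concept = False  # True right after "(" or "/"
    for tok in TOKEN.findall(amr):
        if tok.isspace():
            pass
        elif tok == "(":
            stack.append(None)
            expect_concept = True
        elif tok == ")":
            if stack:
                stack.pop()
            expect_concept = False
        elif tok == "/":
            expect_concept = True
        elif expect_concept and stack:
            # Either a bare concept or a variable; a later "/ concept" overwrites it.
            stack[-1] = tok
            expect_concept = False
        elif tok.startswith(":") and stack:
            tok = upgrades.get((stack[-1], tok), tok)
        out.append(tok)
    return "".join(out)
```

Keying the table on the roleset as well as the role keeps `:prep-against` intact on frames that have not been given a numbered argument for it.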

nschneid commented 9 years ago

Interesting discussion. It seems to me that all of these examples involve idiomatic uses of metaphor. @timjogorman's "literal" vs. "abstract"/"semantic" is fairly close to what metaphor people call "source" vs. "target" domains. Perhaps my Berkeley roots are getting the better of me, but the :mwe-inner-structure approach looks like a hacky way of saying that these idioms are decomposable (in the sense of Nunberg et al. 1994), and therefore it is possible to augment pieces of the metaphor's source domain with modifiers from the target domain.

Instead of creating a new roleset for each of these decomposable idioms, would it be more in the spirit of AMR to use a shallower representation with a flag like :metaphor +? It could go on the main event only, to signal that a metaphor is involved but without specifying exactly which concepts are metaphoric—e.g., to adapt @ulfulf's example:

"Hiring her would kill two diversity birds with one stone."

(k / kill-01 :metaphor +
  :ARG1 (b / bird :mod (d / diversity))
  :ARG0 (h / hire-01
          :ARG1 (s / she)))

Or the flag could go on all the metaphoric (source domain) concepts, e.g.:

(k / kill-01 :metaphor +
  :ARG1 (b / bird :metaphor + :mod (d / diversity))
  :ARG0 (h / hire-01
          :ARG1 (s / she)))

And to make it easier for annotators, we could have the tool replace aliases like kill-two-birds-with-one-stone-05 with the canonical expansion of the idiom.

To be clear, I'm not proposing this for all MWPs/idioms—just the decomposable idioms. ("Kick the bucket" is fixed, i.e. non-decomposable.) I'm agnostic on whether :metaphor + would be useful for expressions other than idioms.

timjogorman commented 9 years ago

I think it's ok to use such a special role, but it's a bit of a crutch, just like our :prep-X roles. Many MWE frames will benefit from semantic arguments beyond the syntactic scope of "kill".

I very much agree with the idea that whenever we can avoid this by using the semantic roles of that MWE, we should do that. Personally I'd be fine with annotators doing the suggested trick for "kill three birds with one stone" too -- adding a "thing :quant 3" -- when something is modifying a real semantic role of the roleset. I think the only time we should really need the mwe-inner-structure permanently should be the very rare moments when actual metaphorical mapping work is necessary to understand the semantics in question.

nathan: Perhaps my Berkeley roots are getting the better of me, but the :mwe-inner-structure approach looks like a hacky way of saying that these idioms are decomposable

I'd quibble with that -- this really marks that the idiom must be decomposed in order to understand/represent it, rather than the broader decomposability idea that one could discern a possible metaphorical derivation. So while this is mainly proposed for practical purposes, it actually captures what a lot of people might consider to be the linguistically interesting part of idioms -- distinguishing conventionalized usages from usages that clearly require you to reference the metaphorical mapping.

Instead of creating a new roleset for each of these decomposable idioms, would it be more in the spirit of AMR to use a shallower representation with a flag like :metaphor +?

I couldn't find any relevant "Zen of AMR" proverbs on this -- maybe we need one :). Personally, my instinct about "the spirit of AMR" is that if many tokens constitute a single unit of meaning, AMR would want them all to be combined into a single concept.

For me, having a more normal roleset-based analysis of these seems much more practically usable, and much less scary for annotators (how do they know what to give :metaphor + to? It seems like a slippery slope to labeling everything that's metaphorical). If we focus on making these part of the lexicon, it lets us actually write out a source->target mapping, rather than repeating the same shallow source decomposition every time.

nschneid commented 9 years ago

If we focus on making these part of the lexicon, it lets us actually write out a source->target mapping, rather than repeating the same shallow source decomposition every time.

Well, we could have both—by essentially turning the lexicon into an AMR constructicon, where prefabricated chunks of AMR are lexicalized and subject to manipulation/modification as necessary. I agree that for idioms we ideally wouldn't want annotators to have to enter the decomposed form from scratch.

if many tokens constitute a single unit of meaning, AMR would want them all to be combined into a single concept.

Yes, but in my view, internal modification in these idioms is precisely what shows us that the normal "single unit of meaning" analysis is inadequate. It's possible that speakers understand the source and target domains of the metaphor simultaneously in such instances.

We could have a policy that AMR should always strive to represent the target domain. But I'm pretty sure we don't do that with highly productive metaphors, such as "illuminate the issue": we do something superficial like (i / illuminate-01 :ARG1 (i2 / issue)), right? So why should the modified idiom "shed bright light on the issue" be any different?

Is the problem that PropBank roleset glosses can include metaphoric meanings (illuminate-01 and shed-03 could be defined to mean 'cast light on or clarify'), but that doesn't help us for the metaphoric noun (light) that needs to be modified?

timjogorman commented 9 years ago

Hi all! I've posted a number of MWP frames to a Google Drive folder. For completeness, and for Ulf to start considering actual implementation, these are the actual proposed Propbank frame files, so you'll need to scroll to the bottom to see the new senses.

Since those XML files are a bit opaque, I've put a few of these into the format I think we'd want to see in AMR. We've added links when possible to synonymous non-MWP forms, but I'm not sure how or whether we would want to represent those to annotators (almost all of the links are approximate at best, although some on the call might be able to think of closer matches).

Rise to the occasion

rise-to-the-occasion-03 -- "perform well or increase ability, in response to a special event"

  • ARG0: person increasing performance, riser
  • ARG1: difficulty, special event

Aliases: rise-to-the-occasion(m) Synonyms: overcome-01(loose match)

If Japan rises to the occasion Japan will never be beaten even in terms of military power. (all examples are changed to include MWP rolesets but not otherwise corrected)

(b / beat-03 :polarity -
      :ARG1 (c / country :wiki "Japan" :name (n / name :op1 "Japan"))
      :ARG2 (p / power
            :mod (m / military)
            :mod (e2 / even))
      :time (e / ever)
      :condition (r2 / rise-to-the-occasion-03
            :ARG0 c))

Jump on the bandwagon

jump-on-the-bandwagon-09 -- "join an activity or group because of its popularity"

  • ARG0: person jumping on the bandwagon
  • ARG1: popular thing joined
  • ARG2: action done to get on the bandwagon

Aliases: jump-on-the-bandwagon(m)

And whether they're genuinely thinking for the better of this country or they just house people who just want a fight, that have the option to jump on the bandwagon of UK politics.

(m / multi-sentence
      :snt1 (a / and
            :op2 (o / or
                  :op1 (t / think-01
                        :ARG0 (t2 / they)
                        :manner (g / genuine)
                        :prep-for (b / better-01
                              :ARG1 (c / country
                                    :mod (t3 / this))))
                  :op2 (h / house-01
                        :ARG0 t2
                        :ARG1 (p / person
                              :ARG0-of (w2 / want-01
                                    :ARG1 (f / fight-01)
                                    :mod j)
                              :ARG0-of (h2 / have-03
                                    :ARG1 (o2 / option
                                          :prep-to (j3 / jump-on-the-bandwagon-09
                                                :ARG0 p
                                                :ARG1 (p2 / politics
                                                      :mod (c2 / country :wiki "United_Kingdom" :name (n / name :op1 "UK")))))))
                        :mod (j / just))))
      :snt2 (c3 / cause-01
            :ARG1 (t6 / thread
                  :mod (t4 / this))))       

Come of age, be of age

of-age-01 -- "be at level of full maturity, or maturity sufficient for an activity"

  • ARG1: thing aging
  • ARG2: level of maturity reached, age
  • ARG3: action enabled by age level

Aliases: of-age(m) Synonyms: mature-02(loose match)

Coming of age , Bunuel expressed a strong urge to go to Paris to study music but was sent instead to Madrid to study agricultural engineering .

(e / express-01
      :ARG0 (p3 / person :wiki "Luis_Buñuel" :name (n / name :op1 "Bunuel"))
      :ARG1 (u / urge-01
            :ARG1 p3
            :ARG2 (g / go-02
                  :ARG0 p3
                  :ARG4 (c4 / city :wiki "Paris" :name (n2 / name :op1 "Paris"))
                  :purpose (s2 / study-01
                        :ARG0 p3
                        :ARG1 (m2 / music)))
            :ARG1-of (s / strong-02))
      :time (c / come-04
            :ARG1 p3
            :ARG2 (o / of-age-01
                  :ARG1 p3)))

Light verbs

I wanted to note that many of these might be better treated as "light verbs". For now, we've agreed to start adding light verb aliases to the propbank rolesets (starting with these MWPs):

Take Advantage

advantage-02 -- "use, overuse"

  • ARG0: user
  • ARG1: thing used
  • ARG2: purpose

Aliases: take-advantage(l) Synonyms: exploit-01(loose match)

China has demonstrated that it is willing to take advantage of this leverage.

(d / demonstrate-01
      :ARG0 (c / country :wiki "China" :name (n / name :op1 "China"))
      :ARG1 (w / will-02
            :ARG0 c
            :ARG1 (t3 / advantage-02
                  :ARG0 c
                  :ARG1 (l / leverage
                        :mod (t2 / this)))))

(While there's an argument to be made for roleset names to include light verbs, e.g. "take_advantage-02" and "make_sense-02", it may be hard to be consistent about which ones to apply that to; we can discuss that on the call if desired)

nschneid commented 9 years ago

Cool!

While looking at shed.xml, I noticed a few copy-paste bugs:

<roleset id="shed-blood.05" name="inform, provide information regarding">

Under shed-light-on:

<example name="come of beard age">

Under shed-blood:

<mwe alias="of_age">

and also, an extra parenthesis in

(A / cause-01
  :arg1 (B / injure-01)))

I assume these sorts of problems will be ironed out with time.
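Stray or missing parentheses like the one above are easy to catch automatically before frame files are shared. Here is a minimal Python sketch of such a check; `paren_balance_errors` is a hypothetical helper written for this illustration, not part of any existing Propbank or AMR tool.

```python
def paren_balance_errors(amr):
    """Return a list of human-readable problems with parenthesis nesting in an AMR string."""
    errors = []
    depth = 0
    in_string = False
    for pos, ch in enumerate(amr):
        if ch == '"':
            # Parentheses inside quoted strings (e.g. names) are not structural.
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                errors.append(f"unmatched ')' at offset {pos}")
            else:
                depth -= 1
    if depth:
        errors.append(f"{depth} unclosed '('")
    return errors
```

Running it over every AMR snippet in a frame file would flag the example above with a single unmatched `)`.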

nschneid commented 9 years ago

I really like the inclusion of metaphoric source/target maps!

Is there documentation somewhere on the sense-match and role-match attributes? What are the criteria for their values?

timjogorman commented 9 years ago

Hey all! We are nearly done with the first round of MWP rolesets, and are just working on making things consistent. One outstanding issue, however, is how we treat aspectual alternations. We had talked of treating these "compositionally", but I wanted to present the complexities inherent in that approach now while the MWP rolesets are few and easy to change.

For an example of the kinds of aspectual alternations we see: we likely want to eventually have a "being out of line" roleset, which would be used for sentences like:

I hope you don't think that I'm out of line here.

But the same "out of line" has a dynamic version "get out of line" and a less compositional alternative "step out of line":

These people need a dictator who's willing to use mass murder when they step out of line. (DF-212-191678-308_7777.54)
These people need a dictator who's willing to use mass murder when they get out of line.

For many, but not all, we can treat them in this compositional manner, as we suggested in a prior call:

(g / get-03
      :ARG1 (t2 / they)
      :ARG2 (o2 / out-of-line-04
            :ARG1 t2))

But "step out of line" doesn't have a clear aspectual verb. One option is that we could treat most of them as compositional (as above) and make rolesets for the few non-compositional ones like "step out of line" (and if so, does "get out of line" stay compositional?).

A related issue is that if you look at something like "take effect" vs "be in effect" vs "come into effect", there's a difference between "be in effect" and "take effect" that could be thrown away as a "light verb" construal, or treated as separate frames:

The referral is authorized under emergency laws in effect since 1981
The legislation must be signed into law by President Luiz Inacio Lula da Silva to take effect.

For these, too, "come into effect" could be treated compositionally (come-04 (become) :arg2 in-effect-04) but "take effect" seems strange as "take-03 :arg2 in-effect-04".

Loosely speaking, we have the option then of doing this "compositional" approach, of giving up and making separate senses for stative and dynamic versions of the same concept, or of doing the AMR-ish thing of just conflating the two into a single sense:

| alternation | conflation option | separate sense option | compositional option |
| --- | --- | --- | --- |
| in effect, come into effect, take effect | in-effect-04 | in-effect-04, come-into-effect-05 (with take-effect alias) | take-03 (cause to be) in-effect-04: come-04 :arg2 in-effect-04 |
| be over (a person, trauma, etc.); get over X | over-02 | over-02 and get-over-03 | get-03 :arg2 over-02 |
| be in the way / get in the way | in-the-way-04 | in-the-way-04, get-in-the-way-07 | get-03 :arg2 in-the-way-04 |
| be out of line // step out of line | out-of-line-04 | out-of-line-04, step-out-of-line-06 | get-03 :arg2 out-of-line-04 |
| be in mind / on one's mind / spring to mind | in-mind-07 | in-mind-07, come-to-mind-08 | come-04 :arg2 in-mind-07 |
| be even (w/r/t vengeance), get even | even-05? | even-05 (or don't frame it), get-even-04 | get-03 :arg2 even-05? |

Do people have strong preferences? All the approaches have pros and cons, and so I mainly want us to come to an agreement that people would be willing to continue with in the future.

nschneid commented 9 years ago

I like the idea of distinguishing stative vs. change-of-state readings without having to create separate frames.

"Come into effect" and "take effect" mean the same thing, right? Instead of using frames for the light verbs as in the compositional proposal, how about become-01?

the law is in effect

(i / in-effect-04 
    :ARG1 (l / law))

the law took effect/came into effect

(b / become-01
    :ARG2 (i / in-effect-04 
        :ARG1 (l / law)))

(Or would become-01 also have an ARG1 in AMR?)

we put/brought the law into effect

(m / make-02
    :ARG0 (w / we)
    :ARG1 (i / in-effect-04 
        :ARG1 (l / law)))

(Apologies if we already discussed and rejected this idea and I forgot!)
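To show how the three readings above relate, here is a small Python sketch that builds them from one stative frame. The helper names (`in_state`, `become`, `make`) and the fixed variable letters are assumptions made for this illustration; whether become-01 should also carry an ARG1 is left open, as in the question above.

```python
def in_state(frame, theme):
    """Stative reading, e.g. "the law is in effect"."""
    return f"(s / {frame} :ARG1 (t / {theme}))"


def become(state_amr):
    """Change-of-state reading, e.g. "the law took effect / came into effect".

    Only :ARG2 is filled here; the ARG1 question is deliberately not decided.
    """
    return f"(b / become-01 :ARG2 {state_amr})"


def make(agent, state_amr):
    """Caused change-of-state reading, e.g. "we put the law into effect"."""
    return f"(m / make-02 :ARG0 (a / {agent}) :ARG1 {state_amr})"


law_in_effect = in_state("in-effect-04", "law")
print(law_in_effect)
print(become(law_in_effect))
print(make("we", law_in_effect))
```

The point of the sketch is that only one MWP frame (in-effect-04) is needed; the stative, inchoative, and causative readings differ only in the wrapper.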