amrisi / amr-guidelines


The Zen of AMR #142

Open nschneid opened 9 years ago

nschneid commented 9 years ago

At some point in the development of the Python language, someone decided it would be a good idea to write down aphorisms that crystallize many of its important design principles. The Zen of Python is invoked all the time when discussing proposed improvements to the language and determining how "Pythonic" they are.

I think it is worth developing a similar set of general principles for AMR. :) Here is an initial attempt based loosely on the Python ones, but I'm sure y'all can improve this list:

  1. One graph to rule them all.
    • Defragmented semantics in (basically) a rooted DAG.
    • AMR is not in the derivation business.
  2. Entity and event variables give us coreference for free. (neo-Davidsonian)
  3. Deep is better than shallow. Paraphrases should have the same canonical form.
    • ...but practicality (for annotation) beats purity.
  4. It doesn’t have to be an interlingua to be useful, so long as it is more logical than English.
  5. There should be one and preferably only one obvious way to annotate it. (Ideally, annotators should agree.)
  6. The annotator experience matters.
  7. Inflectional morphology and morphosyntax should be disregarded.
  8. Verbs (events) are better than nouns (plain concepts).
    • ...except when the meaning is noncompositional.
    • Adjectives are better than adverbs.
    • Adjective frames are better than :mod.
    • Core roles are better than non-core roles.
    • Anything is better than :prep-X!
  9. Occasionally you need to hallucinate to fill in the gaps.
  10. A government is a government-organization that governs.
  11. The sentence is a starting point, but we should not let it be a straitjacket. (We want to get discourse too!)
  12. AMR does not compromise for algorithmic expedience—the algorithms will have to catch up!

kevincrawfordknight commented 9 years ago

love it!

nschneid commented 9 years ago

For 7: "Morphosyntactic sugar considered unhealthy"

nschneid commented 9 years ago

If the frame provides an argument, use it. (Core roles > non-core roles) See 8.

kevincrawfordknight commented 9 years ago

Discussing AMR here w/ folks at the graph theory / graph transformation workshop at Dagstuhl ... Here are some more bullets for you to consider, Nathan. These are obvious to us, but they're not obvious in general :)

1) "AMR is a meaning-bank, not a derivation-bank." We purposefully treat the relation between string and meaning as an open research project that people are free to solve in many ways. Other banks (esp. those produced within grammatical traditions) actually come with much of the derivation specified, and a lot of the discussion centers on those derivations. It's easy to get confused about whether we are talking about the target representation (as we do in AMR), or about the derivation (how to get from A to B).

2) "AMR is one task." We're de-fragmenting. We know this makes life hard for people who only do, say, named-entity recognition. Likewise for folks who actually do want to model a huge chunk of meaning, but do not want to deal with pronouns.

In both these senses, we're influenced by machine translation, where we get the input and output, but the derivation is completely up to us. Also, MT developers cannot decide to translate only those sentences which don't have pronouns.

3) "We don't design AMR to make parsing easier."

4) "One root." AMRs have one root, which makes annotators happy. To enable this, every relation has an inverse and a reification. Focus goes at the root. Inverses hang down. Reifications allow us to modify relations (e.g., "John was in Los Angeles yesterday", where "yesterday" modifies the location relation itself).
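A minimal Python sketch of the inverse and reification mechanics in point 4, under stated assumptions. The triple representation, the helper names `invert` and `reify`, and the variable names are hypothetical. The mapping of `:location` to `be-located-at-91` (with `:ARG1` for the thing located and `:ARG2` for the place) is assumed from the reification table in the AMR guidelines and should be checked against it.

```python
# Edges are (source variable, role, target variable) triples.

# Assumed reification table: role -> (frame, role for source, role for target).
# Only one entry is shown; the real table in the guidelines is longer.
REIFICATIONS = {
    ":location": ("be-located-at-91", ":ARG1", ":ARG2"),
}


def invert(edge):
    """Flip an edge so it can hang down from the other endpoint.

    Simplification: any role ending in "-of" is treated as an inverse.
    Real AMR has a few roles (e.g. :consist-of) where that is not true.
    """
    src, role, tgt = edge
    if role.endswith("-of"):
        return (tgt, role[:-3], src)
    return (tgt, role + "-of", src)


def reify(edge, new_var):
    """Turn a relation into a node so that it can take modifiers."""
    src, role, tgt = edge
    frame, src_role, tgt_role = REIFICATIONS[role]
    return [
        (new_var, ":instance", frame),
        (new_var, src_role, src),
        (new_var, tgt_role, tgt),
    ]


# "John was in Los Angeles yesterday": p = John, c = Los Angeles, y = yesterday.
location_edge = ("p", ":location", "c")
reified = reify(location_edge, "b")
# The relation is now the node b, so the time can attach to it directly.
reified.append(("b", ":time", "y"))

for triple in reified:
    print(triple)
```

Once the relation is a node, it can also serve as the root, which is one way the single-root requirement and reification fit together.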

5) "AMR isn't designed around grammar." Entities in the real world play multiple roles, and those multiple roles can be realized by English syntactic gizmos like pronouns, zero pronouns, reflexives, control structures ... or even left implicit. We may even have our choice of gizmo, e.g.: "John wants to be seen by someone." vs. "John wants someone to see him." These two sentences mean the same thing ... the fact that pronouns ("him") have a certain non-local linguistic behavior compared to control structures ("wants to") isn't something we consider when designing AMR. These two sentences just mean the same thing & hence get the same AMR. Everything else is actually a discussion of how to derive the meaning from the string, a topic on which AMR has no opinion (see point 1 above).
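To make point 5 concrete, here is a rough Python sketch of the single graph that both sentences would share, written as triples. The concept choices (`want-01`, `see-01`, `person`, `someone`), the variable names, and the helper `reentrant` are assumptions for illustration, not an annotation taken from the corpus. The point is only that John's variable is used twice, whether English expresses that with a pronoun or a control structure.

```python
from collections import Counter

# Hypothetical triples for both "John wants to be seen by someone"
# and "John wants someone to see him".
AMR = [
    ("w", ":instance", "want-01"),
    ("w", ":ARG0", "p"),
    ("p", ":instance", "person"),
    ("p", ":name", "n"),
    ("n", ":instance", "name"),
    ("n", ":op1", '"John"'),
    ("w", ":ARG1", "s"),
    ("s", ":instance", "see-01"),
    ("s", ":ARG0", "s2"),
    ("s2", ":instance", "someone"),
    ("s", ":ARG1", "p"),
]


def reentrant(triples):
    """Return variables that are the target of more than one relation."""
    variables = {src for src, role, _ in triples if role == ":instance"}
    incoming = Counter(
        tgt
        for _, role, tgt in triples
        if role != ":instance" and tgt in variables
    )
    return {var: count for var, count in incoming.items() if count > 1}


print(reentrant(AMR))
```

Here `p` (John) fills both the `:ARG0` of `want-01` and the `:ARG1` of `see-01`; nothing in the graph records which English gizmo produced the second mention.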

kevincrawfordknight commented 9 years ago

nathan,

another slogan can be: "No cycles in AMR (basically)". it's good to clear this up, as theoreticians and parser-designers are especially interested in cycles.

most of the cycles in the first AMR release (which most people outside our core group still work with!) were actually due to annotation errors -- e.g., at a re-entrancy point, typing "w" instead of "w2". ulf implemented a cycle detector to locate these, and these are basically fixed now. the cycle detector is now also part of the checker.

most remaining cycles are due to relative clauses, as in "i saw the man who burned his foot" = "(s / see-01 :arg0 (i / i) :arg1 (m / man :arg0-of (b / burn-01 :arg1 (f / foot :part m))))". if you draw this as a graph and reverse the :arg0-of arrow, and relabel it as :arg0, then you wind up with an acyclic graph.

(this graph will usually be multi-rooted (here b and s), and hence not convertible back into PENMAN-style AMR format. david chiang has a displayer for multi-rooted graphs... i find it very hard to "read off" an english realization from such a display, compared to reading it off from the display of the single-rooted graph! but theoreticians are often completely happy with multi-rooted graphs, or even disconnected ones.)
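The relative-clause example above can be checked mechanically. The following Python sketch is not ulf's cycle detector; it is a plain depth-first search over the edges of that one AMR, with hypothetical helper names (`normalize`, `has_cycle`, `roots`). It assumes every role ending in `-of` is an inverse, which is a simplification.

```python
from collections import defaultdict

# Edges of "(s / see-01 :arg0 (i / i) :arg1 (m / man :arg0-of
# (b / burn-01 :arg1 (f / foot :part m))))" as (source, role, target).
EDGES = [
    ("s", ":arg0", "i"),
    ("s", ":arg1", "m"),
    ("m", ":arg0-of", "b"),
    ("b", ":arg1", "f"),
    ("f", ":part", "m"),
]


def normalize(edges):
    """Reverse every -of edge and relabel it with the plain role."""
    out = []
    for src, role, tgt in edges:
        if role.endswith("-of"):  # simplification: treats all -of roles as inverses
            out.append((tgt, role[:-3], src))
        else:
            out.append((src, role, tgt))
    return out


def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    graph = defaultdict(list)
    for src, _, tgt in edges:
        graph[src].append(tgt)
    nodes = sorted({n for src, _, tgt in edges for n in (src, tgt)})
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


def roots(edges):
    """Nodes with no incoming edge."""
    nodes = {n for src, _, tgt in edges for n in (src, tgt)}
    targets = {tgt for _, _, tgt in edges}
    return sorted(nodes - targets)


print(has_cycle(EDGES), roots(EDGES))
print(has_cycle(normalize(EDGES)), roots(normalize(EDGES)))
```

As written, the graph has the cycle m, b, f, m and the single root s. After reversing `:arg0-of`, the cycle disappears and the roots are b and s, matching the multi-rooted result described above.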

at this point, we are down to just 0.3% of AMRs still having "legitimate" cycles. these cases are arguable -- another annotator might do the same thing without cycles. they often involve :manner and similar roles. ulf recently wrote a program to reify such roles in cyclic AMRs, getting rid of cycles.

kevin

ps. i notice that linguists and theoreticians are keen to study any remaining one or two "weird cycle cases", while engineers know that deleting these examples from the data will not affect your smatch score. i tend to sympathize with the latter folks in this case, given that AMRs are in some sense "made up" anyway ... it's not like we're going to discover an actual whole new species of spider monkey hiding in the actual amazon jungle :) if asked to write a short paper about cycles in AMR, i would write "No cycles in AMR" about five hundred times, followed by "except for annotation errors, plus relative clauses and manner-related things, which can be transformed automatically into acyclic graphs".

nschneid commented 9 years ago

Thanks @kevincrawfordknight ! I added a couple of bullet points under the first item.

"We don't design AMR to make parsing easier."

Can you elaborate on this?

kevincrawfordknight commented 9 years ago

Annotations are often built or extended with current technology in mind. Even we sometimes say "whoa, if we make that design decision, it will be hard to build a parser ..." But we always dismiss this factor, optimistic as we are about the ingeniousness of possibly-as-yet-unborn computational linguistic algorithm designers.

nschneid commented 9 years ago

I'm trying to decide whether/how to boil this down into an aphorism.

Maybe: "AMR does not compromise for algorithmic expedience—the algorithms will have to catch up!"