Multi-sentence annotation format proposal

timjogorman commented 8 years ago

A proposed coreference format

I wanted to start a discussion about representing multi-sentence annotation within individual AMRs, and propose a basic format for it. This is just a slight tweak on the format that Daniel Marcu proposed many many months ago, which you see below, where "1.p" stands for "variable p in sentence 1":

s1: Bill visited Boulder
(v / visit-01
      :ARG0 (p / person :wiki - :name (n / name :op1 "Bill"))
      :location (c / city :wiki Boulder,_Colorado :name (n3 / name :op1 "Boulder")))

s2: He explored the city.
(e / explore-01
      :ARG0 (h / he :coref_ident 1.p)
      :ARG1 (c2 / city))

I really like that general approach, but would propose two tweaks. Mainly, I think this annotation would be more readable if combined with something about the antecedent, like the concept (or, when available, its name), as in "1p_Bill" or "1v_visit":

s2: He explored the city.
(e / explore-01
      :ARG0 (h / he  :coref-ident "1p_Bill")
      :ARG1 (c2 / city))

s3: He liked his visit.
(l2 / like-01
      :ARG0 (h / he :coref-ident "1p_Bill")
      :ARG1 (v / visit-01 :coref-ident "1v_visit"
            :ARG0 h))

The second tweak would simply be to annotate the original mention with that label as well. That shifts this from being an actual pointer to being the name of a discourse chain, and means we could treat it like any other AMR constant.
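To make the labelling scheme concrete, here is a minimal sketch of how chain labels could be built and how mentions sharing a label could be grouped into chains. The helper names (`chain_label`, `collect_chains`) and the regex-based reading of the AMR strings are assumptions for illustration only, not part of any existing AMR tooling; a real implementation would use a proper AMR parser.

```python
import re
from collections import defaultdict

def chain_label(sentence_no, variable, hint):
    """Hypothetical helper: build a label such as "1p_Bill" from the sentence
    number, the antecedent's variable, and its concept or name."""
    return f"{sentence_no}{variable}_{hint}"

# Finds (var / concept ... :coref-ident "label"). Simplification: the attribute
# must appear before any nested parenthesis of the same node.
MENTION = re.compile(r'\((\w+)\s*/\s*([^\s()]+)[^()]*?:coref-ident\s+"([^"]+)"')

def collect_chains(amrs):
    """amrs maps sentence number -> AMR string.
    Returns label -> list of (sentence number, variable, concept)."""
    chains = defaultdict(list)
    for sent_no, amr in sorted(amrs.items()):
        for var, concept, label in MENTION.findall(amr):
            chains[label].append((sent_no, var, concept))
    return dict(chains)

# Example input: the three sentences above, with the original mentions labelled too.
amrs = {
    1: '''(v / visit-01 :coref-ident "1v_visit"
      :ARG0 (p / person :wiki - :coref-ident "1p_Bill" :name (n / name :op1 "Bill")))''',
    2: '''(e / explore-01
      :ARG0 (h / he :coref-ident "1p_Bill")
      :ARG1 (c2 / city))''',
    3: '''(l2 / like-01
      :ARG0 (h / he :coref-ident "1p_Bill")
      :ARG1 (v / visit-01 :coref-ident "1v_visit"
            :ARG0 h))''',
}
chains = collect_chains(amrs)
```

Because the label is a chain name rather than a pointer, no mention needs special treatment: every node carrying the same constant lands in the same list.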

Coreference to wikified NEs.

If things are coreferential with a wikified entity, we don't even need to do that. For example, "the city" in the second sentence could just link directly to its URI of "Boulder,_Colorado":

s1: Bill visited Boulder
(v / visit-01
      :ARG0 (p / person :wiki - :coref-ident "1p_Bill" :name (n / name :op1 "Bill"))
      :location (c / city :wiki "Boulder,_Colorado" :name (n3 / name :op1 "Boulder")))

s2: He explored the city.
(e / explore-01
      :ARG0 (h / he  :coref-ident "1p_Bill")
      :ARG1 (c2 / city :coref-wiki "Boulder,_Colorado"))

(I'd assume we want to distinguish these from normal :wiki relations, since they require discourse processing, but am open to arguments for just using :wiki here)
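As a rough sketch of how a consumer might use this, the snippet below groups nodes by wiki title while remembering whether each link came from a sentence-local :wiki or a discourse-level :coref-wiki. The function name `wiki_groups` and the regex-based extraction are assumptions made for illustration; unlinked entities (`:wiki -`) are skipped.

```python
import re
from collections import defaultdict

# Finds (var / concept ... :wiki "Title") or (... :coref-wiki "Title").
# Unquoted values such as ":wiki -" do not match and are ignored.
WIKI = re.compile(r'\((\w+)\s*/\s*[^\s()]+[^()]*?:(coref-wiki|wiki)\s+"([^"]+)"')

def wiki_groups(amrs):
    """amrs maps sentence number -> AMR string.
    Returns wiki title -> list of (sentence number, variable, relation)."""
    groups = defaultdict(list)
    for sent_no, amr in sorted(amrs.items()):
        for var, relation, title in WIKI.findall(amr):
            groups[title].append((sent_no, var, relation))
    return dict(groups)

amrs = {
    1: '''(v / visit-01
      :ARG0 (p / person :wiki - :coref-ident "1p_Bill" :name (n / name :op1 "Bill"))
      :location (c / city :wiki "Boulder,_Colorado" :name (n3 / name :op1 "Boulder")))''',
    2: '''(e / explore-01
      :ARG0 (h / he  :coref-ident "1p_Bill")
      :ARG1 (c2 / city :coref-wiki "Boulder,_Colorado"))''',
}
groups = wiki_groups(amrs)
```

Keeping the relation name in the output preserves the distinction between links that need discourse processing and ordinary :wiki links.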

Roles across sentences

Finally, let's say that we get a sentence like the following, and we want to represent that one of the roles goes to another referent (say, we want to label that "Bill" was the person reimbursed, ARG2):

s4: The visit was reimbursed.
(r / reimburse-01
      :ARG3 (v / visit-01 :coref-ident "1v_visit"))

We could represent this with something like "amr-implicit":

(r / reimburse-01
      :ARG2 (a / amr-implicit :coref-ident "1p_Bill")
      :ARG3 (v / visit-01 :coref-ident "1v_visit"))
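
A minimal sketch of how such implicit arguments could be pulled out of an AMR string, assuming the amr-implicit node carries only the coreference attribute. The function name `implicit_roles` is hypothetical, and the pattern accepts both the `:coref-ident` and `:coref_ident` spellings used in this thread.

```python
import re

# Matches an implicit argument such as
#   :ARG2 (a / amr-implicit :coref-ident "1p_Bill")
# and captures the role name and the chain label.
IMPLICIT = re.compile(
    r':(ARG\d+)\s+\(\w+\s*/\s*amr-implicit\s+:coref[-_]ident\s+"([^"]+)"\s*\)'
)

def implicit_roles(amr):
    """Return a list of (role, chain label) pairs for amr-implicit nodes."""
    return IMPLICIT.findall(amr)

s4 = '''(r / reimburse-01
      :ARG2 (a / amr-implicit :coref-ident "1p_Bill")
      :ARG3 (v / visit-01 :coref-ident "1v_visit"))'''
roles = implicit_roles(s4)
```

Explicit mentions such as the visit-01 node are left alone; only roles filled by amr-implicit are returned.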

The current way this is done in Anafora involves showing annotators unfilled numbered arguments and letting them add those arguments to coreference chains. For example, in this screencap for the sentence "They don't even know", the blue "fact that is known" can just be linked to other referents.

Actual annotation

Does this seem like a decent way of representing things? I'm including an annotation of one of the documents in the pilot set, dfb-0030, using these assumptions. Next week, I can show how it is being annotated, after a few remaining topics (whether to annotate a few features like modality, and which things should be marked). I'm assuming, however, that annotating anything beyond what's proposed here would be stored as relationships between these discourse-level referents like "1v_visit", and not added to existing AMRs.

nschneid commented 8 years ago

Thanks @timjogorman, this is a nice proposal. I unfortunately won't be able to join today's call, so here's feedback based on my initial impression:

Reading through the annotated document, I find it confusing that :coref-ident is used for both the original mention and the other mentions linking back to it. How about :coref-key for the original/local use, so when reading the AMR I can tell (without checking the sentence number and variable name) that it's not adding coreference information in itself?

s1: Bill visited Boulder
(v / visit-01
      :ARG0 (p / person :wiki - :coref-key "1p_Bill" :name (n / name :op1 "Bill"))
      :location (c / city :wiki "Boulder,_Colorado" :name (n3 / name :op1 "Boulder")))

s2: He explored the city.
(e / explore-01
      :ARG0 (h / he  :coref-ident "1p_Bill")
      :ARG1 (c2 / city :coref-wiki "Boulder,_Colorado"))

Regarding :coref-wiki, I suspect that there may be problems if the Wikipedia page does not refer to a single unique entity in the world. E.g., I could imagine talking about two strains of E. coli, and NOT wanting to mark them as coreferent with each other. So it might be better to dispense with :coref-wiki and just use :coref-ident for all coreferent mentions.

I like the proposed amr-implicit. Will that suffice for all of the non-canonical forms of coreference that we'd like to handle? E.g., what about multiple-single/plural coreference, as in "John likes chocolate. Mary does too. But they prefer different kinds."?

Would we take a position on how amr-implicit corresponds to kinds of null instantiations in FrameNet? E.g., would we only use it for DNIs (definite null instantiations), where there is a specific implicit role-filler recoverable from context? Presumably we wouldn't use it for INIs like the unspecified food eaten in "I ate early this morning."

timjogorman commented 8 years ago

Thanks Nathan!

> I like the proposed amr-implicit. Will that suffice for all of the non-canonical forms of coreference that we'd like to handle? E.g., what about multiple-single/plural coreference, as in "John likes chocolate. Mary does too. But they prefer different kinds."?

Personally, I'd prefer "they" to be a separate identity chain, and to annotate a cross-sentence set/member (:subset) relation between it and "John" and "Mary". We've been marking set/member relations in our RED coreference annotation for this kind of thing.
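
One way to picture this is as stand-off relations between chain labels, kept outside the individual AMRs. The sketch below is purely illustrative: the `ChainRelation` class, the `members_of` helper, the labels "1p_John", "2p_Mary" and "3t_they", and the direction of the subset relation (member chain pointing to set chain) are all assumptions, not an agreed format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChainRelation:
    """A relation between two discourse-level chains (hypothetical format)."""
    relation: str  # e.g. "subset"
    source: str    # label of the member chain
    target: str    # label of the set chain

# "John likes chocolate. Mary does too. But they prefer different kinds."
relations = [
    ChainRelation("subset", "1p_John", "3t_they"),
    ChainRelation("subset", "2p_Mary", "3t_they"),
]

def members_of(set_chain, rels):
    """Return the labels of all chains marked as members of set_chain."""
    return [r.source for r in rels
            if r.relation == "subset" and r.target == set_chain]

they_members = members_of("3t_they", relations)
```

Each of "John", "Mary" and "they" stays in its own identity chain; only the relation list ties them together.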

> Would we take a position on how amr-implicit corresponds to kinds of null instantiations in FrameNet? E.g., would we only use it for DNIs (definite null instantiations), where there is a specific implicit role-filler recoverable from context? Presumably we wouldn't use it for INIs like the unspecified food eaten in "I ate early this morning."

Agreed; this would only be a subset of DNIs (and CNIs, when recoverable). I think FrameNet also has DNIs that lack explicitly recoverable links, but I assume we don't want those. (E.g., if you had a frame for "When Obama ran in 2008...", I think the "office being sought" might be a DNI.)

> Regarding :coref-wiki, I suspect that there may be problems if the Wikipedia page does not refer to a single unique entity in the world. E.g., I could imagine talking about two strains of E. coli, and NOT wanting to mark them as coreferent with each other. So it might be better to dispense with :coref-wiki and just use :coref-ident for all coreferent mentions.

Good point! That's definitely a situation we'll have to consider. Since the vast majority of :wiki links are genuinely unique, I'd vote to treat those non-unique wiki links as a special case, but I definitely agree that we need a way of handling them.
