I'm following up on the discussion from last week on multi-sentence AMR (#168) with the remainder of a proposed annotation scheme. Hopefully I can also directly demo the annotation tool during the call, but I added some videos here just in case that doesn't work.
The actual annotation
Here's an example of the annotation for multi-sentence AMR, which we started talking about last week, using our Anafora annotation tool that we've used for RED annotation. You can see that the AMRs here are normal AMRs, and that the identity chains are made by simply clicking on every member of a particular chain:
You'll notice that in addition to the normal "coreference" we've been talking about, I'm using different identity chains with different features like "actual", "unique", "hypothetical". That's the main proposal here: that we can think of this not just as a vanilla coreference task, but as representing the events and entities involved in a document.
Why we want features like modality
Modality
Normal AMRs don't necessarily encode whether or not an event actually happened. When there is explicit negation we can get it, but the "meet-03" or "eat-01" in the examples below would have no explicit negation:
(1) He played golf instead of meeting with us
(2) If I had eaten breakfast I wouldn't be hungry now
Beyond that, many, many events are hypothetical -- things that are offered, threatened, discussed, etc. -- and we don't have any marking on them. That seems like a pretty important thing for us to be missing.
(That "priorities" spreadsheet had modality and better treatment of negation marked as needed for QA, MT, summarization, RTE and generation.)
We've avoided doing those things at the AMR level because it was slippery to decide which grammatical elements qualified for "irrealis" status. But at the document level, we aren't tied to the text: we can just have annotators mark the objective status of an event/entity. A slightly modified version of what we use for RED would be something like: actual, uncertain, not-actual, and hypothetical (and maybe something like generic or abstract; see below).
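To make the idea concrete, here is a minimal sketch of how a document-level identity chain carrying one of those four modality values might be represented. Everything in it is an illustrative assumption rather than the Anafora or RED schema: the class names `Modality`, `Mention` and `IdentityChain`, the helper `chains_with`, the concept name "play-01", and the choice of "not-actual" for the "meet-03" in example (1).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Modality(Enum):
    """The four proposed document-level values (names are hypothetical)."""
    ACTUAL = "actual"
    UNCERTAIN = "uncertain"
    NOT_ACTUAL = "not-actual"
    HYPOTHETICAL = "hypothetical"


@dataclass(frozen=True)
class Mention:
    sentence: int   # index of the sentence whose AMR contains the node
    variable: str   # AMR variable, e.g. "m"
    concept: str    # AMR concept, e.g. "meet-03"


@dataclass
class IdentityChain:
    modality: Modality
    mentions: List[Mention] = field(default_factory=list)


def chains_with(chains, modality):
    """Return the chains annotated with the given modality value."""
    return [c for c in chains if c.modality is modality]


# Example (1): "He played golf instead of meeting with us".
# Assumed reading: the playing happened, the meeting did not.
play = IdentityChain(Modality.ACTUAL, [Mention(0, "p", "play-01")])
meet = IdentityChain(Modality.NOT_ACTUAL, [Mention(0, "m", "meet-03")])

not_actual = chains_with([play, meet], Modality.NOT_ACTUAL)
print([m.concept for c in not_actual for m in c.mentions])
```

The point of the sketch is that the modality value lives on the chain, not on any AMR node, so the sentence-level AMRs stay untouched.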
Counting events and entities
The other issue that is often seen as missing from AMRs that we might want to capture at this level is "number" phenomena, such as the differences between (3), (4) and (5) (all presumably having the same AMR):
(3) A man stole the hat
(4) Men steal hats
(5) The men stole a hat
(s / steal-01
    :ARG0 (m / man)
    :ARG1 (h / hat))
We don't want to do "number" in terms of English-specific categories like "bare plural" or "mass noun". But that doesn't mean that we aren't losing a lot by discarding this. At minimum, we might want to be able to identify when a particular identity chain refers to a single, unique entity:
A man stole the hat
(s / steal-01
    :ARG0 (m / man) << unique
    :ARG1 (h / hat)) << unique
We could also identify reference to kinds, i.e. "roughly all men" or "roughly all hats" (loosely, ∀(x)):
Men steal hats
(s / steal-01
    :ARG0 (m / man) << kind
    :ARG1 (h / hat)) << kind
"Plural" or "mass" things could all fit into an "aggregate" class in the middle (loosely, Some(x)):
The men stole a hat
(s / steal-01
    :ARG0 (m / man) << aggregate
    :ARG1 (h / hat)) << unique
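As a rough sketch of what this buys us, the snippet below stores the three-way label per identity chain and checks that sentences (3), (4) and (5) become distinguishable even though they share one AMR. The names `Number`, `LABELS` and `signature` are hypothetical and only for illustration; the labels themselves are copied from the annotated examples above.

```python
from enum import Enum


class Number(Enum):
    UNIQUE = "unique"        # a single, unique entity
    AGGREGATE = "aggregate"  # plural or mass, loosely Some(x)
    KIND = "kind"            # reference to a kind, loosely "roughly all x"


# The one AMR that sentences (3), (4) and (5) presumably share.
SHARED_AMR = "(s / steal-01 :ARG0 (m / man) :ARG1 (h / hat))"

# Labels on the chains containing m (man) and h (hat) for each sentence.
LABELS = {
    "A man stole the hat": {"m": Number.UNIQUE, "h": Number.UNIQUE},
    "Men steal hats": {"m": Number.KIND, "h": Number.KIND},
    "The men stole a hat": {"m": Number.AGGREGATE, "h": Number.UNIQUE},
}


def signature(labels):
    """Return a hashable, order-independent summary of one sentence's labels."""
    return tuple(sorted((var, num.value) for var, num in labels.items()))


signatures = {sentence: signature(labels) for sentence, labels in LABELS.items()}
for sentence, sig in signatures.items():
    print(f"{sentence!r}: {sig}")

# True when no two sentences end up with the same set of labels.
all_distinct = len(set(signatures.values())) == len(signatures)
print(all_distinct)
```

Keeping the labels in a separate mapping keyed by AMR variable mirrors the proposal: the AMR string is unchanged, and only the document-level layer differs between the three sentences.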
The scope problem/opportunity
Ideally, "Every farmer who owns a donkey beats it" would then get roughly the same representation as "farmers who own donkeys beat them"; "d" is a set (donkeys owned by farmers) in either case, so we'd get:
Similarly, for "the men each stole a hat", we just have a countable number of men and a countable number of hats, getting us:
The men each stole a hat
(s / steal-01
    :ARG0 (m / man) << aggregate
    :ARG1 (h / hat)) << aggregate
I'm assuming with all of this that you can distinguish events in the same manner, but add a fourth category for generalizations (i.e. "characterizing" (Krifka) or "kind-referring" generics (Friedrich and Pinkal)) like "men steal hats". We could even use that for cases like:
(s / smart-06 << characterizing event
    :ARG1 (p / person << kind of entity
        :ARG0-of (s2 / study-01)
        :mod (t / tall)))
Everything Else:
For completeness: I'm also assuming that you want to be able to add a few "partial coreference" or "bridging" relations like "set/member" while annotating. I can post on this further if people find it controversial.
A (silent) video of the basic annotation on the "PROXY" document in our pilot set
More annotation example videos
Hopefully all those claims made sense! Here are some more brief screencasts of annotating: "A few more sentences in that document" and "A few more after that".
If anyone wants to try it out, I can set them up with documents and an Anafora account.