amrisi / amr-guidelines


Multi-sentence annotation #141

Open timjogorman opened 9 years ago

timjogorman commented 9 years ago

First, I wanted to revive the multi-sentence discussion and bring it onto GitHub. I'm assuming and referencing Dan Marcou's proposal here. Would it be OK to post that here too?

Second, to continue the discussion, I wanted to flesh out the idea of doing that proposed kind of annotation in our current tool for document-level annotation, Anafora. I'm sure that if we sat down and designed an editor for coreference, we could build something more fluid. But I wanted to show that most of the functionality we'd want is already in this tool, so it could be a good provisional way of testing things out. (My assumption is that we could annotate over AMRs like this, but that those annotations would then be converted to the kind of in-line format proposed in Daniel's multi-sentence document.)

The first video shows a very simplistic version of just doing coreference over AMRs (not RED, just coreference): RED/AMR part1 (3:28)

The next videos expand this with additional directions we could consider. I'm framing this using the formalism for RED (Richer Event Description), which I think is a nice version of "ambitious" annotation, but there are clearly other discourse annotation directions that should be considered. The RED idea is that, alongside our annotation of coreference, we should mark "document-level" features like modality, tense, and event or entity status, and that we could even have a second stage marking causal and temporal relationships between events. This video shows how that would work over AMRs: RED/AMR part2 (3:48)

Next, I wanted to go further into the idea that with a stand-off tool, we could pre-annotate many of these proposed features and just have annotators correct them. I've posted an additional video for that (apologies for the low audio volume in the last two videos): RED/AMR part3 (3:27)

Finally, just to show how this would work over a hard domain like the biomedical data, I posted an annotation of about half the data in that original multi-sentence document. This is a bit bumbling (this looks like hard data to handle), but I hope it shows that even pretty hard domains wouldn't be all that hard to handle with something like Anafora/RED: RED/AMR part4 (4:36)

This is partly just to keep the conversation going on these topics, and partly to present the RED idea. I'll hopefully have a "converter" soon to spit these annotations out into an inline AMR format too; when I do, I'll post some annotations here.

timjogorman commented 9 years ago

I wanted to put up some discussion points in case we get to multi-sentence issues this week.