amrisi / amr-guidelines

multi-sentence pilot set suggestions #166

Closed timjogorman closed 8 years ago

timjogorman commented 8 years ago

After some great feedback from Kira, I'm suggesting a small set for initial multi-sentence annotation:

Reasoning:

_PROXY_AFP_ENG_200201110093: Already has coreference and other annotation (RED), so we can see how much we gain by doing coreference over the AMRs. It also shows some very hard generic phenomena that we should examine early.

wb/eng/0003: OntoNotes document with parallel translations into Chinese and Czech. It is the only document with both Chinese AMRs and Czech AMRs, so it would let us ask interesting questions. It would also be good to do a long-ish document (100 AMRs here) to get a sense of whether a given approach breaks down at that length.

dfb-0016: Good contiguous document with some of the issues that DF conversational threads pose, and we have multiple annotations on it (document-level SMATCH?).

dfb-0030: Another document of the right size with continuous posts; it has specific phenomena around how we refer to groups.