After some great feedback from Kira, I'm suggesting a small set of documents for initial multi-sentence annotation:
PROXY_AFP_ENG_20020111_0093 (41 AMRs, for comparison with other coreference)
wb/eng/0003 (100 AMRs, for comparison with multi-lingual AMR annotation)
dfb-0016 (let's just do the first 63 AMRs; to look at IAA and DF thread issues)
dfb-0030 (44 AMRs; just another document with DF threads)
(This last one could be replaced by other suggestions, but we'd have to run any substitution by LDC:
Little Prince excerpt?
other newswire data (wsj.0003, "Kent Cigarettes" data)
other suggestions?)
Reasoning:
PROXY_AFP_ENG_20020111_0093: has coreference and other annotation (RED), so we can see how much we gain by doing coreference over the AMRs. It also shows some very hard generic phenomena that we should examine early.
wb/eng/0003: OntoNotes document with parallel translations into Chinese and Czech. It is the only document with both Chinese and Czech AMRs, so it would let us ask interesting questions. It would also be good to do a long-ish document (100 AMRs here) to get a sense of whether a given approach breaks down at that length.
dfb-0016: Good contiguous document with some of the issues that DF conversational threads pose, and we have multiple annotations on it (document-level SMATCH?).
dfb-0030: Just another document of the right size, with continuous posts and some specific phenomena in how we refer to groups.