danyaljj opened 8 years ago
@danyaljj For the PredicateArgumentView, do we want to remove the predicates, the arguments, or the Relations between them?
Also, for the CoreferenceView, do we want to keep the canonical mentions and remove the coreferent mentions?
For PredicateArgumentView:
Gold mentions: clean Relations
Relations + Predicates + Arguments
For CoreferenceView:
Relations
Constituents + Relations
@joshuacamp Here is how we expect the output JSON to look for each of the subtasks:
1) Raw text:
{
  "corpusId": "ACE2005",
  "id": "/Users/bhargav/code/cs546_project/entity-relations-coreference/data/ace05/data/English/bn/CNN_ENG_20030312_223733.14.apf.xml",
  "text": " CNN_ENG_20030312_223733.14 NEWS STORY 2003-03-12 22:57:55 the morning papers, because morning pape..... "
}
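The raw-text JSON keeps only three top-level fields. A minimal sketch of producing it from a fuller serialized document, assuming the document has already been loaded into a Python dict (for example with `json.load`). The helper `to_raw_text` and the shortened sample document are hypothetical.

```python
import json

# Fields kept for the raw-text subtask, following the example above.
RAW_TEXT_KEYS = ("corpusId", "id", "text")


def to_raw_text(doc):
    """Return a new dict holding only the raw-text fields of a serialized document."""
    return {key: doc[key] for key in RAW_TEXT_KEYS if key in doc}


# Hypothetical, heavily shortened document with extra annotation fields attached.
full_doc = {
    "corpusId": "ACE2005",
    "id": "example.apf.xml",
    "text": " the morning papers ",
    "tokens": ["the", "morning", "papers"],
    "sentences": {"generator": "UserSpecified", "score": 1.0, "sentenceEndPositions": [3]},
}

print(json.dumps(to_raw_text(full_doc), indent=2))
```

The input dict is not modified, so the same loaded document can be reused to build the other subtask variants.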
2) Sentence boundaries:
{
  "corpusId": "ACE2005",
  "id": "/Users/bhargav/code/cs546_project/entity-relations-coreference/data/ace05/data/English/bn/CNN_ENG_20030312_223733.14.apf.xml",
  "text": " CNN_ENG_20030312_223733.14 NEWS STORY 2003-03-12 22:57:55 the morning papers, because morning papers around the country and around the world are remaking their front page to include the Elizabeth Smart case, and we end with that tonight. With her family on a day when their miracle finally came true. she looks very, very healthy. She\u0027s grown a lot. And I\u0027m just so absolutely thrilled. ",
  "tokens": [
    "CNN_ENG_20030312_223733.14",
    "NEWS",
    "STORY",
    "2003-03-12",
    "22:57:55",
    "the",
    "morning",
    "papers",
    ",",
    "because",
    "morning",
    "papers",
    "around",
    "the",
    "country",
    "and",
    "around",
    "the",
    "world",
    "are",
    "remaking",
    "their",
    "front",
    "page",
    "to",
    "include",
    "the",
    "Elizabeth",
    "Smart",
    "case",
    ",",
    "and",
    "we",
    "end",
    "with",
    "that",
    "tonight",
    ".",
    "With",
    "her",
    "family",
    "on",
    "a",
    "day",
    "when",
    "their",
    "miracle"
  ],
  "sentences": {
    "generator": "UserSpecified",
    "score": 1.0,
    "sentenceEndPositions": [
      38,
      57,
      63,
      71,
      77
    ]
  }
}
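A small sketch of how `sentenceEndPositions` can be read, assuming each value is an exclusive token index (one past the last token of the sentence). That reading fits the example: position 38 falls right after the "." token at index 37. The function `split_sentences` and the toy token list are hypothetical.

```python
def split_sentences(tokens, sentence_end_positions):
    """Group tokens into sentences.

    Assumes each value in sentenceEndPositions is an exclusive token index,
    i.e. the index one past the last token of that sentence.
    """
    sentences = []
    start = 0
    for end in sentence_end_positions:
        sentences.append(tokens[start:end])
        start = end  # the next sentence begins where this one stopped
    return sentences


tokens = ["With", "her", "family", ".", "She", "looks", "healthy", "."]
for sentence in split_sentences(tokens, [4, 8]):
    print(" ".join(sentence))
```

With exclusive end positions, consecutive sentences share a boundary value, so no separate start positions need to be stored.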
3) Gold mentions:
{
  "corpusId": "ACE2005",
  "id": "/Users/bhargav/code/cs546_project/entity-relations-coreference/data/ace05/data/English/bn/CNN_ENG_20030312_223733.14.apf.xml",
  "text": " CNN_ENG_20030312_223733.14 NEWS STORY 2003-03-12 22:57:55 the morning papers, because morning papers around the country and around the world are remaking their front page to include the Elizabeth Smart case, and we end with that tonight. With her family on a day when their miracle finally came true. she looks very, very healthy. She\u0027s grown a lot. And I\u0027m just so absolutely thrilled. ",
  "tokens": [
    "CNN_ENG_20030312_223733.14",
    "NEWS",
    "STORY",
    "2003-03-12",
    "22:57:55",
    "the",
    "morning",
    "papers",
    ",",
    "because",
    "morning",
    "papers",
    "around",
    "the",
    "country",
    "and",
    "around",
    "the",
    "world",
    "are",
    "remaking",
    "their",
    "front",
    "page",
    "to",
    "include",
    "the",
    "Elizabeth",
    "Smart",
    "case",
    ",",
    "and",
    "we",
    "end",
    "with",
    "that",
    "tonight",
    ".",
    "With",
    "her",
    "family",
    "on",
    "a",
    "day",
    "when",
    "their",
    "miracle"
  ],
  "sentences": {
    "generator": "UserSpecified",
    "score": 1.0,
    "sentenceEndPositions": [
      38,
      57,
      63,
      71,
      77
    ]
  },
  "views": [
    {
      "viewName": "ENTITYVIEW",
      "viewData": [
        {
          "viewType": "edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView",
          "viewName": "ENTITYVIEW",
          "generator": "edu.illinois.cs.cogcomp.nlp.corpusreaders.ACEReader",
          "score": 1.0,
          "constituents": [
            {
              "score": 1.0,
              "start": 5,
              "end": 8
            },
            {
              "score": 1.0,
              "start": 13,
              "end": 15,
              "properties": {
                "EntityHeadEndCharOffset": "128",
                "EntityHeadStartCharOffset": "122"
              }
            },
            {
              "score": 1.0,
              "start": 17,
              "end": 19
            }
          ]
        }
      ]
    }
  ]
}
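A sketch of turning the ENTITYVIEW constituents back into text, assuming `start` is an inclusive and `end` an exclusive token index. With the tokens in the example, this maps 5-8 to "the morning papers", 13-15 to "the country", and 17-19 to "the world". The helper `mention_strings` is hypothetical, and `SAMPLE` is a trimmed copy of the example containing only the fields the helper reads.

```python
import json

# Trimmed copy of the gold-mention example (first 19 tokens, ENTITYVIEW only).
SAMPLE = """
{
  "tokens": ["CNN_ENG_20030312_223733.14", "NEWS", "STORY", "2003-03-12", "22:57:55",
             "the", "morning", "papers", ",", "because", "morning", "papers", "around",
             "the", "country", "and", "around", "the", "world"],
  "views": [
    {
      "viewName": "ENTITYVIEW",
      "viewData": [
        {
          "constituents": [
            {"score": 1.0, "start": 5, "end": 8},
            {"score": 1.0, "start": 13, "end": 15},
            {"score": 1.0, "start": 17, "end": 19}
          ]
        }
      ]
    }
  ]
}
"""


def mention_strings(doc, view_name="ENTITYVIEW"):
    """Return the surface text of every constituent in the named view.

    Assumes 'start' is inclusive and 'end' is exclusive (token indices).
    """
    tokens = doc["tokens"]
    mentions = []
    for view in doc.get("views", []):
        if view["viewName"] != view_name:
            continue
        for data in view["viewData"]:
            for constituent in data.get("constituents", []):
                mentions.append(" ".join(tokens[constituent["start"]:constituent["end"]]))
    return mentions


print(mention_strings(json.loads(SAMPLE)))
```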
Here is the dataset: https://github.com/cogcomp-dev/illinois-cogcomp-nlp/blob/master/corpusreaders/doc/ACEReader.md
We are using the SpanLabelView, so I think the existing evaluators/cleansers should work.
PredicateArgumentView: we have an evaluator for it, although we still need to write a cleanser for it.
We are using the CoreferenceView; I am in the process of checking in the evaluators for it: https://github.com/cogcomp-dev/illinois-cogcomp-nlp/pull/157. We need to write a cleanser for it.