CogComp / open-eval

An open source evaluation framework for developing NLP systems

ACE dataset tasks, evaluators and redactors #149

Open danyaljj opened 8 years ago

danyaljj commented 8 years ago

Here is the dataset: https://github.com/cogcomp-dev/illinois-cogcomp-nlp/blob/master/corpusreaders/doc/ACEReader.md

We are using the SpanLabelView, so I think the existing evaluators/cleansers should work.

We are using the CoreferenceView; I am in the process of checking in the evaluators for it: https://github.com/cogcomp-dev/illinois-cogcomp-nlp/pull/157. We still need to write a cleanser for it.
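To make the cleanser idea concrete, here is a minimal sketch in Python that works on the serialized JSON rather than on the Java `TextAnnotation` objects. It assumes the layout shown later in this thread (a top-level `views` list whose entries carry a `viewName`). The function name `remove_views` and the view name `HYPOTHETICAL_COREF` are made up for illustration and are not part of open-eval or illinois-cogcomp-nlp.

```python
import copy


def remove_views(doc, view_names):
    """Return a copy of a serialized document without the named views.

    Hypothetical helper: assumes `doc` is a dict with an optional
    top-level "views" list whose entries have a "viewName" key.
    """
    cleaned = copy.deepcopy(doc)  # leave the caller's document untouched
    if "views" in cleaned:
        cleaned["views"] = [
            view for view in cleaned["views"]
            if view.get("viewName") not in view_names
        ]
    return cleaned


# Toy document; "HYPOTHETICAL_COREF" is an assumed view name.
doc = {
    "corpusId": "ACE2005",
    "tokens": ["the", "morning", "papers"],
    "views": [
        {"viewName": "ENTITYVIEW", "viewData": []},
        {"viewName": "HYPOTHETICAL_COREF", "viewData": []},
    ],
}
cleaned = remove_views(doc, {"HYPOTHETICAL_COREF"})
remaining = [view["viewName"] for view in cleaned["views"]]
```

A real cleanser would probably need finer control than dropping a whole view (for example, keeping mentions but removing the relations between them), which is what the questions below are about.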

joshuacamp commented 8 years ago

@danyaljj So for the PredicateArgumentView, do we want to remove the predicates, the arguments, or the Relation between them?

joshuacamp commented 8 years ago

Also, for the CoreferenceView, do we want to keep the canonical mentions and remove the coreferent mentions?

danyaljj commented 8 years ago

For PredicateArgumentView:

For CoreferenceView:

danyaljj commented 8 years ago

@joshuacamp Here is how we expect the output JSON to look for each of the subtasks:

1) Raw text:

{
  "corpusId": "ACE2005",
  "id": "/Users/bhargav/code/cs546_project/entity-relations-coreference/data/ace05/data/English/bn/CNN_ENG_20030312_223733.14.apf.xml",
  "text": "  CNN_ENG_20030312_223733.14   NEWS STORY   2003-03-12 22:57:55     the morning papers, because morning pape..... "
}

2) Sentence boundaries:

{
  "corpusId": "ACE2005",
  "id": "/Users/bhargav/code/cs546_project/entity-relations-coreference/data/ace05/data/English/bn/CNN_ENG_20030312_223733.14.apf.xml",
  "text": "  CNN_ENG_20030312_223733.14   NEWS STORY   2003-03-12 22:57:55     the morning papers, because morning papers around the country and around the world are remaking their front page to include the Elizabeth Smart case, and we end with that tonight. With her family on a day when their miracle finally came true.   she looks very, very healthy. She\u0027s grown a lot. And I\u0027m just so absolutely thrilled. ",
  "tokens": [
    "CNN_ENG_20030312_223733.14",
    "NEWS",
    "STORY",
    "2003-03-12",
    "22:57:55",
    "the",
    "morning",
    "papers",
    ",",
    "because",
    "morning",
    "papers",
    "around",
    "the",
    "country",
    "and",
    "around",
    "the",
    "world",
    "are",
    "remaking",
    "their",
    "front",
    "page",
    "to",
    "include",
    "the",
    "Elizabeth",
    "Smart",
    "case",
    ",",
    "and",
    "we",
    "end",
    "with",
    "that",
    "tonight",
    ".",
    "With",
    "her",
    "family",
    "on",
    "a",
    "day",
    "when",
    "their",
    "miracle"
  ],
  "sentences": {
    "generator": "UserSpecified",
    "score": 1.0,
    "sentenceEndPositions": [
      38,
      57,
      63,
      71,
      77
    ]
  }
}
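For reference, a minimal sketch of how `sentenceEndPositions` could be read. In the example above the first value, 38, is one past the index of the first "." token (index 37), so the sketch assumes the values are exclusive token offsets. Note that the token list in the example appears truncated, since the later end positions exceed its length. The helper name `split_sentences` is hypothetical.

```python
def split_sentences(tokens, sentence_end_positions):
    """Group tokens into sentences.

    Assumes each entry of `sentence_end_positions` is an exclusive
    token index, as the example in this thread suggests.
    """
    sentences = []
    start = 0
    for end in sentence_end_positions:
        sentences.append(tokens[start:end])
        start = end
    return sentences


# Toy input, shortened from the sample text above.
tokens = ["We", "end", "with", "that", ".", "She", "looks", "healthy", "."]
sentences = split_sentences(tokens, [5, 9])
```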

3) Gold mention:

{
  "corpusId": "ACE2005",
  "id": "/Users/bhargav/code/cs546_project/entity-relations-coreference/data/ace05/data/English/bn/CNN_ENG_20030312_223733.14.apf.xml",
  "text": "  CNN_ENG_20030312_223733.14   NEWS STORY   2003-03-12 22:57:55     the morning papers, because morning papers around the country and around the world are remaking their front page to include the Elizabeth Smart case, and we end with that tonight. With her family on a day when their miracle finally came true.   she looks very, very healthy. She\u0027s grown a lot. And I\u0027m just so absolutely thrilled. ",
  "tokens": [
    "CNN_ENG_20030312_223733.14",
    "NEWS",
    "STORY",
    "2003-03-12",
    "22:57:55",
    "the",
    "morning",
    "papers",
    ",",
    "because",
    "morning",
    "papers",
    "around",
    "the",
    "country",
    "and",
    "around",
    "the",
    "world",
    "are",
    "remaking",
    "their",
    "front",
    "page",
    "to",
    "include",
    "the",
    "Elizabeth",
    "Smart",
    "case",
    ",",
    "and",
    "we",
    "end",
    "with",
    "that",
    "tonight",
    ".",
    "With",
    "her",
    "family",
    "on",
    "a",
    "day",
    "when",
    "their",
    "miracle"
  ],
  "sentences": {
    "generator": "UserSpecified",
    "score": 1.0,
    "sentenceEndPositions": [
      38,
      57,
      63,
      71,
      77
    ]
  },
  "views": [
    {
      "viewName": "ENTITYVIEW",
      "viewData": [
        {
          "viewType": "edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView",
          "viewName": "ENTITYVIEW",
          "generator": "edu.illinois.cs.cogcomp.nlp.corpusreaders.ACEReader",
          "score": 1.0,
          "constituents": [
            {
              "score": 1.0,
              "start": 5,
              "end": 8
            },
            {
              "score": 1.0,
              "start": 13,
              "end": 15,
              "properties": {
                "EntityHeadEndCharOffset": "128",
                "EntityHeadStartCharOffset": "122"
              }
            },
            {
              "score": 1.0,
              "start": 17,
              "end": 19
            }
          ]
        }
      ]
    }
  ]
}
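As a sanity check on the gold-mention output, a small sketch that maps each constituent back to its tokens. In the example above, `start` 5 and `end` 8 cover the tokens "the morning papers", so the sketch assumes `start` is an inclusive and `end` an exclusive token index. The helper name `mention_texts` is hypothetical, and the toy document is shortened from the sample.

```python
def mention_texts(doc, view_name):
    """Return the token text of every constituent in the named view.

    Assumes the serialized layout shown in this thread: a "views" list,
    each entry holding "viewData" items with a "constituents" list.
    """
    tokens = doc["tokens"]
    mentions = []
    for view in doc.get("views", []):
        if view.get("viewName") != view_name:
            continue
        for data in view.get("viewData", []):
            for constituent in data.get("constituents", []):
                span = tokens[constituent["start"]:constituent["end"]]
                mentions.append(" ".join(span))
    return mentions


# Toy document with re-based token offsets.
doc = {
    "tokens": ["the", "morning", "papers", ",", "because", "morning",
               "papers", "around", "the", "country"],
    "views": [
        {
            "viewName": "ENTITYVIEW",
            "viewData": [
                {
                    "viewName": "ENTITYVIEW",
                    "constituents": [
                        {"score": 1.0, "start": 0, "end": 3},
                        {"score": 1.0, "start": 8, "end": 10},
                    ],
                }
            ],
        }
    ],
}
mentions = mention_texts(doc, "ENTITYVIEW")
```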