Alab-NII / mrc_annotation

Annotation and parsers for "Evaluation Metrics for Machine Reading Comprehension"

How do I get back the original question for newsqa? #1

Open sagnik opened 3 years ago

sagnik commented 3 years ago

This is a sample from newsqa.json:

  {
    "original_id": "./cnn/stories/eace03b0b83764932f0dbb3898e3312c1d2f8bed.story",
    "annotations": [
      {
        "skills": [
          11
        ],
        "sents_indices": [
          [
            107,
            126
          ]
        ],
        "skill_count": 1,
        "nonsense": false
      }
    ],
    "id": "newsqa_000"
  },

There are 8 questions in the newsqa dataset (when loaded through huggingface datasets) with that same story id. How do I know which one this annotation refers to?

sagnik commented 3 years ago

@annargrs

sakusugawara commented 3 years ago

Thank you for your question, @sagnik. Yes, in NewsQA a passage has multiple questions, and the annotation doesn't specify which one it refers to. I pushed annotation/newqa_id_to_question.json, which associates each annotation with a question. You can reproduce it by running parser/newsqa_parse.py on newsqa_dev.csv (which I generated from the original data release).
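
A minimal sketch of how the lookup might work. It assumes the mapping file is a flat JSON object keyed by annotation id (such as "newsqa_000") with the question text as the value. The file's actual layout is not shown in this thread, so inspect it before relying on this. The helper names `load_id_to_question` and `attach_questions` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def load_id_to_question(path):
    """Load the mapping file.

    ASSUMPTION: the file is a flat JSON object {annotation_id: question}.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def attach_questions(annotations, id_to_question):
    """Return copies of the annotation records with a 'question' field added.

    Records whose id is missing from the mapping get question=None,
    and the input records are left unmodified.
    """
    return [
        {**record, "question": id_to_question.get(record["id"])}
        for record in annotations
    ]
```

With the paths mentioned in this thread, usage could look like `attach_questions(json.load(open("newsqa.json")), load_id_to_question("annotation/newqa_id_to_question.json"))`, assuming newsqa.json is a list of records like the sample above. Matching on the annotation's "id" rather than "original_id" matters here, because several questions share the same story path.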