correct train, dev, test split

Maluuba / newsqa

Tools for using Maluuba's NewsQA Dataset (public version)

https://www.microsoft.com/en-us/research/project/newsqa-dataset/



correct train, dev, test split #5

Closed dirkweissenborn closed 7 years ago

dirkweissenborn commented 7 years ago

I am not able to reproduce the numbers mentioned in the paper with the current split: "92,487 samples training, 5,103 for validation, and 5,251 for testing"

Could you please explain in detail how these samples are extracted from the split dataset?

Do they correspond to the number of questions or the total number of question-answer pairs? It seems that sometimes more than one answer span is correct.

How was the evaluation performed in the paper, given that there can be multiple correct spans for one question?

juharris commented 7 years ago

With the current code I get:

$ grep -cE '^\./cnn/stories/[0-f]+\.story,' *
dev.csv:5988
test.csv:5971
train.csv:107674

To split, we originally shuffled story IDs. We'll discuss internally and get back to you.
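For readers following along, splitting by shuffled story IDs could look roughly like the sketch below. This is only an illustration, not the code used for the paper: the function name `split_story_ids`, the 5%/5% dev/test fractions, and the seed are all assumptions made for the example. The point it shows is that the split is made over stories, so every question about a story ends up in the same partition.

```python
import random

def split_story_ids(story_ids, dev_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle story IDs and partition them into train/dev/test lists.

    The fractions and seed are hypothetical defaults, not values from the paper.
    """
    # Deduplicate and sort first so the same seed always gives the same split.
    ids = sorted(set(story_ids))
    random.Random(seed).shuffle(ids)

    n_dev = int(len(ids) * dev_frac)
    n_test = int(len(ids) * test_frac)

    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test
```

Because the unit being split is the story rather than the question, the number of questions in each partition depends on how many questions each story has, which is one reason row counts may not match a fixed ratio exactly.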

dirkweissenborn commented 7 years ago

Any news on this issue?

juharris commented 7 years ago

Thanks for pointing this out @dirkweissenborn. We discussed internally and we think the correction will be to update the story ID lists in here to match the paper. We'll give another update soon.

juharris commented 7 years ago

In that section of the paper, we filtered out unanswerable questions, so those numbers don't count questions with no answers. We'll update the code soon to help filter those out.
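Until that update lands, a rough way to check the effect of the filter is to count only rows that have an answer. The sketch below is a guess at how that might be done, not the repository's implementation: the helper `count_answerable`, the column name `answer_token_ranges`, and the convention that an empty value or the string `None` marks an unanswerable question are all assumptions, so adjust them to match the header of your split files.

```python
import csv

def count_answerable(csv_file, answer_column="answer_token_ranges"):
    """Return (total_rows, answerable_rows) for an open CSV file object.

    Assumption: a question is unanswerable when its answer column is
    missing, empty, or the literal string "None".
    """
    total = 0
    answerable = 0
    for row in csv.DictReader(csv_file):
        total += 1
        value = (row.get(answer_column) or "").strip()
        if value and value.lower() != "none":
            answerable += 1
    return total, answerable
```

For example, `with open("train.csv", newline="", encoding="utf-8") as f: print(count_answerable(f))` would print both counts for the training split, which can then be compared against the row counts above and the numbers quoted from the paper.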