dirkweissenborn closed 7 years ago
With the current code I get:
$ grep -cE '^\./cnn/stories/[0-f]+\.story,' *
dev.csv:5988
test.csv:5971
train.csv:107674
To create the split, we originally shuffled story IDs. We'll discuss internally and get back to you.
Any news on this issue?
Thanks for pointing this out @dirkweissenborn. We discussed internally and we think the correction will be to update the story ID lists in here to match the paper. We'll give another update soon.
In that section of the paper, we filtered out unanswerable questions, so those numbers don't count questions with no answers. We'll update the code soon to help filter those out.
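In the meantime, here is a minimal sketch of how such a filter could be approximated on the split files. It is not the filtering used for the paper. It assumes each CSV has a header row and that an unanswerable question shows up as an empty value in an answer column; the column name `answer_token_ranges` and the helper names are assumptions for illustration, so check them against the header of your own files.

```python
import csv
import io


def count_answerable(rows, answer_field="answer_token_ranges"):
    """Return (answerable, total) for an iterable of dict-like CSV rows.

    A row counts as answerable when its answer field is non-empty.
    The field name is an assumption, not a confirmed part of the dataset.
    """
    total = 0
    answerable = 0
    for row in rows:
        total += 1
        value = (row.get(answer_field) or "").strip()
        if value:
            answerable += 1
    return answerable, total


def count_answerable_in_csv(path, answer_field="answer_token_ranges"):
    """Apply count_answerable to a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_answerable(csv.DictReader(f), answer_field)


# Tiny in-memory example with made-up rows (not real dataset content).
sample = (
    "story_id,question,answer_token_ranges\n"
    "./cnn/stories/abc.story,Who?,3:5\n"
    "./cnn/stories/def.story,Why?,\n"
)
print(count_answerable(csv.DictReader(io.StringIO(sample))))
```

Running `count_answerable_in_csv` on train.csv, dev.csv and test.csv gives per-split counts that can be compared with the figures quoted from the paper, bearing in mind that the paper's exact filtering criteria may differ from this sketch.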
I am not able to reproduce the numbers mentioned in the paper with the current split: "92,487 samples training, 5,103 for validation, and 5,251 for testing".
Could you please explain in detail how these samples are extracted from the split dataset?
Do they correspond to the number of questions or the total number of question-answer pairs? It seems that sometimes more than one answer span is correct.
How was the evaluation performed in the paper, given that there can be multiple correct spans for one question?