Open Victor0118 opened 5 years ago
I guess the split is based on these two files: large2470-test.tsv and large2470-train.tsv (Large Variant of the Dataset), excluding QA pairs with 'lfb' ids (QA pairs from live.ailao.eu, I guess; see https://github.com/brmson/dataset-factoid-curated/commit/d81aca55d9afdc9b541ce403b4d346e63375db6b).
The numbers from the DrQA paper are correct in this case, but I'm not sure where the number 1204 in the R^3 paper comes from.
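For anyone who wants to check the counts themselves, here is a rough sketch of the filtering described above. It assumes things I have not verified against the repository: that each TSV row carries the QA pair id in its first column, that the files have no header row, and that the excluded ids can be recognized by the substring 'lfb'. The function name `count_pairs` and the sample rows are made up for illustration.

```python
import csv
import io


def count_pairs(tsv_file, id_column=0, marker="lfb"):
    """Count rows whose id does not contain `marker`, and rows that do.

    Assumption: the id sits in column `id_column` and there is no header row.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = skipped = 0
    for row in reader:
        if not row:  # ignore blank lines
            continue
        if marker in row[id_column]:
            skipped += 1
        else:
            kept += 1
    return kept, skipped


# Made-up rows, only to show the call; these are not taken from the dataset.
sample = io.StringIO(
    "id-1\tquestion one\tanswer one\n"
    "lfb-2\tquestion two\tanswer two\n"
    "id-3\tquestion three\tanswer three\n"
)
sample_counts = count_pairs(sample)
print(sample_counts)
```

On the real files this would be called as `count_pairs(open("large2470-train.tsv", newline="", encoding="utf-8"))`, and likewise for large2470-test.tsv. If the id column or the id format turns out to be different, adjust `id_column` and `marker` accordingly.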
In some open-domain QA papers, I saw that the CuratedTREC dataset is used and linked to this repository, but I cannot find the train/test split here. Even more surprisingly, the train/test split statistics reported in two papers are different:
Does anyone know how to solve this problem?