Closed aniket03 closed 3 years ago
I realized that the training data for WMT17 consists of the test sets used in WMT15 and WMT16, and that the number of data points in the WMT15 and WMT16 test sets adds up to 5360. Hence the issue is resolved, and I am closing it.
Hi, I was trying to download the WMT17 dataset using the wmt/db_builder example shared in the "Experiments with the WMT Metrics shared task" section. However, I found that downloading the WMT17 dataset in this manner yields only 3920 candidate-reference pairs, while the research paper reports 5344 candidate-reference pairs for WMT17. I wanted to check what might be causing this discrepancy in the count of candidate-reference pairs.
PS: I also tried setting the average_duplicates flag to False when calling wmt/db_builder, but that resulted in 4132 samples, which is still lower than 5344.