google-research / bleurt

BLEURT is a metric for Natural Language Generation based on transfer learning.
https://arxiv.org/abs/2004.04696
Apache License 2.0

WMT17 dataset: Mismatch in candidate-reference pairs counts #23

Closed aniket03 closed 3 years ago

aniket03 commented 3 years ago

Hi, I was trying to download the WMT17 dataset using the wmt/db_builder example from the "Experiments with the WMT Metrics shared task" section. However, downloading WMT17 this way yields only 3920 candidate-reference pairs, whereas the research paper reports 5344 candidate-reference pairs for WMT17.

What might be causing this discrepancy in the number of candidate-reference pairs?

PS: I also tried setting the average_duplicates flag to False when calling wmt/db_builder, but that produced 4132 samples, which is still fewer than 5344.
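
For anyone who wants to reproduce these counts, here is a minimal sketch of how the records and the distinct candidate-reference pairs could be tallied. It assumes the builder's output is a JSON Lines file and that each record has "candidate" and "reference" fields. Both are assumptions on my part, and count_pairs is a hypothetical helper, not part of the BLEURT code base.

```python
import json
from collections import Counter


def count_pairs(jsonl_lines, cand_key="candidate", ref_key="reference"):
    """Return (total_records, distinct_pairs) for an iterable of JSON lines.

    The field names are assumptions; adjust them to match the actual file.
    """
    pairs = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pairs[(record[cand_key], record[ref_key])] += 1
    return sum(pairs.values()), len(pairs)


# Tiny in-memory sample: two ratings of the same pair plus one other pair.
sample = [
    '{"candidate": "a cat", "reference": "the cat", "score": 0.5}',
    '{"candidate": "a cat", "reference": "the cat", "score": 0.7}',
    '{"candidate": "a dog", "reference": "the dog", "score": 0.1}',
]
total, distinct = count_pairs(sample)
print(total, distinct)
```

To run it on a real file, pass an open file handle in place of the sample list. A gap between the two numbers would point to pairs that were rated more than once.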

aniket03 commented 3 years ago

I realized that the training data for WMT17 consists of the test sets used in WMT15 and WMT16, and the number of data points in those two test sets adds up to 5360. That resolves the issue, so I am closing it.