WMT17 dataset: Mismatch in candidate-reference pairs counts

google-research / bleurt

BLEURT is a metric for Natural Language Generation based on transfer learning.

Apache License 2.0


Hi, I was trying to download the WMT17 dataset using the wmt/db_builder example from the "Experiments with the WMT Metrics shared task" section. However, downloading WMT17 this way yields only 3920 candidate-reference pairs, whereas the research paper reports 5344 candidate-reference pairs for WMT17.

What might be causing this discrepancy in the number of candidate-reference pairs?

PS: I also tried setting the average_duplicates flag to False when calling wmt/db_builder, but that produced 4132 samples, which is still fewer than 5344.
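For anyone who wants to check the counts, here is a minimal sketch of how the pairs could be tallied. It assumes the builder's output is a JSONL file with one record per line and `candidate` and `reference` fields; the `count_pairs` helper and the sample records are hypothetical and only illustrate the counting.

```python
import json
from collections import Counter


def count_pairs(lines):
    """Return (total records, distinct (candidate, reference) pairs).

    Assumes each non-empty line is a JSON object with "candidate" and
    "reference" keys (an assumption about the output format).
    """
    pairs = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pairs[(record["candidate"], record["reference"])] += 1
    return sum(pairs.values()), len(pairs)


# Made-up records: the first two share the same candidate-reference pair.
sample = [
    '{"candidate": "a cat sat", "reference": "the cat sat", "score": 0.5}',
    '{"candidate": "a cat sat", "reference": "the cat sat", "score": 0.7}',
    '{"candidate": "a dog ran", "reference": "the dog ran", "score": 0.9}',
]

total, distinct = count_pairs(sample)
print(f"records: {total}, distinct pairs: {distinct}")
```

To run this on a real file, pass an open file handle instead of the sample list, for example `count_pairs(open(path))` with `path` pointing at the generated file. If the two numbers differ, the file contains repeated candidate-reference pairs, which may be relevant to how the average_duplicates flag changes the sample count.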


WMT17 dataset: Mismatch in candidate-reference pairs counts #23