Closed rbawden closed 4 years ago
Yes, this score is too high. The problem is that I used some localisation data for testing in early models: the name "Tatoeba" on those test sets is misleading and a mistake. I am fairly sure those test sets were taken from GNOME, which overlaps with the Ubuntu localisation files included in the training data, hence the inflated scores. There will be similar cases for other language pairs, sorry. The scores are definitely computed after merging subword units, but in this case they may still come from tokenised text scored with multi-bleu. I now always evaluate on detokenised output with sacreBLEU. Or better, the new models use SentencePiece and no tokenisation at all. Hope this explains the situation.
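To see why tokenised multi-bleu scores and detokenised sacreBLEU scores are not directly comparable, here is a minimal toy illustration. This is a bare-bones, unsmoothed BLEU reimplementation written just for this example (it is not the actual multi-bleu or sacreBLEU code, and the sentence pair is invented); the point is only that the same hypothesis/reference pair yields different BLEU depending on how the text was tokenised before scoring:

```python
import math
import re
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hyp_tokens, ref_tokens, max_n=4):
    """Toy sentence-level BLEU (no smoothing), for illustration only."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp_tokens, n))
        ref_counts = Counter(ngrams(ref_tokens, n))
        # clipped n-gram matches, as in standard BLEU
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp_tokens) - n + 1, 0)
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # brevity penalty
    bp = 1.0 if len(hyp_tokens) >= len(ref_tokens) else math.exp(
        1 - len(ref_tokens) / len(hyp_tokens))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)

ref = 'The cat sat on the mat.'
hyp = 'The cat sat on a mat.'

# "Detokenised" view: plain whitespace split, punctuation stays attached.
detok_score = bleu(hyp.split(), ref.split())

# "Tokenised" view: punctuation split off, as an aggressive tokeniser would do.
tok = lambda s: re.findall(r"\w+|[^\w\s]", s)
tokenised_score = bleu(tok(hyp), tok(ref))

# The two views give different BLEU for the very same sentence pair.
print(detok_score, tokenised_score)
```

Because the n-gram inventories differ between the two views, the resulting scores differ too, which is why sacreBLEU standardises on detokenised input and reports its own tokenisation in the score signature.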
Ok, thank you for the explanation!
Hello! I noticed that the BLEU score for ta-en is 89.1, which seems a little too high. Could this be a bug? Also, were the BLEU scores calculated on the de-tokenised outputs or the BPE-ed ones in opus-2019-12-05.test.txt?
Thank you in advance!