facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

WARNING - Impossible to parse BLEU score! #290

Closed Tikquuss closed 4 years ago

Tikquuss commented 4 years ago

I am running unsupervised machine translation from a pretrained cross-lingual language model in Google Colab. The language model trains successfully, but MT training breaks with the following error:

Illegal division by zero at /content/XLM/src/evaluation/multi-bleu.perl line 154, line 10.
WARNING - 04/22/20 10:51:57 - 0:02:45 - Impossible to parse BLEU score!

Looking at the hypothesis files (hyp0.?-?.valid.txt, hyp0.?-?.test.txt, ... in dumped/unsupMT?-?/???????/hypotheses), which should contain the translations the MT model has produced so far, I noticed that they are all empty. The reference files (ref.?-?.valid.txt, ref.?-?.test.txt), which contain the target translations, are not empty.
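For anyone who wants to run the same check without opening each file by hand, here is a minimal sketch. The helper names `count_nonempty_lines` and `find_empty_hypotheses` are made up for illustration and are not part of XLM, and the `hyp*.txt` pattern is an assumption based on the file names above.

```python
from pathlib import Path


def count_nonempty_lines(path):
    """Return the number of lines in `path` that contain non-whitespace text."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def find_empty_hypotheses(hyp_dir):
    """List hypothesis files in `hyp_dir` that contain no translated text.

    Assumes hypothesis files match `hyp*.txt` (hypothetical pattern,
    inferred from names such as hyp0.?-?.valid.txt).
    """
    return sorted(
        p.name
        for p in Path(hyp_dir).glob("hyp*.txt")
        if count_nonempty_lines(p) == 0
    )
```

Calling `find_empty_hypotheses` on the hypotheses directory of an experiment would return the names of the files that hold only blank lines, which are the ones that make the BLEU script fail.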

This is consistent with the error, because lines 153-155 of multi-bleu.perl contain the code below. With an empty hypothesis file, $length_translation is 0, so the division on line 154 fails:

if ($length_translation<$length_reference) {
  $brevity_penalty = exp(1-$length_reference/$length_translation);
}
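To make the failure mode concrete, here is a rough Python transcription of those Perl lines. It is only a sketch: the function name `brevity_penalty` is mine, and I am assuming the penalty stays at 1 when the translation is not shorter than the reference, as in the usual BLEU definition.

```python
import math


def brevity_penalty(length_translation, length_reference):
    """BLEU brevity penalty, mirroring the quoted multi-bleu.perl lines."""
    if length_translation < length_reference:
        # An empty hypothesis file gives length_translation == 0, so this
        # division raises ZeroDivisionError, the Python counterpart of
        # Perl's "Illegal division by zero".
        return math.exp(1 - length_reference / length_translation)
    # Assumed default when the translation is at least as long as the reference.
    return 1.0
```

With a non-empty but short hypothesis the function returns a value below 1; with a hypothesis length of 0 and a non-empty reference it raises instead of returning a score, which is why no BLEU value can be parsed afterwards.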

So the question is why the hypothesis files are empty. I've done some digging in train.py and src/trainer.py with no luck.

Tikquuss commented 4 years ago

I have solved the problem. The cause was a badly chosen epoch_size. To keep it simple, I set it to the number of examples divided by batch_size, and I now get excellent scores. I am using my own data.
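The arithmetic described above can be sketched as follows. The helper name `epoch_size_from_data` is hypothetical, and so are the use of integer division and the lower bound of 1; the comment does not say how the result was rounded.

```python
def epoch_size_from_data(n_examples, batch_size):
    """Return number of examples divided by batch size, as described above.

    Integer division and the floor of 1 are assumptions made for this sketch.
    """
    if n_examples <= 0 or batch_size <= 0:
        raise ValueError("n_examples and batch_size must be positive")
    return max(1, n_examples // batch_size)
```

For example, a made-up corpus of 64,000 examples with a batch size of 32 would give an epoch_size of 2000 under these assumptions.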