facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

The output of translate.py isn't the same as the hypotheses file of train.py #175

Open. Dolprimates opened this issue 5 years ago

Dolprimates commented 5 years ago

I'm trying supervised MT (supMT) from Chinese to Japanese, but the output of translate.py isn't the same as the hypotheses file of train.py, and the difference is not only in the order of the sentences.

(I'm using the version from before the PKM layer was implemented, but the only difference there is the fp16 handling.)

I realized that translate.py doesn't include the beam-search (generate_beam) path, so I edited the code a little following https://github.com/facebookresearch/XLM/blob/95d50abe1797d222a3953f6f2a12dfd05b3bc8d0/src/evaluation/evaluator.py#L385-L393, and the BLEU score improved a little, but it still doesn't reach that of train.py.

I changed the code at https://github.com/facebookresearch/XLM/blob/0b193eb5240b4a61301179bd2683cf321c07806f/translate.py#L117

as follows

        if model_params.beam_size == 1:
            # greedy decoding (what the original translate.py does)
            decoded, dec_lengths = decoder.generate(
                encoded, lengths.cuda(), params.tgt_id,
                max_len=int(1.5 * lengths.cuda().max().item() + 10)
            )
        else:
            # beam search, mirroring the evaluator's generate_beam call
            decoded, dec_lengths = decoder.generate_beam(
                encoded, lengths.cuda(), params.tgt_id, beam_size=model_params.beam_size,
                length_penalty=model_params.length_penalty,
                early_stopping=model_params.early_stopping,
                max_len=int(1.5 * lengths.cuda().max().item() + 10)
            )

I also noticed that at https://github.com/facebookresearch/XLM/blob/0b193eb5240b4a61301179bd2683cf321c07806f/translate.py#L115 the lengths are passed as lengths.cuda(), so, as in the code above, I also changed max_len=int(1.5 * lengths.max().item() + 10) to max_len=int(1.5 * lengths.cuda().max().item() + 10), but this didn't change the output.
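
For reference, this is the kind of quick check I use to confirm that the difference is not just the sentence order (a rough sketch, not part of XLM; the two file names below are placeholders for the translate.py output and the hypothesis file written during training):

    from collections import Counter

    def load(path):
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f]

    # Placeholder paths: adjust to your own experiment.
    translate_out = load("output.zh-ja.txt")   # written by translate.py
    train_hyp = load("hyp0.zh-ja.test.txt")    # written during train.py evaluation

    print("line counts:", len(translate_out), len(train_hyp))
    # The same multiset of sentences would mean only the order differs.
    print("same sentences up to order:", Counter(translate_out) == Counter(train_hyp))
    mismatches = sum(a != b for a, b in zip(sorted(translate_out), sorted(train_hyp)))
    print("differing lines after sorting: %d / %d" % (mismatches, len(train_hyp)))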

glample commented 5 years ago

Hi,

When you reload the model and run an evaluation (--reload_model MODEL_PATH --eval_only 1 --debug), what BLEU score do you get, and how does it compare to the BLEU in your training log? Independently of the generated hypotheses, I would first check that the BLEU scores match without beam search.
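
You can also check which decoding settings were stored with the checkpoint, to make sure the comparison uses the same ones (a small sketch, assuming the checkpoint keeps its training arguments as a dict under the 'params' key, as translate.py's reload code suggests; MODEL_PATH is a placeholder):

    import torch

    reloaded = torch.load("MODEL_PATH", map_location="cpu")
    model_params = reloaded["params"]  # dict of the arguments used at training time
    for key in ("beam_size", "length_penalty", "early_stopping", "eval_bleu"):
        print(key, "=", model_params.get(key, "<not stored>"))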

Dolprimates commented 5 years ago

Without beam search, the BLEU score from --reload_model MODEL_PATH --eval_only 1 was the same as that of the training log.

Dolprimates commented 5 years ago

Without beam search, the BLEU score from the training log of train.py and from reloading the model and running an evaluation with train.py (--reload_model MODEL_PATH --eval_only 1 --debug) is the same. But it differs from the score produced by translate.py.

So I suspect that the translation process differs between evaluator.py and translate.py. I'm curious about this because it raises the possibility that one of the two translation paths (evaluator.py or translate.py), or both, is wrong.
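
For what it's worth, this is the kind of change I have in mind to make translate.py decode the same way as the evaluator (only a sketch against the snippet above: the --beam_size, --length_penalty and --early_stopping options are hypothetical additions to translate.py's argument parser, not existing flags, and parser, encoded, lengths and decoder are the variables already defined in translate.py):

    # Hypothetical command-line options (not in the current translate.py):
    parser.add_argument("--beam_size", type=int, default=1,
                        help="beam size (1 reproduces the original greedy decoding)")
    parser.add_argument("--length_penalty", type=float, default=1.0,
                        help="length penalty passed to generate_beam")
    parser.add_argument("--early_stopping", action="store_true",
                        help="stop beam search once enough hypotheses are finished")

    # ... later, in the translation loop, reusing translate.py's variables:
    max_len = int(1.5 * lengths.max().item() + 10)
    if params.beam_size == 1:
        # greedy decoding, as in the original script
        decoded, dec_lengths = decoder.generate(
            encoded, lengths.cuda(), params.tgt_id, max_len=max_len)
    else:
        # beam search, with the same arguments the evaluator passes
        decoded, dec_lengths = decoder.generate_beam(
            encoded, lengths.cuda(), params.tgt_id,
            beam_size=params.beam_size,
            length_penalty=params.length_penalty,
            early_stopping=params.early_stopping,
            max_len=max_len)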

Tikquuss commented 4 years ago

I'm stuck on this error, please help me get through it.

I am running the unsupervised machine translation from a pretrained cross-lingual language model in Google Colab. The language model trains successfully, but the MT training breaks with the following error:

    Illegal division by zero at /content/XLM/src/evaluation/multi-bleu.perl line 154, <STDIN> line 10.
    WARNING - 04/22/20 10:51:57 - 0:02:45 - Impossible to parse BLEU score!

Looking at the hypothesis files (hyp0.?-?.valid.txt, hyp0.?-?.test.txt, ... in dumped/unsupMT?-?/???????/hypotheses), which are supposed to contain the translations produced by the MT model at that point, I noticed that they are all empty, while the reference files (ref.?-?.valid.txt, ref.?-?.test.txt), which contain the target translations, are not.

This is consistent with the error, because lines 153-155 of multi-bleu.perl contain the following, and $length_translation is 0 when the hypothesis file is empty:

    if ($length_translation<$length_reference) {
        $brevity_penalty = exp(1-$length_reference/$length_translation);
    }

So the question is why the hypothesis files are empty. I've done some digging in train.py and src/trainer.py with no luck.
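
In the meantime, to keep training from crashing while I look for the real cause, I'm considering a guard like the one below around the BLEU computation (just a sketch, assuming the eval_moses_bleu helper in src/evaluation/evaluator.py, which shells out to multi-bleu.perl; it only avoids the crash, the question of why the hypothesis files are empty remains):

    import os

    from src.evaluation.evaluator import eval_moses_bleu  # shells out to multi-bleu.perl

    def safe_moses_bleu(ref, hyp):
        """Skip multi-bleu.perl when the hypothesis file is empty: an empty
        hypothesis makes the script divide by zero. Returning -1 mirrors what
        eval_moses_bleu does when the BLEU score cannot be parsed."""
        if not os.path.isfile(hyp) or os.path.getsize(hyp) == 0:
            print("hypothesis file %s is empty, skipping multi-bleu.perl" % hyp)
            return -1
        return eval_moses_bleu(ref, hyp)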