facebookresearch / Mask-Predict

A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation.

Provided wmt14_en_de model can only achieve 20.90 BLEU #15

Open · dmortem opened this issue 3 years ago

dmortem commented 3 years ago

Hi, when I used the `checkpoint_best.pt` provided in the README together with the inference script

```
python generate_cmlm.py ${output_dir}/data-bin --path ${model_dir}/checkpoint_best.pt --task translation_self --remove-bpe --max-sentences 20 --decoding-iterations 10 --decoding-strategy mask_predict
```

I only got a BLEU of 20.90. What is the problem? Are there any other hyperparameters I need to modify in the inference script?

I see "average the 5 best checkpoints to create the final model" in the paper. So is the checkpoint_best.pt provided in the link the final model? If not, I wonder how to average the best checkpoints? Do we forward 5 models and average the prediction distribution?

Thank you!

dmortem commented 3 years ago

As for the data preprocessing, I used `bash prepare-wmt14en2de.sh` to download the WMT14 dataset, as described in fairseq. Then I ran the preprocessing script from the README with the downloaded dictionary (32,768 tokens), roughly as sketched below. Is there any mistake in my data preprocessing steps?
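Concretely, what I ran is roughly the following (a sketch using the standard `fairseq-preprocess` CLI; the exact script in the README may differ, and the paths are placeholders):

```bash
# Rough sketch of binarizing the data with the provided joint dictionary.
# Paths are placeholders; the repo's own preprocess invocation may differ.
text=wmt14_en_de               # output of prepare-wmt14en2de.sh
dict=${text}/dict.en.txt       # the downloaded 32,768-token dictionary
fairseq-preprocess --source-lang en --target-lang de \
    --trainpref ${text}/train --validpref ${text}/valid --testpref ${text}/test \
    --destdir ${output_dir}/data-bin \
    --srcdict ${dict} --tgtdict ${dict} \
    --workers 16
```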

myaxxxxx commented 2 years ago

"average_model" can be found at fairseq/scripts module, called "average_checkpoints.py"