facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Generating with MBART50 not working #4538

Open MathieuGrosso opened 2 years ago

MathieuGrosso commented 2 years ago

🐛 Bug

Hello, I have downloaded the many-to-many mBART50 model and want to test it on en-fr with data from WMT. It does not work: the output repeats the same word instead of producing a proper translation. Do you know why? Is the model not pretrained? Maybe I have not understood it correctly.

Here is a screenshot showing what I get:

[image001]

To Reproduce

What I did:

1. Downloaded the WMT en-fr test data with sacrebleu (see the sketch after these steps).

2. Binarized it with the multilingual data script:

   python /path/to/fairseq/examples/multilingual/data_scripts/binarize.py

3. Set the variables and ran generation:

   export path_2_data=$work_dir/databin
   export model=$work_dir/model.pt
   export langs="ar_AR,....,sl_SI"
   export source_lang="en_XX"
   export target_lang="fr_XX"

   fairseq-generate $path_2_data \
       --path $model \
       --task translation_from_pretrained_bart \
       --gen-subset test \
       -s en_XX -t fr_XX \
       --sacrebleu --remove-bpe 'sentencepiece' \
       --batch-size 32 \
       --encoder-langtok "src" \
       --decoder-langtok \
       --langs $langs
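For step 1, a minimal sketch of pulling a WMT test set with sacrebleu's `--echo` option. The test set year (wmt14) and the output file names are assumptions here; the report does not say which WMT set was used:

```bash
# Assumption: the exact WMT year is not stated in the report; wmt14 is only an example,
# and the output file names are placeholders for whatever the binarize script expects.
sacrebleu -t wmt14 -l en-fr --echo src > test.en_XX   # source side
sacrebleu -t wmt14 -l en-fr --echo ref > test.fr_XX   # reference side
```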
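For comparison, the mBART50 generation recipe in fairseq's examples/multilingual/README.md uses the `translation_multi_simple_epoch` task with `--lang-dict` and `--lang-pairs` rather than `translation_from_pretrained_bart` with `--langs`. Below is a sketch of that invocation adapted to the variables above; the `$lang_list` path and `$lang_pairs` value are assumptions and should point at the language-list file shipped with the mBART50 download and the pair being decoded:

```bash
# Sketch of the multilingual generation command from examples/multilingual/README.md,
# adapted to the variables set above.
lang_list=/path/to/ML50_langs.txt   # assumption: language list file from the mBART50 checkpoint
lang_pairs="en_XX-fr_XX"            # assumption: the single pair being decoded here

fairseq-generate $path_2_data \
    --path $model \
    --task translation_multi_simple_epoch \
    --gen-subset test \
    --source-lang en_XX \
    --target-lang fr_XX \
    --sacrebleu --remove-bpe 'sentencepiece' \
    --batch-size 32 \
    --encoder-langtok "src" \
    --decoder-langtok \
    --lang-dict "$lang_list" \
    --lang-pairs "$lang_pairs"
```

If the task mismatch is the cause of the degenerate output, switching to the README's task and supplying the language list would be the first thing to try; this is a guess, not a confirmed fix.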

Environment

AlexNLP commented 1 year ago

How did you solve this issue in the end?