facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Finetuning m2m Multilingual Model #3343

Open nikhiljaiswal opened 3 years ago

nikhiljaiswal commented 3 years ago

Hi,

I want to fine-tune the M2M model on my dataset, which contains en and de sentences on the source side and the corresponding de and en sentences on the target side. In other words, I want to perform joint training of en-de and de-en. When I try to fine-tune, which parameters do I need to pass, especially for the task and arch? I tried the following, but I get an error that the architectures do not match:

fairseq-train $path_2_data \
    --finetune-from-model $pretrained_model \
    --encoder-normalize-before --decoder-normalize-before \
    --arch transformer --layernorm-embedding \
    --task translation_multi_simple_epoch \
    --sampling-method "temperature" \
    --sampling-temperature 1.5 \
    --encoder-langtok "src" \
    --decoder-langtok \
    --lang-pairs "$lang_pairs" \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \
    --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
    --lr-scheduler inverse_sqrt --lr 3e-05 --warmup-updates 2500 --max-update 40000 \
    --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
    --max-tokens 1024 --update-freq 2 \
    --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \
    --seed 222 --log-format simple --log-interval 2
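For reference, a minimal sketch of how one might inspect which `--arch` and model settings the downloaded checkpoint was saved with (the path `m2m_checkpoint.pt` is just a placeholder; older fairseq checkpoints store an argparse namespace under `args`, newer ones a config under `cfg`):

```python
# Sketch: print the architecture-related settings stored inside a fairseq checkpoint.
# The checkpoint path is hypothetical; point it at the downloaded M2M file.
import torch

state = torch.load("m2m_checkpoint.pt", map_location="cpu")

# Older checkpoints keep a flat argparse Namespace under "args",
# newer ones keep a config object under "cfg".
if state.get("args") is not None:
    args = vars(state["args"])
elif state.get("cfg") is not None:
    args = dict(state["cfg"]["model"])
else:
    args = {}

for key in sorted(args):
    if any(tok in key for tok in ("arch", "layers", "embed", "ffn", "heads")):
        print(key, "=", args[key])
```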

jaspock commented 3 years ago

This answer may help you.

nikhiljaiswal commented 3 years ago

Hi @jaspock, thanks for the response. I went through that answer and tried transformer_wmt_en_de_big, but I am still getting the error that the architectures do not match. Please help.

I tried the following:

fairseq-train $path_2_data \
    --finetune-from-model $pretrained_model \
    --max-epoch 500 \
    --ddp-backend=legacy_ddp \
    --task translation_multi_simple_epoch \
    --lang-pairs de-en,en-de \
    --arch transformer_wmt_en_de_big --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr '1e-07' \
    --label-smoothing 0.1 --criterion label_smoothed_cross_entropy \
    --dropout 0.3 --weight-decay 0.0001 \
    --max-tokens 4000 --update-freq 8
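An "architecture does not match" error usually means the parameter shapes built from the command-line flags differ from the shapes stored in the checkpoint (layer count, embedding dimension, vocabulary size, and so on). A sketch of how one might read those shapes directly from the checkpoint, so the flags can be chosen to reproduce them; the path is a placeholder and the parameter names assume fairseq's standard transformer naming:

```python
# Sketch: list tensor shapes stored in the pretrained checkpoint so that flags
# like --encoder-layers and --encoder-embed-dim can be set to match them.
# The checkpoint path is hypothetical.
import torch

state = torch.load("m2m_checkpoint.pt", map_location="cpu")
weights = state["model"]

# The embedding shape reveals (vocab size, embedding dim); the highest layer
# index in the parameter names reveals how many layers were trained.
print("encoder embedding:", tuple(weights["encoder.embed_tokens.weight"].shape))
enc_layers = {int(k.split(".")[2]) for k in weights if k.startswith("encoder.layers.")}
dec_layers = {int(k.split(".")[2]) for k in weights if k.startswith("decoder.layers.")}
print("encoder layers:", max(enc_layers) + 1, "decoder layers:", max(dec_layers) + 1)
```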

jaspock commented 3 years ago

See my new answer.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!