Closed MaksymDel closed 5 years ago
How do you use it for multiple target languages? The example only covers multiple sources and one target language.
We added --decoder-langtok support in #620. You can specify --decoder-langtok in both training and inference. It feeds the target language token as the first token to the decoder.
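As a toy illustration (a sketch only, not fairseq's actual code; the token ids here are made up), feeding the target-language token as the first decoder token amounts to:

```python
# Sketch of the --decoder-langtok idea (hypothetical ids, not fairseq internals):
# the target-language token is fed to the decoder as its first input token.
LANG_TOK = {"en": 4, "de": 5, "fr": 6}  # toy language-token ids

def decoder_input_with_langtok(prev_output_tokens, tgt_lang):
    """Prepend the target-language token id to the decoder input ids."""
    return [LANG_TOK[tgt_lang]] + prev_output_tokens

print(decoder_input_with_langtok([17, 42, 99], "fr"))  # [6, 17, 42, 99]
```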
@pipibjc can you please add an example of the many-to-many multilingual translation case? Right now the example only covers the many-to-one scenario.
@madaanpulkit sure, I have a draft example of how to train a many-to-many multilingual translation model, but I need to clean it up a bit. I will update the example page shortly.
A draft would work for the time being (pulkit.madaan@ymail.com). Thanks for the quick replies.
Here is an example that uses the binarized data from the multilingual example. It only demonstrates how to specify the command line correctly; the hyper-parameters are not tuned:
Training:
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt17.de_fr.en.bpe16k/ \
    --max-epoch 50 --ddp-backend=no_c10d \
    --task multilingual_translation --arch multilingual_transformer_iwslt_de_en \
    --share-decoders --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt --min-lr '1e-09' \
    --warmup-updates 4000 --warmup-init-lr '1e-07' \
    --label-smoothing 0.1 --criterion label_smoothed_cross_entropy \
    --dropout 0.3 --weight-decay 0.0001 \
    --save-dir checkpoints/multilingual_transformer \
    --max-tokens 4000 --update-freq 8 --max-update 20 --log-format json \
    --lang-pairs de-en,fr-en,en-fr,en-de --encoder-langtok tgt
Inference:
CUDA_VISIBLE_DEVICES=0 python generate.py data-bin/iwslt17.de_fr.en.bpe16k/ \
    --task multilingual_translation \
    --path checkpoints/multilingual_transformer/checkpoint_best.pt \
    --source-lang en --target-lang fr --gen-subset valid \
    --lang-pairs de-en,fr-en,en-fr,en-de --encoder-langtok tgt
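To evaluate the generated output, fairseq's examples typically extract the hypothesis (H-) lines from the generate.py log, restore sentence order, and drop the score column. A sketch, assuming the standard S-/T-/H- line format (the gen.out here is a toy stand-in for a real log):

```shell
# Toy example of the generate.py log format: H-<id>\t<score>\t<hypothesis>
printf 'H-1\t-0.50\tbonjour le monde\nH-0\t-0.30\thello world\n' > gen.out
# Keep only hypotheses, sort back into sentence order, strip the score column.
grep ^H- gen.out | sort -V | cut -f3 > gen.out.sys
cat gen.out.sys
```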
@pipibjc thanks for the help.
Any particular reason behind using --encoder-langtok and not --decoder-langtok?
I have experimented with both --encoder-langtok tgt and --decoder-langtok in the many-to-many case, but I didn't find any difference. I use --encoder-langtok tgt in the example just because the original paper suggested doing so.
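For concreteness, here is a toy sketch of where the language tag ends up in each case (the tag format is made up; this is not fairseq's internal representation):

```python
# --encoder-langtok tgt : prepend the *target* language tag to the source tokens.
# --decoder-langtok     : feed the target language tag as the first decoder token.
def encoder_langtok_tgt(src_tokens, tgt_lang):
    return [f"__{tgt_lang}__"] + src_tokens

def decoder_langtok(prev_tgt_tokens, tgt_lang):
    return [f"__{tgt_lang}__"] + prev_tgt_tokens

print(encoder_langtok_tgt(["Hallo", "Welt"], "en"))  # ['__en__', 'Hallo', 'Welt']
print(decoder_langtok(["Hello", "world"], "en"))     # ['__en__', 'Hello', 'world']
```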
I tried both, and --encoder-langtok tgt worked better for me.
Hello @pipibjc
As the example is for many-to-one, it is intuitive to fix the --tgt-dict during preprocessing. However, in a many-to-many scenario every language is both a possible src and tgt candidate. So, how is --share-decoders enabled in this setting?
Hi!
If we share decoder parameters in the multilingual transformer, we need to tell the shared decoder which language to decode into.
This might be done by (embedding and) passing the target language id directly to the decoder.
Alternatively, one might append the language tag to the actual sentence (so that it becomes, e.g., the first word of the sentence).
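The first alternative mentioned above can be sketched like this (a toy illustration with made-up 2-d embeddings, not how fairseq implements it): a learned per-language vector is added to every decoder input embedding.

```python
# Hypothetical "embed the language id" variant: add a learned language
# embedding to each decoder input token embedding.
lang_emb = {"fr": [0.1, 0.2], "de": [0.3, 0.4]}  # toy 2-d language embeddings

def add_lang_embedding(decoder_inputs, tgt_lang):
    """decoder_inputs: list of token embedding vectors (lists of floats)."""
    e = lang_emb[tgt_lang]
    return [[x + dx for x, dx in zip(tok, e)] for tok in decoder_inputs]

print(add_lang_embedding([[1.0, 1.0], [2.0, 2.0]], "fr"))  # [[1.1, 1.2], [2.1, 2.2]]
```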
How is it done in fairseq's multilingual transformer?
Thank you, Maksym