facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

How to reproduce the result on WMT14 En-De #202

Closed · ustctf-zz closed this issue 6 years ago

ustctf-zz commented 6 years ago

Hi,

Thank you for providing such an impressive toolkit!

To replicate the WMT14 En-De translation result, I followed the instructions here, but after training on 8 M40 GPUs for 5.5 days, the test-set BLEU (<27) does not match the one stated in the paper, or even the original T2T paper (28.4). May I know what is wrong on my side? Here is the training script:

model=transformer
PROBLEM=WMT14_ENDE
SETTING=transformer_vaswani_wmt_en_de_big

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py ${REMOTE_DATA_PATH}/wmt14_en_de_joined_dict \
  --arch $SETTING --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.001 --min-lr 1e-09 --update-freq 16 \
  --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 --no-progress-bar --save-dir ${REMOTE_MODEL_PATH}/$model/$PROBLEM/$SETTING

(I do not use --fp16, and I slightly enlarged the per-GPU batch size from 3584 to 4096 tokens.)
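For context, `--update-freq 16` accumulates gradients over 16 forward/backward passes per GPU before each optimizer step, so the 8-GPU run above sees the same number of tokens per update as a 128-GPU run with `--update-freq 1`. A quick sketch of the arithmetic (numbers taken from the command above):

```shell
# Effective tokens per optimizer step = GPUs x update-freq x max-tokens.
GPUS=8; UPDATE_FREQ=16; MAX_TOKENS=4096
echo $((GPUS * UPDATE_FREQ * MAX_TOKENS))   # tokens/step on 8 GPUs with accumulation
echo $((128 * 1 * MAX_TOKENS))              # tokens/step on 128 GPUs without it
```

Both print 524288, which is why gradient accumulation lets a small cluster match the large-batch schedule.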

Here is the test script:

python generate.py ${REMOTE_DATA_PATH}/wmt14_en_de_joined_dict --path ${REMOTE_MODEL_PATH}/${model}/${PROBLEM}/${SETTING}/checkpoint_best.pt --batch-size 128 --beam 4 --lenpen 0.6 --quiet --remove-bpe --no-progress-bar

It outputs (after training for 5.5 days): Generate test with beam=4: BLEU4 = 26.66, 57.9/32.3/20.4/13.2 (BP=1.000, ratio=1.013, syslen=66179, reflen=65346)
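(One step that reported reproductions often include, though it is not shown in this thread's commands, is averaging the last few checkpoints before decoding instead of using `checkpoint_best.pt`. fairseq ships `scripts/average_checkpoints.py` for this; the paths below reuse the variables from the training script and are assumptions.)

```shell
# Sketch (paths are assumptions): average the last 10 epoch checkpoints,
# then decode from the averaged model instead of checkpoint_best.pt.
python scripts/average_checkpoints.py \
  --inputs ${REMOTE_MODEL_PATH}/$model/$PROBLEM/$SETTING \
  --num-epoch-checkpoints 10 \
  --output ${REMOTE_MODEL_PATH}/$model/$PROBLEM/$SETTING/checkpoint_avg.pt
```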

BTW, it seems the dataset generated using prepare-wmt14en2de.sh has <4M training pairs, not matching the reported 4.5M. Could that be the reason?

Thanks a lot.

myleott commented 6 years ago

Yes, you are right. Originally I used the Google dataset [1], but was hoping to reproduce the results with our script, because it's not clear how the Google version was preprocessed.

I'm working on an updated preprocessing script that should better match the Google version (~4.5M pairs). I'll post it here and update the README shortly.

[1] https://github.com/tensorflow/tensor2tensor/blob/6a7ef7f79f56fdcb1b16ae76d7e61cb09033dc4f/tensor2tensor/data_generators/translate_ende.py#L60-L61

myleott commented 6 years ago

Please try this dataset: https://github.com/pytorch/fairseq/pull/203

I just ran it on 128 GPUs and get the same results as (actually a little better than) the paper now.

ustctf-zz commented 6 years ago

Thanks @myleott !

I'm running on the new dataset (with 8 GPUs) and will get back to you with the latest results.

ustctf-zz commented 6 years ago

Hi @myleott , after running on 8 M40 GPUs for about 5 days, I obtain a BLEU of 28.77 on WMT14 En-De. Thanks again for the code and help!

BTW, may I know whether you have plans to share the detailed config/command to reproduce the result on WMT14 En-Fr? Thanks!

myleott commented 6 years ago

For En-Fr you can use the transformer_vaswani_wmt_en_fr_big architecture. It's nearly identical to the En-De architecture except that we use a smaller dropout value: https://github.com/pytorch/fairseq/blob/f26b6affdaf67d271e0d39f4c4c8384c4e8160d9/fairseq/models/transformer.py#L467-L470

I used the standard fairseq En-Fr dataset with 40k BPE tokens, available here: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh. For preprocessing, make sure to add the --joined-dictionary flag.
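A sketch of what that preprocessing call might look like (the directory names are assumptions based on the prepare script's output layout). `--joined-dictionary` builds a single vocabulary shared by source and target, which is what makes `--share-all-embeddings` possible at training time:

```shell
# Sketch, not a verbatim command from the thread: binarize the En-Fr data
# with a dictionary shared across both languages.
python preprocess.py --source-lang en --target-lang fr \
  --trainpref wmt14_en_fr/train --validpref wmt14_en_fr/valid --testpref wmt14_en_fr/test \
  --joined-dictionary \
  --destdir data-bin/wmt14_en_fr
```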

ustctf-zz commented 6 years ago

Thanks!

wangqiangneu commented 5 years ago

> Hi @myleott , after running on 8 M40 GPUs for about 5 days, I obtain a BLEU of 28.77 on WMT14 En-De. Thanks again for the code and help!
>
> BTW, may I know that do you have a plan of giving the detailed config/command to reproduce the result on WMT14 En-Fr? Thanks!

Hi @myleott @ustctf, if I use the newly processed WMT14 En-De data provided by Google, should I also do some postprocessing (like get_ende_bleu.sh in tensor2tensor) to get a good BLEU?
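(For reference, the tensor2tensor post-processing referred to here mainly re-tokenizes German compounds by splitting at hyphens before scoring, which can shift BLEU noticeably. A simplified, hypothetical sketch of that splitting step, not the actual get_ende_bleu.sh:)

```shell
# Hypothetical simplified compound split: break hyphenated words into
# separate tokens before BLEU scoring.
echo "ein High-Tech-Produkt" | sed 's/\([a-zA-Z]\)-\([a-zA-Z]\)/\1 - \2/g'
# prints: ein High - Tech - Produkt
```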

kalyangvs commented 5 years ago

Hi @ustctf, can you provide the BLEU score for En-Fr using this script: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh? If you also used the base transformer, please provide those scores too. Thanks.

ustctf-zz commented 5 years ago

@gvskalyan Sorry, I have no records. Maybe you can ask the maintainers for help.

kalyangvs commented 5 years ago

> @gvskalyan Sorry I've no records. Maybe you can ask for the official help.

Yeah, thank you.