Yes, you are right. Originally I used the Google dataset [1], but I was hoping to reproduce the results with our script, because it's not clear how the Google version was preprocessed.
I'm working on an updated preprocessing script that should better match the Google version (~4.5M pairs). I'll post it here and update the README shortly.
Please try this dataset: https://github.com/pytorch/fairseq/pull/203
I just ran it on 128 GPUs and now get the same results as (actually a little better than) the paper.
Thanks @myleott !
I'm running on the new dataset (with 8 GPUs) and will get back to you with the latest results.
Hi @myleott , after running on 8 M40 GPUs for about 5 days, I obtain a BLEU of 28.77 on WMT14 En-De. Thanks again for the code and help!
BTW, do you have a plan to share the detailed config/command to reproduce the result on WMT14 En-Fr? Thanks!
For En-Fr you can use the transformer_vaswani_wmt_en_fr_big architecture. It's nearly identical to the En-De architecture, except that we use a smaller dropout value: https://github.com/pytorch/fairseq/blob/f26b6affdaf67d271e0d39f4c4c8384c4e8160d9/fairseq/models/transformer.py#L467-L470
I used the standard fairseq En-Fr dataset with 40k BPE tokens, available here: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh. For preprocessing, make sure to add the --joined-dictionary flag; see the sketch below.
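A minimal sketch of that preprocessing, assuming the python preprocess.py entry point from that era of fairseq; the paths here are illustrative:

```bash
# Download, tokenize, and apply 40k joint BPE codes.
cd examples/translation/
bash prepare-wmt14en2fr.sh
cd ../..

# Binarize with a single dictionary shared between source and target.
TEXT=examples/translation/wmt14_en_fr
python preprocess.py --source-lang en --target-lang fr \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --joined-dictionary --destdir data-bin/wmt14_en_fr
```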
Thanks!
Hi @myleott @ustctf, if I use the newly processed WMT14 En-De data provided by Google, should I also apply some postprocessing (like get_ende_bleu.sh in tensor2tensor) to get a good BLEU score?
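(For context, the key postprocessing step in tensor2tensor's get_ende_bleu.sh is compound splitting before scoring; a rough sketch, with the file names here purely illustrative:)

```bash
# Split hyphenated compounds, e.g. "rich-text" -> "rich ##AT##-##AT## text",
# in both the system output and the reference before computing BLEU.
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < sys.tok > sys.tok.atat
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < ref.tok > ref.tok.atat
perl multi-bleu.perl ref.tok.atat < sys.tok.atat
```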
Hi @ustctf, can you provide the BLEU score for En-Fr obtained using this script: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh? If you also used the base Transformer, please provide those scores as well. Thanks.
@gvskalyan Sorry, I have no records. Maybe you can ask the maintainers for official help.
Yeah, thank you.
Hi,
Thank you for providing such an impressive toolkit!
For replicating the WMT14 En-De translation result, I followed the instructions here, but after running on 8 M40 GPUs for 5.5 days, the test-set BLEU (<27) does not match the one stated in the paper, or even the original T2T paper (28.4). May I know what's wrong on my side? Here is the running script:
(I do not use --fp16 and slightly enlarge the batch size from 3584 to 4096)
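The command itself did not survive in this thread; a plausible sketch with those settings, based on the fairseq Scaling NMT recipe (every flag below is an assumption, not the poster's verbatim script):

```bash
# Big transformer on the pre-binarized WMT14 En-De data with a joined dictionary.
# --fp16 is omitted and --max-tokens raised to 4096, matching the settings above.
python train.py data-bin/wmt14_en_de_joined_dict \
    --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --dropout 0.3 --weight-decay 0.0 \
    --max-tokens 4096
```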
Here is the test script:
python generate.py ${REMOTE_DATA_PATH}/wmt14_en_de_joined_dict --path ${REMOTE_MODEL_PATH}/${model}/${PROBLEM}/${SETTING}/checkpoint_best.pt --batch-size 128 --beam 4 --lenpen 0.6 --quiet --remove-bpe --no-progress-bar
It outputs (after training for 5.5 days): Generate test with beam=4: BLEU4 = 26.66, 57.9/32.3/20.4/13.2 (BP=1.000, ratio=1.013, syslen=66179, reflen=65346)
BTW, it seems the dataset generated using prepare-wmt14en2de.sh has fewer than 4M training pairs, rather than the 4.5M mentioned above. Could that be a possible reason?
Thanks a lot.