facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

hyperparameters for reproducing fconv model in zh-en #1471

Closed · oww-file closed this 4 years ago

oww-file commented 4 years ago

Hi,

I want to reproduce the results of the fconv model on WMT17 Zh->En translation. The model was trained with the following script:

fairseq-train data-bin/wmt17_zh_en \
    --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
    --momentum 0.99 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr-scheduler fixed --force-anneal 50 \
    --arch fconv_iwslt_de_en --save-dir checkpoints/fconv

The script used to test its performance after 45 epochs is:

python generate.py data-bin/wmt17_zh_en/ \
    --path checkpoints/fconv/checkpoint_best.pt \
    --remove-bpe --beam 5 --lenpen 0.6

However, the result I got on the valid set is

| Generate test with beam=10: BLEU4 = 9.44, 43.1/15.1/6.4/2.9 (BP=0.901, ratio=0.906, syslen=52932, reflen=58424)

and on the test set it is just 3.45.
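
As a sanity check on the scoring itself, the printed BLEU4 can be recomputed from the n-gram precisions, lengths, and brevity penalty on that line; a minimal sketch using the standard corpus-BLEU formula:

import math

# Recompute BLEU4 from the values printed in the generate output above
precisions = [0.431, 0.151, 0.064, 0.029]   # 43.1/15.1/6.4/2.9
syslen, reflen = 52932, 58424

bp = math.exp(1 - reflen / syslen) if syslen < reflen else 1.0  # brevity penalty
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(f"BP={bp:.3f}, BLEU4={100 * bleu:.2f}")  # BP=0.901, BLEU4=9.44

The recomputed value matches the printed 9.44, so the line is internally consistent.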

training log:

| epoch 012 | loss 5.667 | nll_loss 4.153 | ppl 17.79 | wps 54337 | ups 15 | wpb 3567.117 | bsz 131.646 | num_updates 20676 | lr 0.25 | gnorm 0.306 | clip 1.000 | oom 0.000 | wall 1469 | train_wall 1371
| epoch 012 | valid on 'valid' subset | loss 7.619 | nll_loss 6.404 | ppl 84.66 | num_updates 20676 | best_loss 7.61901
| epoch 025 | loss 5.164 | nll_loss 3.557 | ppl 11.77 | wps 54156 | ups 15 | wpb 3567.117 | bsz 131.646 | num_updates 43075 | lr 0.25 | gnorm 0.279 | clip 1.000 | oom 0.000 | wall 3044 | train_wall 2835
| epoch 025 | valid on 'valid' subset | loss 7.294 | nll_loss 5.995 | ppl 63.78 | num_updates 43075 | best_loss 7.29363
| epoch 045 | loss 4.848 | nll_loss 3.181 | ppl 9.07 | wps 46189 | ups 13 | wpb 3567.117 | bsz 131.646 | num_updates 77535 | lr 0.25 | gnorm 0.257 | clip 1.000 | oom 0.000 | wall 5472 | train_wall 5112
| epoch 045 | valid on 'valid' subset | loss 7.235 | nll_loss 5.895 | ppl 59.51 | num_updates 77535 | best_loss 7.23137
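
For reference, a minimal sketch that pulls the validation loss per epoch out of a log in the pipe-separated format above (the file name train.log is an assumption):

import re

# Collect (epoch, valid loss) pairs from a fairseq training log
with open('train.log') as f:
    for line in f:
        if "valid on 'valid' subset" in line:
            epoch = re.search(r'epoch (\d+)', line).group(1)
            loss = re.search(r'\| loss (\d+\.\d+)', line).group(1)
            print(epoch, loss)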

lematt1991 commented 4 years ago

Which example are you following? Can you provide a link to where you got these hyper-parameters? How did you pre-process the data?

lematt1991 commented 4 years ago

You can get the params used in the lightconv example by downloading the checkpoint and doing:

import torch

# Load the downloaded checkpoint and print the hyper-parameters stored in it
chkpnt = torch.load('/path/to/checkpoint.pt')  # substitute the downloaded checkpoint's path
print(chkpnt['args'])
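
The same idea extends to diffing the reference checkpoint's args against your own run's; a minimal sketch, with both paths as placeholders:

import torch

# Compare hyper-parameters between two fairseq checkpoints
ours = vars(torch.load('checkpoints/fconv/checkpoint_best.pt', map_location='cpu')['args'])
ref = vars(torch.load('/path/to/downloaded/checkpoint.pt', map_location='cpu')['args'])
for key in sorted(set(ours) | set(ref)):
    if ours.get(key) != ref.get(key):
        print(key, ours.get(key), ref.get(key))
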
oww-file commented 4 years ago

@lematt1991
Thanks for your reply. For preprocessing I followed https://github.com/twairball/fairseq-zh-en, and I also used its hyper-parameters for reference. Since that example targets the Lua Torch version of fairseq, I adjusted the scripts according to the WMT'14 English-to-German recipe in examples/translation/README.md.

I'll try the params from the lightconv checkpoint.