facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

hyperparameters for reproducing fconv model in zh-en #1471

Closed · oww-file closed this 4 years ago

oww-file commented 4 years ago

Hi,

I want to reproduce the results of the fconv model on WMT17 Zh->En translation. The model was trained with the following script:

fairseq-train data-bin/wmt17_zh_en \
    --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
    --momentum 0.99 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr-scheduler fixed --force-anneal 50 \
    --arch fconv_iwslt_de_en --save-dir checkpoints/fconv

The script used to test its performance after 45 epochs is:

python generate.py data-bin/wmt17_zh_en/ \
    --path checkpoints/fconv/checkpoint_best.pt \
    --remove-bpe --beam 5 --lenpen 0.6

However, the result I got on the valid set is

| Generate test with beam=10: BLEU4 = 9.44, 43.1/15.1/6.4/2.9 (BP=0.901, ratio=0.906, syslen=52932, reflen=58424)

and on the test set it is just 3.45.
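
As a sanity check on the scoring itself, the printed BLEU4 can be recomputed from the n-gram precisions, lengths, and brevity penalty on that line; a minimal sketch using the standard corpus-BLEU formula:

import math

# Recompute BLEU4 from the values printed in the generate output above
precisions = [0.431, 0.151, 0.064, 0.029]   # 43.1/15.1/6.4/2.9
syslen, reflen = 52932, 58424

bp = math.exp(1 - reflen / syslen) if syslen < reflen else 1.0  # brevity penalty
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(f"BP={bp:.3f}, BLEU4={100 * bleu:.2f}")  # BP=0.901, BLEU4=9.44

The recomputed value matches the printed 9.44, so the line is internally consistent.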

training log:

| epoch 012 | loss 5.667 | nll_loss 4.153 | ppl 17.79 | wps 54337 | ups 15 | wpb 3567.117 | bsz 131.646 | num_updates 20676 | lr 0.25 | gnorm 0.306 | clip 1.000 | oom 0.000 | wall 1469 | train_wall 1371
| epoch 012 | valid on 'valid' subset | loss 7.619 | nll_loss 6.404 | ppl 84.66 | num_updates 20676 | best_loss 7.61901
| epoch 025 | loss 5.164 | nll_loss 3.557 | ppl 11.77 | wps 54156 | ups 15 | wpb 3567.117 | bsz 131.646 | num_updates 43075 | lr 0.25 | gnorm 0.279 | clip 1.000 | oom 0.000 | wall 3044 | train_wall 2835
| epoch 025 | valid on 'valid' subset | loss 7.294 | nll_loss 5.995 | ppl 63.78 | num_updates 43075 | best_loss 7.29363
| epoch 045 | loss 4.848 | nll_loss 3.181 | ppl 9.07 | wps 46189 | ups 13 | wpb 3567.117 | bsz 131.646 | num_updates 77535 | lr 0.25 | gnorm 0.257 | clip 1.000 | oom 0.000 | wall 5472 | train_wall 5112
| epoch 045 | valid on 'valid' subset | loss 7.235 | nll_loss 5.895 | ppl 59.51 | num_updates 77535 | best_loss 7.23137
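
For reference, a minimal sketch that pulls the validation loss per epoch out of a log in the pipe-separated format above (the file name train.log is an assumption):

import re

# Collect (epoch, valid loss) pairs from a fairseq training log
with open('train.log') as f:
    for line in f:
        if "valid on 'valid' subset" in line:
            epoch = re.search(r'epoch (\d+)', line).group(1)
            loss = re.search(r'\| loss (\d+\.\d+)', line).group(1)
            print(epoch, loss)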

lematt1991 commented 4 years ago

Which example are you following? Can you provide a link to where you got these hyper-parameters? How did you pre-process the data?

lematt1991 commented 4 years ago

You can get the params used in the lightconv example by downloading the checkpoint and doing:

import torch

# Load the downloaded checkpoint and print the hyper-parameters stored in it
chkpnt = torch.load('/path/to/checkpoint.pt')  # substitute the downloaded checkpoint's path
print(chkpnt['args'])
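
The same idea extends to diffing the reference checkpoint's args against your own run's; a minimal sketch, with both paths as placeholders:

import torch

# Compare hyper-parameters between two fairseq checkpoints
ours = vars(torch.load('checkpoints/fconv/checkpoint_best.pt', map_location='cpu')['args'])
ref = vars(torch.load('/path/to/downloaded/checkpoint.pt', map_location='cpu')['args'])
for key in sorted(set(ours) | set(ref)):
    if ours.get(key) != ref.get(key):
        print(key, ours.get(key), ref.get(key))
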
oww-file commented 4 years ago

@lematt1991
Thanks for your reply. For preprocessing I followed https://github.com/twairball/fairseq-zh-en, and I also used its hyper-parameters for reference. Since that example targets the Lua Torch version of fairseq, I adjusted the scripts according to the WMT'14 English-to-German recipe in examples/translation/README.md.

I'll try the params from the lightconv checkpoint.