hhh07 opened 9 months ago
I ran the following hyperparameters on IWSLT14, but the model seems to perform poorly: the result is only BLEU 26 on IWSLT14 de-en. Does anyone know the appropriate hyperparameters for the IWSLT dataset? Thanks a lot!
```bash
export CUDA_VISIBLE_DEVICES=1
dataset=data-bin/distill_iwslt14.tokenized.de-en
save_dir=model/iwslt14-dslp
log=${save_dir}/train.log
max_update=250000
lr=0.0002
layers=5
dim=256

python3 train.py ${dataset} --source-lang de --target-lang en --save-dir ${save_dir} --eval-tokenized-bleu \
    --task translation_lev --criterion nat_loss --arch nat_sd \
    --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 3 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr ${lr} \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 4000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update ${max_update} --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.3 --max-tokens 8192 \
    --length-loss-factor 0.1 --pred-length-offset \
    --encoder-layers ${layers} --encoder-embed-dim ${dim} --decoder-layers ${layers} --decoder-embed-dim ${dim} --encoder-ffn-embed-dim 1024 | tee -a ${log}
```
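For reference, this is roughly how I score the model. It is a sketch, not my exact command: since training keeps the 5 best checkpoints (`--keep-best-checkpoints 5`), I average them and then decode the test set with a single non-autoregressive pass; the checkpoint file names and the averaged-checkpoint path are placeholders for my setup.

```bash
# Average the best checkpoints kept during training
# (checkpoint.best_bleu_*.pt matches what --keep-best-checkpoints writes;
#  the output path is a placeholder).
python3 scripts/average_checkpoints.py \
    --inputs ${save_dir}/checkpoint.best_bleu_*.pt \
    --output ${save_dir}/checkpoint.avg.pt

# Single-pass decoding for a fully non-autoregressive model
# (--iter-decode-max-iter 0), scoring tokenized BLEU on the test set.
python3 generate.py ${dataset} \
    --gen-subset test --task translation_lev \
    --path ${save_dir}/checkpoint.avg.pt \
    --iter-decode-max-iter 0 --iter-decode-eos-penalty 0 \
    --beam 1 --remove-bpe --print-step --batch-size 200
```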