chenyangh / DSLP

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation
MIT License

Problem with the training and generation scripts for CMLM #16

Open · JasmineChen123 opened this issue 1 year ago

JasmineChen123 commented 1 year ago

Hi, thank you for releasing the code! I have a question about the provided bash scripts for training and inference.

The training script for CMLM+DSLP is:

```bash
python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_lev --criterion nat_loss --arch glat_sd --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.1 --max-tokens 8192 \
    --length-loss-factor 0.1 --pred-length-offset
```

The "--arch glat_sd" is weird. Is it "cmlm_sd" or "cmlm_transformer"?

Another question: could you please share the generation script for CMLM with more than one decoding iteration, i.e. with `--iter-decode-max-iter` set to 5 or 10? I find that the BLEU at iter=5/10 is much worse than at iter=1.
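For reference, a generation command following fairseq's standard iterative-refinement decoding options (the checkpoint path below is a placeholder) would look like this; is this the intended setup?

```bash
# Iterative-refinement decoding with fairseq's translation_lev task.
# checkpoints/checkpoint_best.pt is a placeholder; point --path at your trained model.
fairseq-generate data-bin/wmt14.en-de_kd \
    --gen-subset test --task translation_lev \
    --path checkpoints/checkpoint_best.pt \
    --iter-decode-max-iter 5 \
    --iter-decode-eos-penalty 0 \
    --beam 1 --remove-bpe \
    --print-step \
    --batch-size 400
```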

cecilialeo77 commented 9 months ago

Hello, I have the same question: which of the three architectures should be used in the training script for CMLM+DSLP: `cmlm_sd`, `cmlm_sd_ss`, or `cmlm_transformer`? Have you figured it out? Thanks!

chenyangh commented 9 months ago

@cecilialeo77 Hi, CMLM+DSLP should use `cmlm_sd`; `cmlm_sd_ss` is DSLP + Mixed Training.

chenyangh commented 9 months ago

@JasmineChen123 Hi, yes, you are right: it should be `cmlm_sd`. That was my mistake in the script; I will fix it.
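Concretely, in the training command above, the relevant flags become:

```bash
# Only the --arch value changes; the other flags stay as in the posted script.
--task translation_lev --criterion nat_loss --arch cmlm_sd --noise full_mask
```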