facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Reproduce results from mBART paper on IWSLT15 En-Vi dataset #3410

Open · longdct opened this issue 3 years ago

longdct commented 3 years ago


What is your question?

I tried to follow the mBART example and fine-tune on the IWSLT15 En-Vi dataset to reproduce the results from the paper. However, after running multiple times with different seeds, I only obtained about 34.1 BLEU for En->Vi and 34.6 BLEU for Vi->En, compared with 36.1 and 35.4 reported in the paper.

Code

# Download data
wget https://raw.githubusercontent.com/tensorflow/nmt/master/nmt/scripts/download_iwslt15.sh
sh download_iwslt15.sh
# Apply sentencepiece
spm_encode --model=$MODEL < $DATA/$TRAIN.$SRC > $DATA/$TRAIN.spm.$SRC &
spm_encode --model=$MODEL < $DATA/$TRAIN.$TGT > $DATA/$TRAIN.spm.$TGT &
spm_encode --model=$MODEL < $DATA/$VALID.$SRC > $DATA/$VALID.spm.$SRC &
spm_encode --model=$MODEL < $DATA/$VALID.$TGT > $DATA/$VALID.spm.$TGT &
spm_encode --model=$MODEL < $DATA/$TEST.$SRC > $DATA/$TEST.spm.$SRC &
spm_encode --model=$MODEL < $DATA/$TEST.$TGT > $DATA/$TEST.spm.$TGT &
wait  # all six encodes run in the background; wait for them before binarizing
fairseq-preprocess \
  --source-lang $SRC \
  --target-lang $TGT \
  --trainpref $DATA/$TRAIN.spm \
  --validpref $DATA/$VALID.spm \
  --testpref $DATA/$TEST.spm \
  --destdir $DEST/$NAME \
  --thresholdtgt 0 \
  --thresholdsrc 0 \
  --workers 20 \
  --srcdict $DICT \
  --tgtdict $DICT
fairseq-train $DEST/$NAME \
  --encoder-normalize-before --decoder-normalize-before \
  --arch mbart_large --layernorm-embedding \
  --task translation_from_pretrained_bart \
  --source-lang $SRC --target-lang $TGT \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \
  --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
  --lr-scheduler polynomial_decay --lr 3e-05 --warmup-updates 2500 --total-num-update 40000 \
  --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
  --max-tokens 1024 --update-freq 2 \
  --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \
  --seed 222 --log-format simple --log-interval 2 \
  --restore-file $PRETRAIN \
  --reset-optimizer --reset-meters --reset-dataloader --reset-lr-scheduler \
  --langs $langs \
  --ddp-backend legacy_ddp
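
The generation step itself is not shown in the post; a minimal sketch of it, modeled on the mBART example's fairseq-generate command (the checkpoint path, the gen.out filename, and the reuse of $MODEL and $langs here are assumptions, not the poster's exact invocation):

# Sketch of the generation step (not in the original post), following the mBART example.
# checkpoints/checkpoint_best.pt is fairseq-train's default save location (assumed here).
fairseq-generate $DEST/$NAME \
  --path checkpoints/checkpoint_best.pt \
  --task translation_from_pretrained_bart \
  --gen-subset test \
  -s $SRC -t $TGT \
  --bpe 'sentencepiece' --sentencepiece-model $MODEL \
  --sacrebleu --remove-bpe 'sentencepiece' \
  --batch-size 32 --langs $langs > gen.out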

Since fairseq-generate from the example outputs hypothesis sentences without spaces (issue #3103), I log the output of fairseq-generate, extract the sentences from the S, T, and D lines, and calculate BLEU using sacrebleu.
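
For reference, a minimal sketch of that extraction and scoring step, assuming the generate log was saved to gen.out (as in the sketch above): fairseq-generate writes the reference on T-<id> lines (field 2) and the detokenized hypothesis on D-<id> lines (field 3), so the pairs can be re-aligned by id and scored with sacrebleu.

# Extract aligned references and detokenized hypotheses from the generate log,
# then score with sacrebleu (references as file argument, hypotheses on stdin).
grep ^T gen.out | LC_ALL=C sort -V | cut -f2 > ref.txt   # T-<id> <tab> reference
grep ^D gen.out | LC_ALL=C sort -V | cut -f3 > hyp.txt   # D-<id> <tab> score <tab> hypothesis
sacrebleu ref.txt < hyp.txt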

What's your environment?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!