facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

How to get a higher BLEU when evaluating WMT14 #4559

Closed: tjshu closed this 2 years ago

tjshu commented 2 years ago

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

As in the title; the question is the same as #4477.

Code

Binarize the dataset

fairseq-preprocess \
    --source-lang en --target-lang de \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/wmt14_en_de \
    --thresholdtgt 0 --thresholdsrc 0 \
    --nwordssrc 44000 --nwordstgt 44000 \
    --joined-dictionary --workers 20
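For completeness, $TEXT above is assumed to point at the prepared corpus; if the data came from fairseq's example preparation script, the setup would look roughly like this:

# Assumption: WMT'14 data prepared with fairseq's example script
cd examples/translation/
bash prepare-wmt14en2de.sh --icml17
cd ../..
TEXT=examples/translation/wmt14_en_de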

PYTHONIOENCODING=utf-8 fairseq-train \
    data-bin/wmt14_en_de \
    --arch transformer_wmt_en_de --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --max-tokens-valid 4096 \
    --update-freq 1 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-dir checkpoints/wmt14_en_de/transformer/ckpt \
    --log-format json \
    --keep-last-epochs 5 \
    --max-epoch 30 \
    --fp16
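One step that is often added to reach the paper's numbers, though it is not in the command above, is checkpoint averaging. A sketch using fairseq's bundled script, where checkpoint.avg5.pt is a hypothetical output name:

# Average the last 5 epoch checkpoints (kept around via --keep-last-epochs 5)
python scripts/average_checkpoints.py \
    --inputs checkpoints/wmt14_en_de/transformer/ckpt \
    --num-epoch-checkpoints 5 \
    --output checkpoints/wmt14_en_de/transformer/ckpt/checkpoint.avg5.pt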

Evaluate

fairseq-generate data-bin/wmt14_en_de \
    --path checkpoints/wmt14_en_de/transformer/ckpt/checkpoint_best.pt \
    --batch-size 128 --beam 5 --remove-bpe --scoring sacrebleu

What have you tried?

I added --scoring sacrebleu, changed --nwordssrc 44000 --nwordstgt 44000 to --nwordssrc 32768 --nwordstgt 32768 (#3807), and tried compound_split_bleu.sh (see the sketch below), but there is still a large gap between the validation BLEU (27.21) and the evaluation BLEU (24.37, with --scoring sacrebleu).
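For reference, a minimal sketch of that external scoring path, assuming the generation output is first saved to a file (gen.out is a placeholder name):

# Save the raw generation output instead of scoring in-process
fairseq-generate data-bin/wmt14_en_de \
    --path checkpoints/wmt14_en_de/transformer/ckpt/checkpoint_best.pt \
    --batch-size 128 --beam 5 --remove-bpe > gen.out

# Tokenized BLEU with compound splitting (the script ships with fairseq)
bash scripts/compound_split_bleu.sh gen.out

Note that the validation BLEU from --eval-bleu is computed with sacrebleu on detokenized validation output, while the number above comes from the test set with a different tokenization, so some gap between the two is expected.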

What's your environment?

tjshu commented 2 years ago

The problem was --update-freq 1; training with --update-freq 8 gives a normal BLEU.
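For context, and assuming single-GPU training: the effective batch size is --max-tokens × --update-freq × (number of GPUs), so --update-freq 8 gives 4096 × 8 × 1 = 32768 tokens per update, close to the ~25000-token batches used in "Attention Is All You Need", whereas --update-freq 1 gives only 4096.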

BaohaoLiao commented 2 years ago

The problem was --update-freq 1; training with --update-freq 8 gives a normal BLEU.

May I ask why you used 44000 as the vocabulary size rather than the 37000 used in "Attention Is All You Need"?

tjshu commented 2 years ago

The problem was --update-freq 1; training with --update-freq 8 gives a normal BLEU.

May I ask why you used 44000 as the vocabulary size rather than the 37000 used in "Attention Is All You Need"?

I tried both vocabulary sizes; 44000 is slightly better than 37000. A 37000 vocabulary can also reach the BLEU reported in "Attention Is All You Need".