facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

stats[args.best_checkpoint_metric] -> KeyError: 'bleu' #3881

Open Remorax opened 3 years ago

Remorax commented 3 years ago

🐛 Bug

Unable to use BLEU as the metric for saving the best checkpoint, even though the example translation scripts show it can be used as one. In my case I get a KeyError.

There is also no documentation on which values are valid for --best-checkpoint-metric.

To Reproduce

Steps to reproduce the behavior:

  1. Run fairseq-train with --best-checkpoint-metric bleu
  2. See error
Traceback (most recent call last):
  File "/home/viyer/miniconda3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/viyer/fairseq/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/home/viyer/fairseq/fairseq_cli/train.py", line 173, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/viyer/miniconda3/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/viyer/fairseq/fairseq_cli/train.py", line 298, in train
    valid_losses, should_stop = validate_and_save(
  File "/home/viyer/fairseq/fairseq_cli/train.py", line 385, in validate_and_save
    valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets)
  File "/home/viyer/fairseq/fairseq_cli/train.py", line 461, in validate
    valid_losses.append(stats[cfg.checkpoint.best_checkpoint_metric])
KeyError: 'bleu'

Code sample

This is the command I ran:


PRETRAIN=/home/viyer/ur-en/pretrained_models/mbart50.ft.n1/model.pt
lang_list=/home/viyer/ur-en/pretrained_models/mbart50.ft.n1/ML50_langs.txt
SAVEDIR=checkpoint
DATADIR=/home/viyer/ur-en/postprocessed
lang_pairs="en_XX-ur_PK"
SPM=/home/viyer/sentencepiece/build/src/spm_encode
DICT_TGT=/home/viyer/ur-en/pretrained_models/mbart50.ft.n1/dict.en_XX.txt
DICT_SRC=/home/viyer/ur-en/pretrained_models/mbart50.ft.n1/dict.ur_PK.txt
FAIRSEQ=/home/viyer/fairseq/fairseq_cli
DATA=/home/viyer/ur-en/data
DEST=/home/viyer/ur-en/postprocessed
TRAIN=train
VALID=dev
TEST=test
SRC=ur_PK
TGT=en_XX
NAME=ur-en-all

total_num_update=50000
lr=3e-05
warmup_updates=3000

fairseq-train ${DATADIR}/${NAME} \
  --finetune-from-model $PRETRAIN \
  --encoder-normalize-before --decoder-normalize-before \
  --arch mbart_large --layernorm-embedding \
  --task translation_multi_simple_epoch \
  --sampling-method "temperature" \
  --sampling-temperature 1.5 \
  --encoder-langtok "src" \
  --decoder-langtok \
  --lang-dict "$lang_list" \
  --lang-pairs "$lang_pairs" \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \
  --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
  --lr-scheduler inverse_sqrt --lr 3e-05 --warmup-updates 500000 --max-update 50000 \
  --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
  --max-tokens 1024 --update-freq 2 --batch-size 2 \
  --keep-interval-updates 10 --no-epoch-checkpoints \
  --seed 222 --log-format simple --log-interval 2 \
  --validate-interval-updates 100 \
  --best-checkpoint-metric loss --maximize-best-checkpoint-metric \
  --patience 10 --model-parallel-size 8 \
  --ddp-backend no_c10d --save-dir ${SAVEDIR}

Expected behavior

  1. BLEU should be usable as a validation metric.
  2. Documentation should be updated to state which validation metrics can be used.

Environment

XiaoqingNLP commented 2 years ago

@Remorax have you solved this problem?

skswldndi commented 2 years ago

I solved this by adding

--eval-bleu --eval-bleu-args --eval-bleu-detok --eval-bleu-remove-bpe

flags (for the detailed arguments, see fairseq/examples/translation). I think the '--best-checkpoint-metric bleu' flag doesn't work on its own.
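
For reference, these options are defined by the translation task, and the example in fairseq/examples/translation passes them with values roughly like the snippet below (the beam settings and the moses detokenizer come from that example and are only a starting point for your own setup):

--eval-bleu \
--eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
--eval-bleu-detok moses \
--eval-bleu-remove-bpe \
--best-checkpoint-metric bleu --maximize-best-checkpoint-metric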

zhongmz commented 1 year ago

@Remorax have you solved this problem?

XiaoqingNLP commented 1 year ago

You can try these flags: --eval-bleu --eval-bleu-args --eval-bleu-detok --eval-bleu-remove-bpe

nikhiljaiswal commented 1 year ago

I am getting fairseq-train: error: unrecognized arguments: --eval-bleu --eval-bleu-args --eval-bleu-detok --eval-bleu-remove-bpe

Can anyone suggest how to solve this?

nikhiljaiswal commented 1 year ago

@XiaoqingNLP @Remorax @zhongmz @kimziwoo can you please help me resolve my error? The details are mentioned above: https://github.com/facebookresearch/fairseq/issues/3881#issuecomment-1363920662

yugaljain1999 commented 1 year ago

@nikhiljaiswal Have you solved this issue yet? I am also facing it.

Wzhsgsg commented 1 year ago

@yugaljain1999 I fiddled with this for a few days. fairseq's train.py has a validate function, trainer.validate(), which processes one batch of data at a time, so what I compute is a per-batch f1-score; I then record it into the meters with meter.log, and meter.get_smooth_value averages the recorded values. So what you need to change is to override a function in your task. The task inherits from fairseq_task, which has that function. First, override fairseq_task's valid_step() in your task; it is called once per batch, so compute your metric inside it and record it. Then override this function in your task: def reduce_metrics(self, logging_outputs, criterion): super().reduce_metrics(logging_outputs, criterion) ... metrics.log_scalar('F1-score', f1_scores), and that is enough. If you want to compute the metric over the whole validation set, you would have to modify the computation in train.py instead.
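
A minimal sketch of that approach, assuming a task that subclasses TranslationTask (the class name, the 'f1' key, and the compute_f1 placeholder are illustrative, not part of fairseq, and import paths can differ between fairseq versions):

from fairseq import metrics  # in some versions: from fairseq.logging import metrics
from fairseq.tasks import register_task
from fairseq.tasks.translation import TranslationTask


def compute_f1(model, sample):
    # Placeholder: replace with a real per-batch F1 computation.
    return 0.0


@register_task("translation_with_f1")
class TranslationWithF1Task(TranslationTask):
    def valid_step(self, sample, model, criterion):
        # Called once per validation batch; attach the per-batch metric
        # to the logging output so reduce_metrics can see it.
        loss, sample_size, logging_output = super().valid_step(sample, model, criterion)
        logging_output["f1"] = compute_f1(model, sample)
        return loss, sample_size, logging_output

    def reduce_metrics(self, logging_outputs, criterion):
        super().reduce_metrics(logging_outputs, criterion)
        # Average the per-batch values; the key then appears in the
        # validation stats and can be passed as --best-checkpoint-metric f1.
        f1 = sum(log.get("f1", 0.0) for log in logging_outputs) / max(len(logging_outputs), 1)
        metrics.log_scalar("f1", f1)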

kurtabela commented 1 year ago

I'm facing the same issue.

@nikhiljaiswal in your case the issue might be resolved by setting the task to translation (--task translation). In my case I cannot do this, as my task has to be --task translation_multi_simple_epoch (https://github.com/facebookresearch/fairseq/blob/main/examples/multilingual/README.md).
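
Since translation_multi_simple_epoch does not define the --eval-bleu options, one possible workaround is to keep --best-checkpoint-metric loss during training and score saved checkpoints offline, roughly following the generation command in examples/multilingual (the checkpoint path, subset, and language codes below are placeholders to adapt; variables are the ones from the command earlier in this issue):

fairseq-generate ${DATADIR}/${NAME} \
  --path ${SAVEDIR}/checkpoint_best.pt \
  --task translation_multi_simple_epoch \
  --gen-subset valid \
  --source-lang ur_PK --target-lang en_XX \
  --encoder-langtok "src" --decoder-langtok \
  --lang-dict "$lang_list" --lang-pairs "$lang_pairs" \
  --remove-bpe 'sentencepiece' --sacrebleu \
  --batch-size 32 > generated_valid.txt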