facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Error in reproducing "Training a New Model" #2830

Closed lancioni closed 3 years ago

lancioni commented 3 years ago

🐛 Bug

When reproducing the steps described in "Training a New Model" in the documentation (https://fairseq.readthedocs.io/en/latest/getting_started.html#training-a-new-model), training ends with an error.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

```
mkdir -p checkpoints/fconv
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
    --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
    --arch fconv_iwslt_de_en --save-dir checkpoints/fconv
```

```
2020-10-31 21:21:20 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/fconv/checkpoint_last.pt
2020-10-31 21:21:20 | INFO | fairseq.trainer | loading train data for epoch 1
2020-10-31 21:21:20 | INFO | fairseq.data.data_utils | loaded 160239 examples from: data-bin/iwslt14.tokenized.de-en/train.de-en.de
2020-10-31 21:21:20 | INFO | fairseq.data.data_utils | loaded 160239 examples from: data-bin/iwslt14.tokenized.de-en/train.de-en.en
2020-10-31 21:21:20 | INFO | fairseq.tasks.translation | data-bin/iwslt14.tokenized.de-en train de-en 160239 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/home/lancioni/fairseq/fairseq_cli/train.py", line 349, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/lancioni/fairseq/fairseq/distributed_utils.py", line 317, in call_main
    main(cfg, **kwargs)
  File "/home/lancioni/fairseq/fairseq_cli/train.py", line 108, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/home/lancioni/fairseq/fairseq/checkpoint_utils.py", line 220, in load_checkpoint
    trainer.lr_step(epoch_itr.epoch)
  File "/home/lancioni/fairseq/fairseq/trainer.py", line 787, in lr_step
    self.lr_scheduler.step(epoch, val_loss)
  File "/home/lancioni/fairseq/fairseq/trainer.py", line 204, in lr_scheduler
    self._build_optimizer()  # this will initialize self._lr_scheduler
  File "/home/lancioni/fairseq/fairseq/trainer.py", line 233, in _build_optimizer
    self._optimizer = optim.build_optimizer(self.cfg.optimizer, params)
  File "/home/lancioni/fairseq/fairseq/optim/__init__.py", line 41, in build_optimizer
    return _build_optimizer(cfg, params, *extra_args, **extra_kwargs)
  File "/home/lancioni/fairseq/fairseq/registry.py", line 43, in build_x
    raise ValueError('{} is required!'.format(registry_name))
ValueError: optimizer is required!
```
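For illustration only, a minimal sketch (not fairseq's actual code) of the registry pattern behind the final frame of the traceback: when no choice has been supplied for a required registry such as `optimizer`, the builder raises `ValueError` with the registry's name. The function and variable names here are hypothetical.

```python
# Hypothetical sketch of a registry-style builder, loosely modeled on the
# behavior seen in fairseq/fairseq/registry.py: a missing choice for a
# required registry raises "<registry_name> is required!".
def build_from_registry(registry_name, choice, registry):
    if choice is None:
        # This mirrors the error in the traceback above.
        raise ValueError('{} is required!'.format(registry_name))
    return registry[choice]()

# A toy registry mapping optimizer names to constructors.
optimizers = {"nag": lambda: "NAG optimizer instance"}

try:
    build_from_registry("optimizer", None, optimizers)
except ValueError as e:
    print(e)  # optimizer is required!

print(build_from_registry("optimizer", "nag", optimizers))
```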

Code sample

Expected behavior

I expect training to complete without errors.

Environment

- Fairseq: 1.0.0a0+de85969 (installed from pip)
- PyTorch: 1.7.0
- OS: Ubuntu 20.04 under WSL2 in Windows 10 Dev (build 20246_fe)
- Python: 3.8.5
- CUDA toolkit: 11.0
- CUDA driver (under Windows 10): 460.20_quadro_win10-dch_64bit_international
- GPU: NVIDIA Quadro M1200

Additional context

myleott commented 3 years ago

Ah right, `--optimizer` is a required option now. Please add `--optimizer nag` for that particular example. We will update the documentation.
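Applying that fix to the command from the original report, the full invocation would presumably look like this (untested sketch, assuming the same data and checkpoint paths as above):

```shell
mkdir -p checkpoints/fconv
# --optimizer nag is the addition suggested in this thread.
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
    --optimizer nag --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
    --arch fconv_iwslt_de_en --save-dir checkpoints/fconv
```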

broune commented 3 years ago

Maybe I'm holding it wrong, but the fix seems not to have been propagated to the online docs?

https://fairseq.readthedocs.io/en/latest/getting_started.html

chenllliang commented 3 years ago

The fix is still missing from the current docs. +1

Valdegg commented 3 years ago

Still not updated :)

bohoro commented 3 years ago

The docs are updated.