facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

lr_scheduler in train.py returning error #4064

Open stellalisy opened 2 years ago

stellalisy commented 2 years ago

What is your question?

lr-scheduler argument reporting errors when fine-tuning mBART

I'm trying to follow a tutorial to fine-tune a pre-trained model by loading it and then running train.py. Below are the command-line arguments I used:

Code

```bash
python /content/fairseq/fairseq_cli/train.py /content/postprocessed/en-ja \
  --encoder-normalize-before \
  --decoder-normalize-before \
  --arch mbart_large \
  --task translation_from_pretrained_bart \
  --source-lang en_XX \
  --target-lang ja_XX \
  --criterion label_smoothed_cross_entropy \
  --label-smoothing 0.2 \
  --dataset-impl mmap \
  --optimizer adam \
  --adam-eps 1e-06 \
  --adam-betas '(0.9, 0.98)' \
  --lr-scheduler polynomial_decay \
  --lr 3e-05 \
  --warmup-updates 2500 \
  --max-update 40000 \
  --dropout 0.3 \
  --attention-dropout 0.1 \
  --weight-decay 0.0 \
  --max-tokens 768 \
  --update-freq 2 \
  --save-interval 1 \
  --save-interval-updates 8000 \
  --keep-interval-updates 10 \
  --no-epoch-checkpoints \
  --seed 222 \
  --log-format simple \
  --log-interval 2 \
  --reset-optimizer \
  --reset-meters \
  --reset-dataloader \
  --reset-lr-scheduler \
  --restore-file /content/mbart.cc25.v2/model.pt \
  --langs ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN \
  --layernorm-embedding \
  --ddp-backend no_c10d \
  --save-dir checkpoint
```

At first I also had the argument `--min-lr -1`, but I was getting errors saying it's not a valid command-line argument, so I removed it. Then I got the following error:

```
2021-12-07 06:48:47 | ERROR | fairseq.dataclass.utils | Error when composing. Overrides: ['common.no_progress_bar=False', 'common.log_interval=2', "common.log_format='simple'", 'common.log_file=null', 'common.tensorboard_logdir=null', 'common.wandb_project=null', 'common.azureml_logging=False', 'common.seed=222', 'common.cpu=False', 'common.tpu=False', 'common.bf16=False', 'common.memory_efficient_bf16=False', 'common.fp16=False', 'common.memory_efficient_fp16=False', 'common.fp16_no_flatten_grads=False', 'common.fp16_init_scale=128', 'common.fp16_scale_window=null', 'common.fp16_scale_tolerance=0.0', 'common.on_cpu_convert_precision=False', 'common.min_loss_scale=0.0001', 'common.threshold_loss_scale=null', 'common.amp=False', 'common.amp_batch_retries=2', 'common.amp_init_scale=128', 'common.amp_scale_window=null', 'common.user_dir=null', 'common.empty_cache_freq=0', 'common.all_gather_list_size=16384', 'common.model_parallel_size=1', 'common.quantization_config_path=null', 'common.profile=False', 'common.reset_logging=False', 'common.suppress_crashes=False', 'common.use_plasma_view=False', "common.plasma_path='/tmp/plasma'", 'common_eval.path=null', 'common_eval.post_process=null', 'common_eval.quiet=False', "common_eval.model_overrides='{}'", 'common_eval.results_path=null', 'distributed_training.distributed_world_size=1', 'distributed_training.distributed_num_procs=1', 'distributed_training.distributed_rank=0', "distributed_training.distributed_backend='nccl'", 'distributed_training.distributed_init_method=null', 'distributed_training.distributed_port=-1', 'distributed_training.device_id=0', 'distributed_training.distributed_no_spawn=False', "distributed_training.ddp_backend='no_c10d'", "distributed_training.ddp_comm_hook='none'", 'distributed_training.bucket_cap_mb=25', 'distributed_training.fix_batches_to_gpus=False', 'distributed_training.find_unused_parameters=False', 'distributed_training.gradient_as_bucket_view=False', 'distributed_training.fast_stat_sync=False', 'distributed_training.heartbeat_timeout=-1', 'distributed_training.broadcast_buffers=False', 'distributed_training.slowmo_momentum=null', "distributed_training.slowmo_base_algorithm='localsgd'", 'distributed_training.localsgd_frequency=3', 'distributed_training.nprocs_per_node=1', 'distributed_training.pipeline_model_parallel=False', 'distributed_training.pipeline_balance=null', 'distributed_training.pipeline_devices=null', 'distributed_training.pipeline_chunks=0', 'distributed_training.pipeline_encoder_balance=null', 'distributed_training.pipeline_encoder_devices=null', 'distributed_training.pipeline_decoder_balance=null', 'distributed_training.pipeline_decoder_devices=null', "distributed_training.pipeline_checkpoint='never'", "distributed_training.zero_sharding='none'", 'distributed_training.fp16=False', 'distributed_training.memory_efficient_fp16=False', 'distributed_training.tpu=False', 'distributed_training.no_reshard_after_forward=False', 'distributed_training.fp32_reduce_scatter=False', 'distributed_training.cpu_offload=False', 'distributed_training.use_sharded_state=False', 'distributed_training.not_fsdp_flatten_parameters=False', 'dataset.num_workers=1', 'dataset.skip_invalid_size_inputs_valid_test=False', 'dataset.max_tokens=768', 'dataset.batch_size=null', 'dataset.required_batch_size_multiple=8', 'dataset.required_seq_len_multiple=1', "dataset.dataset_impl='mmap'", 'dataset.data_buffer_size=10', "dataset.train_subset='train'", "dataset.valid_subset='valid'", 'dataset.combine_valid_subsets=null',
'dataset.ignore_unused_valid_subsets=False', 'dataset.validate_interval=1', 'dataset.validate_interval_updates=0', 'dataset.validate_after_updates=0', 'dataset.fixed_validation_seed=null', 'dataset.disable_validation=False', 'dataset.max_tokens_valid=768', 'dataset.batch_size_valid=null', 'dataset.max_valid_steps=null', 'dataset.curriculum=0', "dataset.gen_subset='test'", 'dataset.num_shards=1', 'dataset.shard_id=0', 'dataset.grouped_shuffling=False', 'dataset.update_epoch_batch_itr=False', 'dataset.update_ordered_indices_seed=False', 'optimization.max_epoch=0', 'optimization.max_update=40000', 'optimization.stop_time_hours=0.0', 'optimization.clip_norm=0.0', 'optimization.sentence_avg=False', 'optimization.update_freq=[2]', 'optimization.lr=[3e-05]', 'optimization.stop_min_lr=-1.0', 'optimization.use_bmuf=False', 'optimization.skip_remainder_batch=False', "checkpoint.save_dir='checkpoint'", "checkpoint.restore_file='/content/mbart.cc25.v2/model.pt'", 'checkpoint.finetune_from_model=null', 'checkpoint.reset_dataloader=True', 'checkpoint.reset_lr_scheduler=True', 'checkpoint.reset_meters=True', 'checkpoint.reset_optimizer=True', "checkpoint.optimizer_overrides='{}'", 'checkpoint.save_interval=1', 'checkpoint.save_interval_updates=8000', 'checkpoint.keep_interval_updates=10', 'checkpoint.keep_interval_updates_pattern=-1', 'checkpoint.keep_last_epochs=-1', 'checkpoint.keep_best_checkpoints=-1', 'checkpoint.no_save=False', 'checkpoint.no_epoch_checkpoints=True', 'checkpoint.no_last_checkpoints=False', 'checkpoint.no_save_optimizer_state=False', "checkpoint.best_checkpoint_metric='loss'", 'checkpoint.maximize_best_checkpoint_metric=False', 'checkpoint.patience=-1', "checkpoint.checkpoint_suffix=''", 'checkpoint.checkpoint_shard_count=1', 'checkpoint.load_checkpoint_on_all_dp_ranks=False', 'checkpoint.write_checkpoints_asynchronously=False', 'checkpoint.model_parallel_size=1', 'bmuf.block_lr=1.0', 'bmuf.block_momentum=0.875', 'bmuf.global_sync_iter=50', 'bmuf.warmup_iterations=500', 'bmuf.use_nbm=False', 'bmuf.average_sync=False', 'bmuf.distributed_world_size=1', 'generation.beam=5', 'generation.nbest=1', 'generation.max_len_a=0.0', 'generation.max_len_b=200', 'generation.min_len=1', 'generation.match_source_len=False', 'generation.unnormalized=False', 'generation.no_early_stop=False', 'generation.no_beamable_mm=False', 'generation.lenpen=1.0', 'generation.unkpen=0.0', 'generation.replace_unk=null', 'generation.sacrebleu=False', 'generation.score_reference=False', 'generation.prefix_size=0', 'generation.no_repeat_ngram_size=0', 'generation.sampling=False', 'generation.sampling_topk=-1', 'generation.sampling_topp=-1.0', 'generation.constraints=null', 'generation.temperature=1.0', 'generation.diverse_beam_groups=-1', 'generation.diverse_beam_strength=0.5', 'generation.diversity_rate=-1.0', 'generation.print_alignment=null', 'generation.print_step=False', 'generation.lm_path=null', 'generation.lm_weight=0.0', 'generation.iter_decode_eos_penalty=0.0', 'generation.iter_decode_max_iter=10', 'generation.iter_decode_force_max_iter=False', 'generation.iter_decode_with_beam=1', 'generation.iter_decode_with_external_reranker=False', 'generation.retain_iter_history=False', 'generation.retain_dropout=False', 'generation.retain_dropout_modules=null', 'generation.decoding_format=null', 'generation.no_seed_provided=False', 'eval_lm.output_word_probs=False', 'eval_lm.output_word_stats=False', 'eval_lm.context_window=0', 'eval_lm.softmax_batch=9223372036854775807', 'interactive.buffer_size=0',
"interactive.input='-'", 'ema.store_ema=False', 'ema.ema_decay=0.9999', 'ema.ema_start_update=0', 'ema.ema_seed_model=null', 'ema.ema_update_freq=1', 'ema.ema_fp32=False', 'criterion=label_smoothed_cross_entropy', 'criterion._name=label_smoothed_cross_entropy', 'criterion.label_smoothing=0.2', 'criterion.report_accuracy=False', 'criterion.ignore_prefix_size=0', 'criterion.sentence_avg=False', 'optimizer=adam', 'optimizer._name=adam', "optimizer.adam_betas='(0.9, 0.98)'", 'optimizer.adam_eps=1e-06', 'optimizer.weight_decay=0.0', 'optimizer.use_old_adam=False', 'optimizer.fp16_adam_stats=False', 'optimizer.tpu=False', 'optimizer.lr=[3e-05]', 'lr_scheduler=polynomial_decay', 'lr_scheduler._name=polynomial_decay', 'lr_scheduler.warmup_updates=2500', 'lr_scheduler.force_anneal=null', 'lr_scheduler.end_learning_rate=0.0', 'lr_scheduler.power=1.0', 'lr_scheduler.total_num_update=null', 'lr_scheduler.lr=[3e-05]', 'scoring=bleu', 'scoring._name=bleu', 'scoring.pad=1', 'scoring.eos=2', 'scoring.unk=3']
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 513, in _apply_overrides_to_config
    OmegaConf.update(cfg, key, value, merge=True)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/omegaconf.py", line 613, in update
    root.__setattr__(last_key, value)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 285, in __setattr__
    raise e
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 282, in __setattr__
    self.__set_impl(key, value)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 266, in __set_impl
    self._set_item_impl(key, value)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/basecontainer.py", line 398, in _set_item_impl
    self._validate_set(key, value)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 143, in _validate_set
    self._validate_set_merge_impl(key, value, is_assign=True)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py", line 159, in _validate_set_merge_impl
    cause=ValidationError("child '$FULL_KEY' is not Optional"),
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.7/dist-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ValidationError: child 'lr_scheduler.total_num_update' is not Optional
    full_key: lr_scheduler.total_num_update
    reference_type=Optional[PolynomialDecayLRScheduleConfig]
    object_type=PolynomialDecayLRScheduleConfig

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/fairseq/fairseq_cli/train.py", line 535, in <module>
    cli_main()
  File "/content/fairseq/fairseq_cli/train.py", line 515, in cli_main
    cfg = convert_namespace_to_omegaconf(args)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/dataclass/utils.py", line 389, in convert_namespace_to_omegaconf
    composed_cfg = compose("config", overrides=overrides, strict=False)
  File "/usr/local/lib/python3.7/dist-packages/hydra/experimental/compose.py", line 37, in compose
    with_log_configuration=False,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 512, in compose_config
    from_shell=from_shell,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 156, in load_configuration
    from_shell=from_shell,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 277, in _load_configuration
    ConfigLoaderImpl._apply_overrides_to_config(config_overrides, cfg)
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 522, in _apply_overrides_to_config
    ) from ex
hydra.errors.ConfigCompositionException: Error merging override lr_scheduler.total_num_update=null
```

I also tried `pip install --upgrade omegaconf`, but that didn't fix it either.
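The overrides dumped in the log already point at the culprit: every `lr_scheduler.*` key carries a value except `lr_scheduler.total_num_update=null`, which Hydra refuses to merge into a field that is not Optional. One way to check which fields a scheduler's config defines is to grep the source checkout; a minimal sketch, assuming the file name can be inferred from the `PolynomialDecayLRScheduleConfig` class named in the traceback:

```bash
# Sketch: list the config fields of the polynomial_decay scheduler in the
# fairseq checkout used above. The file path is an assumption inferred from
# the PolynomialDecayLRScheduleConfig class in the error message.
grep -n "field\|total_num_update" \
  /content/fairseq/fairseq/optim/lr_scheduler/polynomial_decay_schedule.py
```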

What's your environment?

Ch3nYe commented 2 years ago

I fixed it by changing `--lr-scheduler polynomial_decay` to `--lr-scheduler inverse_sqrt`. Maybe polynomial_decay doesn't work properly here.
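For reference, the swap in the training command above would look roughly like this. A sketch only: `--warmup-init-lr` is a standard companion flag of inverse_sqrt but was not in the original command, so treat it as an assumption:

```bash
# Workaround sketch: in the training command above, replace
#   --lr-scheduler polynomial_decay \
# with inverse_sqrt, which does not require --total-num-update.
# --warmup-init-lr sets the starting lr for warmup; it is optional and
# NOT in the original command (an assumption here).
--lr-scheduler inverse_sqrt \
--lr 3e-05 \
--warmup-updates 2500 \
--warmup-init-lr 1e-07 \
```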

QizhiPei commented 2 years ago

You need to add the `--total-num-update` argument to your script; it is required by the polynomial_decay lr scheduler. Note that different lr schedulers accept different arguments.
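Applied to the command in the question, a minimal sketch of the fix. Setting `--total-num-update` equal to `--max-update` is a common choice so the lr anneals to `--end-learning-rate` by the end of training, but the right value depends on your schedule:

```bash
# Fix sketch: give polynomial_decay the total number of updates it needs
# to anneal the lr from 3e-05 toward --end-learning-rate (0.0 by default).
# Here it is set equal to --max-update from the original command.
--lr-scheduler polynomial_decay \
--lr 3e-05 \
--warmup-updates 2500 \
--total-num-update 40000 \
--max-update 40000 \
```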