facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

TypeError: __init__() got an unexpected keyword argument 'gradient_as_bucket_view' #3945

Open rattlesnakey opened 3 years ago

rattlesnakey commented 3 years ago

🐛 Bug

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhy2018/miniconda3/envs/fairseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/data/private/zhy2018/projects/abstract_transformer/fairseq/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/data/private/zhy2018/projects/abstract_transformer/fairseq/fairseq_cli/train.py", line 155, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
  File "/data/private/zhy2018/projects/abstract_transformer/fairseq/fairseq/checkpoint_utils.py", line 272, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(
  File "/data/private/zhy2018/projects/abstract_transformer/fairseq/fairseq/trainer.py", line 637, in get_train_iterator
    self.model.max_positions(),
  File "/data/private/zhy2018/projects/abstract_transformer/fairseq/fairseq/trainer.py", line 251, in model
    self._wrapped_model = models.DistributedFairseqModel(
  File "/data/private/zhy2018/projects/abstract_transformer/fairseq/fairseq/models/distributed_fairseq_model.py", line 58, in DistributedFairseqModel
    wrapped_model = DistributedDataParallel(
TypeError: __init__() got an unexpected keyword argument 'gradient_as_bucket_view'
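The last frame shows fairseq passing gradient_as_bucket_view to torch.nn.parallel.DistributedDataParallel; that keyword only exists in PyTorch 1.7 and later, so on an older PyTorch the constructor call fails immediately. A minimal sketch reproducing just that call outside fairseq (toy linear model and a single-process gloo group, both stand-ins, not fairseq's actual wrapper code):

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Single-process CPU process group so DDP can be constructed at all
# (toy setup; fairseq normally spawns one worker process per GPU).
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

model = nn.Linear(8, 8)

# On PyTorch < 1.7 this raises:
#   TypeError: __init__() got an unexpected keyword argument 'gradient_as_bucket_view'
# On PyTorch >= 1.7 the keyword is accepted and the model wraps normally.
wrapped = DistributedDataParallel(model, gradient_as_bucket_view=False)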

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

CUDA_VISIBLE_DEVICES=0,1 WANDB_NAME=test fairseq-train ${DATA_DIR} --arch transformer \
    --source-lang src --target-lang tgt \
    --optimizer adam --lr ${LR} --adam-betas '(0.9, 0.98)' \
    --lr-scheduler inverse_sqrt --max-tokens 4096 --dropout 0.3 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-epoch 100 --batch-size ${BATCH_SIZE} --max-update 200000 \
    --warmup-updates 4000 --warmup-init-lr '1e-07' \
    --update-freq 2 --task translation \
    --keep-last-epochs 5 --num-workers 8 \
    --save-dir ${MODEL_DIR}/checkpoints \
    --wandb-project summary-generation-transformer-fairseq \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric --seed 42 \
    --patience 7

Code sample

Expected behavior

Environment

Additional context

rattlesnakey commented 3 years ago

When I use fairseq-train with multiple GPUs, this error occurs.

Explorerhpx commented 3 years ago

The same error occurs for me when I use fairseq-train with multiple GPUs (a single GPU is fine). My environment:

fairseq version: 1.0.0a0+92cae45
PyTorch version: 1.6
How you installed fairseq: source
Python version: 3.8.10
CUDA/cuDNN version: 10.1
GPU models and configuration: GeForce RTX 2080 Ti
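PyTorch 1.6 predates the keyword, which matches the error. One way to check whether an installed PyTorch's DDP accepts it (a small sketch using only the standard library's inspect module, nothing fairseq-specific):

import inspect

import torch
from torch.nn.parallel import DistributedDataParallel

# Prints True on PyTorch >= 1.7, False on 1.6 and earlier.
params = inspect.signature(DistributedDataParallel.__init__).parameters
print(torch.__version__, "gradient_as_bucket_view" in params)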

haruhi-sudo commented 2 years ago

I solved this problem by changing the PyTorch version to 1.10; maybe you can give it a try.
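Upgrading works because gradient_as_bucket_view was added to DistributedDataParallel in PyTorch 1.7, while the reports above show it failing on 1.6. If upgrading is not an option, one possible local patch is to drop the keyword whenever the installed DDP does not accept it. A sketch of that idea (wrap_ddp is a hypothetical helper, not fairseq's actual code in distributed_fairseq_model.py):

import inspect

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

def wrap_ddp(model: nn.Module, **ddp_kwargs) -> DistributedDataParallel:
    # Drop gradient_as_bucket_view on PyTorch < 1.7, where DDP's
    # __init__ does not know the keyword and would raise TypeError.
    accepted = inspect.signature(DistributedDataParallel.__init__).parameters
    if "gradient_as_bucket_view" not in accepted:
        ddp_kwargs.pop("gradient_as_bucket_view", None)
    return DistributedDataParallel(model, **ddp_kwargs)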