dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Apache License 2.0

AttributeError: 'LightningDistributedDataParallel' object has no attribute '_sync_params' #57

Open KimSoybean opened 2 years ago

KimSoybean commented 2 years ago

Validation sanity check: 0it [00:00, ?it/s]
ERROR - ViLT - Failed after 0:00:13!
Traceback (most recent call last):
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/pylib/Jupyter-kuplus/sacred/experiment.py", line 312, in run_commandline
    return self.run(
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/pylib/Jupyter-kuplus/sacred/experiment.py", line 276, in run
    run()
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/pylib/Jupyter-kuplus/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/pylib/Jupyter-kuplus/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "run.py", line 71, in main
    trainer.fit(model, datamodule=dm)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
    results = self.accelerator_backend.train()
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
    results = self.ddp_train(process_idx=self.task_idx, model=model)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 305, in ddp_train
    results = self.train_or_test()
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
    results = self.trainer.train()
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 495, in train
    self.run_sanity_check(self.get_model())
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 693, in run_sanity_check
    _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 609, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 178, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 161, in validation_step
    return self._step(args)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 170, in _step
    output = self.trainer.model(*args)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/pylib/Jupyter-kuplus/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/pytorch-lightning/lib/python3.8/site-packages/pytorch_lightning/overrides/data_parallel.py", line 164, in forward
    self._sync_params()
  File "/home/zhurui10/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/pylib/Jupyter-kuplus/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LightningDistributedDataParallel' object has no attribute '_sync_params'

During handling of the above exception, another exception occurred:

Leon-Francis commented 2 years ago

I ran into the same issue. Using PyTorch < 1.11.0 resolved it for me.
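
Before launching training, it may help to confirm which PyTorch version is actually installed. The following is only a minimal sketch: the helper names `release_tuple` and `outside_suggested_range` are made up for illustration and are not part of ViLT, PyTorch, or PyTorch Lightning, and the 1.11 threshold is taken solely from the comment above.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the leading numeric part of a version string as a tuple of ints.

    Suffixes such as "+cu102" or "a0" are dropped, so "1.10.2+cu102" -> (1, 10, 2).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def outside_suggested_range(torch_version: str) -> bool:
    """True if the version is 1.11 or newer, i.e. not 'PyTorch < 1.11.0'.

    The threshold is an assumption based only on the workaround reported
    in this thread.
    """
    return release_tuple(torch_version)[:2] >= (1, 11)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if outside_suggested_range(torch.__version__):
            print(f"torch {torch.__version__}: outside the range reported to work (< 1.11.0).")
        else:
            print(f"torch {torch.__version__}: within the range reported to work (< 1.11.0).")
```

If the check reports a version outside that range, installing an older release (for example with `pip install "torch<1.11"`, choosing the build that matches your CUDA setup) is one way to try the workaround.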