bradyz / cross_view_transformers

Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)
MIT License
531 stars · 81 forks

Error running training #4

Closed DerrickXuNu closed 2 years ago

DerrickXuNu commented 2 years ago

Hi, when I try to run the following command to train, an error is thrown.

python scripts/train.py data=nuscenes +experiment=cvt_nuscenes_vehicle data.dataset_dir=data/nuscenes data.labels_dir=data/cvt_labels_nuscenes visualization=nuscenes_viz

Error:

/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:486: PossibleUserWarning: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.
  rank_zero_warn(
Epoch 0:   0%|                                                                                                                                                                                         | 0/8538 [00:00<?, ?it/s]Error executing job with overrides: ['data=nuscenes', '+experiment=cvt_nuscenes_vehicle', 'data.dataset_dir=/home/runshengxu/project/data/nuscenes', 'data.labels_dir=/home/runshengxu/project/data/cvt_labels_nuscenes', 'visualization=nuscenes_viz']
Traceback (most recent call last):
  File "scripts/train.py", line 71, in main
    trainer.fit(model_module, datamodule=data_module, ckpt_path=ckpt_path)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
    results = self._run_stage()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
    return self._run_train()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
    self.fit_loop.run()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/optim/adamw.py", line 100, in step
    loss = closure()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
    closure_result = closure()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
    return self.model(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/model/model_module.py", line 41, in training_step
    return self.shared_step(batch, 'train', True,
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/model/model_module.py", line 25, in shared_step
    loss, loss_details = self.loss_func(pred, batch)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 113, in forward
    outputs = {k: v(pred, batch) for k, v in self.items()}
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 113, in <dictcomp>
    outputs = {k: v(pred, batch) for k, v in self.items()}
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 50, in forward
    loss = super().forward(pred, label)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 24, in forward
    return sigmoid_focal_loss(pred, label, self.alpha, self.gamma, self.reduction)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/fvcore/nn/focal_loss.py", line 34, in sigmoid_focal_loss
    ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none")
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/functional.py", line 3130, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([4, 12, 200, 200])) must be the same as input size (torch.Size([4, 1, 200, 200]))

Did I input the wrong command? I didn't change config.yaml, and I only have 1 GPU.
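The final `ValueError` can be reproduced in isolation: `F.binary_cross_entropy_with_logits` requires the prediction and target tensors to have identical shapes, and here the model produces 1 output channel while the labels carry 12. A minimal sketch (the tensor shapes are taken from the traceback; the data is random):

```python
import torch
import torch.nn.functional as F

pred = torch.randn(4, 1, 200, 200)   # model output: 1 channel, as in the traceback
label = torch.rand(4, 12, 200, 200)  # labels: 12 channels, as in the traceback

# Mismatched shapes trigger the same ValueError seen above.
try:
    F.binary_cross_entropy_with_logits(pred, label, reduction="none")
except ValueError as e:
    print(e)

# With matching shapes (here, slicing the label to 1 channel purely for
# illustration), the loss computes element-wise without error.
loss = F.binary_cross_entropy_with_logits(pred, label[:, :1], reduction="none")
print(loss.shape)  # torch.Size([4, 1, 200, 200])
```

In the actual run the mismatch came from the config, not the loss code: `data=nuscenes` configured 12-channel labels while the experiment's model head outputs a single channel.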

bradyz commented 2 years ago

I believe you need to remove `data=nuscenes` from your command; the `+experiment=...` config overrides this for you.
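With that change, the invocation from the report would look like this (a sketch; the dataset paths are the reporter's own):

```shell
python scripts/train.py \
    +experiment=cvt_nuscenes_vehicle \
    data.dataset_dir=data/nuscenes \
    data.labels_dir=data/cvt_labels_nuscenes \
    visualization=nuscenes_viz
```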

DerrickXuNu commented 2 years ago

Thanks a lot, that solved my problem. Another question: what's the best way to switch between single-GPU and multi-GPU training?

bradyz commented 2 years ago

As long as you have multiple GPUs, PyTorch Lightning will automatically use all GPUs that are visible via the environment variable `CUDA_VISIBLE_DEVICES`.
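So switching between single- and multi-GPU training can be done from the shell without touching the config, e.g. (a sketch; the trailing arguments stand in for the usual training overrides):

```shell
# Train on GPU 0 only
CUDA_VISIBLE_DEVICES=0 python scripts/train.py +experiment=cvt_nuscenes_vehicle

# Train on GPUs 0 and 1 (Lightning's DDP strategy picks up both automatically)
CUDA_VISIBLE_DEVICES=0,1 python scripts/train.py +experiment=cvt_nuscenes_vehicle
```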