MIT-REALM / neural_clbf

Toolkit for learning controllers based on robust control Lyapunov barrier functions
BSD 3-Clause "New" or "Revised" License

Strange Inplace Error Occurs Before Beginning First Epoch #11

Closed kwesiRutledge closed 1 year ago

kwesiRutledge commented 1 year ago

(This issue is partially described in Pull Request #10.)

Steps To Reproduce

The following steps were executed on macOS 13.0.1 (22A400). Following Apple's directions, the Conda distribution used for this installation was Miniconda (inspired by this tutorial).

Attempt to follow the installation steps provided in the project README. These instructions failed. To successfully complete the step pip install -r requirements.txt, I needed to:

  1. Install casadi via conda: conda install casadi, and then
  2. Install a lower version of diffcp (a version that worked for me was 1.0.18): pip install diffcp==1.0.18.

Once everything is installed, change your directory to neural_clbf > neural_clbf > training. Run the script train_inverted_pendulum.py. Before the first epoch starts, the script raises the following error and terminates.

Error

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Epoch 0:   0%|          | 0/157 [00:02<?, ?it/s]
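For readers unfamiliar with this class of error, here is a minimal sketch of how PyTorch produces it. This is not the neural_clbf code and it does not identify the line that fails in descent_loss; the tensor shape and the name violation are borrowed from the message and trace only for illustration. The assumption is simply that some tensor returned by F.relu is edited in place before backward() runs. ReLU's backward pass reuses its saved output, so an in-place edit makes the saved copy stale.

```python
import torch
import torch.nn.functional as F


def backward_with_inplace_edit():
    """Illustrative only: edit a ReLU output in place, then backpropagate."""
    x = torch.randn(64, requires_grad=True)
    violation = F.relu(x)        # autograd saves this output for ReLU's backward
    violation += 1.0             # in-place edit bumps the tensor version 0 -> 1
    violation.sum().backward()   # raises RuntimeError: saved output was modified


def backward_with_out_of_place_edit():
    """Same computation, but the ReLU output itself is left untouched."""
    x = torch.randn(64, requires_grad=True)
    violation = F.relu(x)
    shifted = violation + 1.0    # new tensor; the saved ReLU output stays at version 0
    shifted.sum().backward()
    return x.grad


if __name__ == "__main__":
    try:
        backward_with_inplace_edit()
    except RuntimeError as err:
        print("in-place variant failed:", str(err)[:80])
    grad = backward_with_out_of_place_edit()
    print("out-of-place variant gradient shape:", tuple(grad.shape))
```

If something similar is happening here, replacing the in-place operation (for example +=, or an indexed assignment) with an out-of-place one is the usual remedy, but that remains a guess until the offending line is found.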

Full Stacktrace

[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
  File "[MyHomeDirectory]/Documents/Development/neural_clbf/neural_clbf/training/train_sticking_pusher_slider.py", line 132, in <module>
    main(args)
  File "[MyHomeDirectory]/Documents/Development/neural_clbf/neural_clbf/training/train_sticking_pusher_slider.py", line 124, in main
    trainer.fit(clbf_controller)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 489, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 728, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 424, in optimizer_step
    model_ref.optimizer_step(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/optim/optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/optim/sgd.py", line 125, in step
    loss = closure()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 722, in train_step_and_backward_closure
    result = self.training_step_and_backward(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 813, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 280, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "[MyHomeDirectory]/Documents/Development/neural_clbf/neural_clbf/controllers/neural_clbf_controller.py", line 439, in training_step
    self.descent_loss(x, goal_mask, safe_mask, unsafe_mask, requires_grad=True)
  File "[MyHomeDirectory]/Documents/Development/neural_clbf/neural_clbf/controllers/neural_clbf_controller.py", line 379, in descent_loss
    violation = F.relu(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/nn/functional.py", line 1457, in relu
    result = torch.relu(input)
 (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "[MyHomeDirectory]/Documents/Development/neural_clbf/neural_clbf/training/train_sticking_pusher_slider.py", line 132, in <module>
    main(args)
  File "[MyHomeDirectory]/Documents/Development/neural_clbf/neural_clbf/training/train_sticking_pusher_slider.py", line 124, in main
    trainer.fit(clbf_controller)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 489, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 728, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 424, in optimizer_step
    model_ref.optimizer_step(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/optim/optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/optim/sgd.py", line 125, in step
    loss = closure()
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 722, in train_step_and_backward_closure
    result = self.training_step_and_backward(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 826, in training_step_and_backward
    self.backward(result, optimizer, opt_idx)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 859, in backward
    result.closure_loss = self.trainer.accelerator.backward(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 308, in backward
    output = self.precision_plugin.backward(
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 79, in backward
    model.backward(closure_loss, optimizer, opt_idx)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1275, in backward
    loss.backward(*args, **kwargs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "[MyHomeDirectory]/miniconda3/envs/neural_clbf3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Epoch 0:   0%|          | 0/157 [00:01<?, ?it/s]

dawsonc commented 1 year ago

FWIW I was able to reproduce this on my machine by upgrading from torch==1.9.1 to torch==1.13.1.

kwesiRutledge commented 1 year ago

Just stumbled upon this while cleaning out old emails.

My apologies if this error was caused by me accidentally upgrading torch. I thought that I had followed my instructions from a fresh install, but it looks like I had torch==1.12.1 in the requirements.txt for some reason. I'll close this issue now.
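
For anyone who lands here with the same traceback, a quick way to see whether the installed torch is newer than the version mentioned above is sketched below. It assumes that torch 1.9.1 is the known-good version, which is only inferred from the comment about upgrading from 1.9.1 to 1.13.1. The helper names are hypothetical and not part of neural_clbf.

```python
import re
from importlib import metadata

# Assumption taken from this thread: the error did not appear with torch 1.9.1.
KNOWN_GOOD = (1, 9, 1)


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into a tuple like (1, 13, 1)."""
    numbers = []
    for part in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def is_newer_than_known_good(installed, known_good=KNOWN_GOOD):
    """Return True when the installed version is strictly newer than known_good."""
    return parse_version(installed) > known_good


def check_installed_torch():
    """Describe how the installed torch compares with the assumed known-good version."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed in this environment"
    if is_newer_than_known_good(installed):
        return f"torch {installed} is newer than 1.9.1; the inplace error may appear"
    return f"torch {installed} is not newer than 1.9.1"


if __name__ == "__main__":
    print(check_installed_torch())
```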