alexis-jacq / Pytorch-DPPO

Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 1]], which is output 0 of TBackward, is at version 3; expected version 2 instead. #9

Open TJ2333 opened 3 years ago

TJ2333 commented 3 years ago

env: torch 1.8.1+cu111

Error:

```
UserWarning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
  File "<string>", line 1, in <module>
  File "E:\A\envs\gym\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\A\envs\gym\lib\multiprocessing\spawn.py", line 118, in _main
    return self._bootstrap()
  File "E:\A\envs\gym\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "E:\A\envs\gym\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "Pytorch-RL\Pytorch-DPPO-master\train.py", line 155, in train
    mu_old, sigma_sq_old, v_pred_old = model_old(batch_states)
  File "E:\A\envs\gym\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "Pytorch-DPPO-master\model.py", line 53, in forward
    v1 = self.v(x3)
  File "E:\A\envs\gym\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\A\envs\gym\lib\site-packages\torch\nn\modules\linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "E:\A\envs\gym\lib\site-packages\torch\nn\functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
 (Triggered internally at ..\torch\csrc\autograd\python_anomaly_mode.cpp:104.)
  allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
Process Process-4:
Traceback (most recent call last):
  File "E:\A\envs\gym\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "E:\A\envs\gym\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "Pytorch-DPPO-master\train.py", line 197, in train
    total_loss.backward(retain_graph=True)
  File "E:\A\envs\gym\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "E:\A\envs\gym\lib\site-packages\torch\autograd\__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 1]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
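Reading the traceback: the forward pass through `model_old` saved the value head's weight for backward (the flagged `[100, 1]` tensor, "output 0 of TBackward", is the transpose of `self.v`'s weight), that weight was then modified in place after the graph was built (an optimizer step or an in-place `state_dict` copy both do this), and the later `total_loss.backward(retain_graph=True)` through the now-stale graph trips autograd's version-counter check. A minimal sketch (not the repo's code; all names here are made up) that reproduces the same failure mode on a recent torch such as 1.8.1:

```python
import torch

net = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 3)
out = net(x)                      # graph saves net.weight (via its transpose, hence TBackward)
loss = out.pow(2).mean()

loss.backward(retain_graph=True)  # first backward: fine
opt.step()                        # in-place weight update bumps the weight's version counter

loss.backward()                   # RuntimeError: ... modified by an inplace operation
```

The repo was written against a much older PyTorch that did not catch this pattern, which is why the same code only errors out on newer versions.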

I googled it and some say it's caused by an in-place op, but I can't seem to find one in the code. I haven't tried downgrading the torch version yet; is there a solution that doesn't require downgrading?
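The usual way out, without downgrading, is to stop reusing a stale graph: run the old-policy forward under `torch.no_grad()` (no gradient should flow through it in the PPO ratio anyway), and recompute the current policy's forward inside each update epoch so every `backward()` sees a fresh graph and `retain_graph=True` can be dropped. A hedged, self-contained sketch of that pattern; the tiny `Linear` models and the squared-error loss are placeholder stand-ins for the repo's actual networks and PPO loss:

```python
import torch

# Placeholder stand-ins for the repo's model / model_old / batch_states.
model = torch.nn.Linear(3, 1)
model_old = torch.nn.Linear(3, 1)
model_old.load_state_dict(model.state_dict())  # snapshot the old policy
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
batch_states = torch.randn(8, 3)

with torch.no_grad():                     # old policy is a constant in the loss,
    v_pred_old = model_old(batch_states)  # so keep its forward out of the graph

for _ in range(4):                        # update epochs
    v_pred = model(batch_states)          # fresh forward -> fresh graph each epoch
    total_loss = (v_pred - v_pred_old).pow(2).mean()  # placeholder for the PPO loss
    optimizer.zero_grad()
    total_loss.backward()                 # no retain_graph needed
    optimizer.step()                      # in-place step is safe: the graph is never reused
```

Both pieces matter: detaching the old-policy outputs alone is not enough if the current policy's graph is still reused across optimizer steps.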

xiaomeng9532 commented 2 years ago

Hello, have you solved this problem?

TJ2333 commented 2 years ago

> Hello, have you solved this problem?

Nope, but I found another version of DPPO code: https://github.com/TianhongDai/distributed-ppo