ClownW / Reinforcement-learning-with-PyTorch

Reinforcement learning with PyTorch, inspired by MorvanZhou, with the framework changed from TensorFlow to PyTorch.

RuntimeError about AC_CartPole.py #1

Open Coder-Liuu opened 2 years ago

Coder-Liuu commented 2 years ago

I didn't change anything in 8_Actor_Critic_Advantage/AC_CartPole.py. I just ran it, but I got this:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:
    [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead.
    Hint: enable anomaly detection to find the operation that failed to compute its gradient, with
    torch.autograd.set_detect_anomaly(True).

So I added torch.autograd.set_detect_anomaly(True) to the code, but then I got this:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:
    [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead.
    Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in
    question was changed in there or anywhere later. Good luck!

My PyTorch version is 1.7.0 and my NumPy version is 1.18.5.
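For context, here is a minimal sketch (a toy example of my own, not code from this repo) that reproduces the same class of error: a computation graph from one forward pass is reused for a second backward() after optimizer.step() has already updated the saved parameters in place, which is what bumps the version counter.

    import torch

    # Toy reproduction of the version-counter error (assumed setup, not AC_CartPole.py itself).
    net = torch.nn.Sequential(torch.nn.Linear(4, 20), torch.nn.ReLU(), torch.nn.Linear(20, 1))
    opt = torch.optim.SGD(net.parameters(), lr=0.1)

    x = torch.randn(1, 4)
    v = net(x)                          # autograd saves the last layer's weight for the backward pass
    loss1 = v.pow(2).sum()
    loss1.backward(retain_graph=True)
    opt.step()                          # in-place parameter update: the saved weight's version changes

    loss2 = v.sum()                     # reuses the stale graph, like the actor reusing the critic's td_error
    loss2.backward()                    # RuntimeError: ... modified by an inplace operation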

Gera001 commented 2 years ago

May I ask which PyTorch environment you are using? Thanks.

NaturalShower commented 2 years ago

After tinkering around with the actor's and critic's learn() methods, I got it to run:

    def learn(self, s, a, td):
        s = torch.Tensor(s[np.newaxis, :])
        acts_prob = self.actor_net(s)
        log_prob = torch.log(acts_prob[0, a])
        with torch.no_grad():
            exp_v = torch.mean(log_prob * td)
        loss = -exp_v
        torch.autograd.set_detect_anomaly(True)
        loss.requires_grad_(True)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return exp_v

    def learn(self, s, r, s_):
        s, s_ = torch.Tensor(s[np.newaxis, :]), torch.Tensor(s_[np.newaxis, :])
        v, v_ = self.critic_net(s), self.critic_net(s_)
        with torch.no_grad():
            td_error = r + GAMMA * v_ - v
        loss = td_error ** 2
        loss.requires_grad_(True)
        torch.autograd.set_detect_anomaly(True)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return td_error

The error in the original code seems to be because the critic's gradient was being propagated into the actor, or because the gradient wasn't being computed, or something like that... I'm a beginner too and haven't really figured it out.

ClownW commented 2 years ago

Your message has been received. Wishing you a pleasant life!

henbudidiao commented 2 years ago

I ran into the same problem (RuntimeError about AC_CartPole.py). With the method given by NaturalShower the error no longer appears and the code runs, but it fails to converge.
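A likely reason for the non-convergence (my own reading, not confirmed by the thread): computing the loss under torch.no_grad() and then calling loss.requires_grad_(True) turns the loss into a leaf tensor with no connection to the network, so backward() produces no gradients for the parameters and the optimizer step changes nothing. A minimal sketch of that effect:

    import torch

    net = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(net.parameters(), lr=0.1)

    x = torch.randn(1, 4)
    with torch.no_grad():
        loss = net(x).pow(2).sum()   # no graph is recorded here
    loss.requires_grad_(True)        # loss becomes a leaf; it is not linked to net's parameters

    opt.zero_grad()
    loss.backward()                  # runs without error, but...
    print(net.weight.grad)           # ...prints None: no gradient ever reaches the network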

henbudidiao commented 2 years ago

https://zhuanlan.zhihu.com/p/511825440

ClownW commented 2 years ago

Your message has been received. Wishing you a pleasant life!

i-Qin commented 1 year ago

Change line 105 to: return td_error.detach() — don't pass the gradient through.
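For anyone following along, here is a sketch of how the two learn() methods might look with that fix: gradients stay enabled inside each network, and the TD error is detached before it is handed to the actor. This is my own reconstruction based on the suggestion above, not the repo's exact code; actor_net, critic_net, self.optimizer, and GAMMA are assumed to exist as in AC_CartPole.py.

    # Critic.learn: update the critic from the squared TD error, then detach the TD error
    # before returning it so the actor cannot backpropagate into the critic.
    def learn(self, s, r, s_):
        s, s_ = torch.Tensor(s[np.newaxis, :]), torch.Tensor(s_[np.newaxis, :])
        v, v_ = self.critic_net(s), self.critic_net(s_)
        td_error = r + GAMMA * v_.detach() - v   # only v carries gradient into the critic
        loss = td_error ** 2
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return td_error.detach()

    # Actor.learn: use the detached TD error as a fixed advantage weight for the policy gradient.
    def learn(self, s, a, td):
        s = torch.Tensor(s[np.newaxis, :])
        acts_prob = self.actor_net(s)
        log_prob = torch.log(acts_prob[0, a])
        loss = -torch.mean(log_prob * td)        # td is already detached, so only the actor is updated
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return -loss.detach()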

ClownW commented 1 year ago

Your message has been received. Wishing you a pleasant life!