ghliu / pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch
Apache License 2.0

a little change in ddpg.py #4

Closed: kxxwz closed this issue 6 years ago

kxxwz commented 6 years ago

`volatile` was removed and now has no effect. Use `with torch.no_grad():` instead.
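For reference, here is a minimal sketch of the modern replacement. The network and variable names (`actor_target`, `critic_target`, etc.) are illustrative stand-ins, not the exact code from `ddpg.py`; the point is that the TD target is built inside `torch.no_grad()`, which is what `volatile=True` used to achieve:

```python
import torch
import torch.nn as nn

# hypothetical toy networks standing in for the repo's target actor/critic
actor_target = nn.Linear(3, 1)          # state -> action
critic_target = nn.Linear(3 + 1, 1)     # (state, action) -> Q-value

next_state = torch.randn(8, 3)
reward = torch.randn(8, 1)
not_done = torch.ones(8, 1)             # 0 where the episode terminated
discount = 0.99

# pre-0.4 code marked these inputs volatile=True to skip autograd tracking;
# since PyTorch 0.4 the equivalent is to build the TD target under no_grad()
with torch.no_grad():
    next_action = actor_target(next_state)
    next_q_values = critic_target(torch.cat([next_state, next_action], dim=1))
    target_q_batch = reward + discount * not_done * next_q_values
```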

songanz commented 6 years ago

I want to know: in the original version, the author sets `next_q_values.volatile = False` and then does `value_loss = criterion(q_batch, target_q_batch)` followed by `value_loss.backward()`. So, in the original version, is he trying to run `backward()` through both the target critic network and the critic network?

kxxwz commented 6 years ago

> I want to know: in the original version, the author sets `next_q_values.volatile = False` and then does `value_loss = criterion(q_batch, target_q_batch)` followed by `value_loss.backward()`. So, in the original version, is he trying to run `backward()` through both the target critic network and the critic network?

I don't think the target networks will be updated by the backward pass: `next_q_values` is computed with `volatile=True`, so no autograd graph is built through the target critic, and in any case only the critic's parameters are given to its optimizer.
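A small self-contained check of that point (hypothetical toy networks, not the repo's code): when the target Q-value is produced under `no_grad()`, `backward()` leaves the target critic's parameters without gradients, and the optimizer only ever steps the trained critic:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# hypothetical stand-ins for the critic and its target network
critic = nn.Linear(4, 1)
critic_target = nn.Linear(4, 1)
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)  # target NOT registered

state = torch.randn(8, 4)
next_state = torch.randn(8, 4)
reward = torch.randn(8, 1)

# no graph is built through critic_target here
with torch.no_grad():
    target_q = reward + 0.99 * critic_target(next_state)

q = critic(state)
value_loss = F.mse_loss(q, target_q)
value_loss.backward()

# gradients reached the critic only; the target network has none
assert all(p.grad is not None for p in critic.parameters())
assert all(p.grad is None for p in critic_target.parameters())
optimizer.step()  # and step() only updates the critic's weights anyway
```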