Closed kxxwz closed 6 years ago
I want to know: in the original version, the author sets next_q_values.volatile=False and then does

value_loss = criterion(q_batch, target_q_batch)
value_loss.backward()

So, in the original version, is he trying to run backward() through both the target critic network and the critic network?
I don't think the target networks will be updated by the policy gradient: value_loss.backward() may compute gradients through them, but the optimizer only steps the critic's parameters, so the target network's weights never change from backpropagation.
volatile was removed in later PyTorch versions and now has no effect. Use

with torch.no_grad():

instead when computing the target Q-values.
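To make the point concrete, here is a minimal sketch of the modern pattern. The networks, shapes, and hyperparameters below are hypothetical placeholders (not the repo's actual models); it only illustrates that wrapping the target computation in torch.no_grad() keeps gradients out of the target critic, while value_loss.backward() still updates the online critic:

```python
import torch
import torch.nn as nn

# Hypothetical tiny critics for illustration; a real DDPG critic takes (state, action).
critic = nn.Linear(4, 1)
critic_target = nn.Linear(4, 1)
critic_target.load_state_dict(critic.state_dict())

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

state = torch.randn(8, 4)
next_state = torch.randn(8, 4)
reward = torch.randn(8, 1)
gamma = 0.99

# Modern replacement for volatile=True: no autograd graph is built here,
# so target_q_batch is a plain constant as far as backward() is concerned.
with torch.no_grad():
    next_q_values = critic_target(next_state)
    target_q_batch = reward + gamma * next_q_values

q_batch = critic(state)
value_loss = criterion(q_batch, target_q_batch)

optimizer.zero_grad()
value_loss.backward()  # gradients flow only into `critic`
optimizer.step()

# The target network received no gradients at all.
assert all(p.grad is None for p in critic_target.parameters())
```

Because target_q_batch carries no graph, there is no way for backward() to reach the target network; its weights are only ever changed by an explicit soft/hard update copied from the online critic.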