ghliu / pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch
Apache License 2.0

the gradient of the action-value with respect to actions #7

Open Joywanglulu opened 5 years ago

Joywanglulu commented 5 years ago

Hi, I'm not sure whether this line calculates the gradient of the action-value with respect to the actions:

policy_loss = -self.critic([ to_tensor(state_batch), self.actor(to_tensor(state_batch)) ])

zhihanyang2022 commented 3 years ago

I think the answer is yes.

# Q(s, pi(s)) with the actor's output kept in the graph
policy_loss = -self.critic([
    to_tensor(state_batch),
    self.actor(to_tensor(state_batch))
])

policy_loss = policy_loss.mean()
policy_loss.backward()       # backprop through the critic into the actor
self.actor_optim.step()      # only the actor's parameters are updated

First of all, it is clear that we take a gradient step using the actor's optimizer. I think the real question is: "can gradients propagate backward through one network (the critic) into another network (the actor)?" The answer to this is also yes; see https://discuss.pytorch.org/t/backprop-through-weights-of-a-second-network/52573/4.
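
To see this concretely, here is a minimal, self-contained sketch (not the repo's code; the tiny actor/critic modules and the dimensions are made up for illustration). Because the actor's output is part of the computation graph fed into the critic, `policy_loss.backward()` applies the chain rule dL/dθ = -(dQ/da)(da/dθ) and fills in gradients for the actor's parameters:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in networks, just for illustration.
actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(4 + 2, 32), nn.ReLU(), nn.Linear(32, 1))
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

state_batch = torch.randn(8, 4)                # fake batch of states
action_batch = actor(state_batch)              # a = pi(s), stays in the graph
q = critic(torch.cat([state_batch, action_batch], dim=1))

policy_loss = -q.mean()

actor_optim.zero_grad()
policy_loss.backward()                         # gradients flow through the critic into the actor

# The actor's weights received nonzero gradients via dQ/da * da/dtheta.
print(actor[0].weight.grad.abs().sum())

# The critic's weights also accumulate gradients, but only the actor's
# optimizer steps here, so only the actor is updated.
actor_optim.step()
```

In the full algorithm the critic's accumulated gradients are simply zeroed before the next critic update, so this extra accumulation does no harm.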