Closed by jadkins99 5 months ago.
I am somewhat confused about how the gradients of the actor are obtained in the loss for continuous actions. Without a log probability term multiplying the advantage, where do the gradients come from? Thanks for the great work!
The gradients flow through the log-probability term and are scaled by the advantage. This is just the standard policy gradient (PG).
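To make the answer concrete, here is a minimal sketch. It assumes a 1-D Gaussian policy parameterized by a mean and a log standard deviation, and the surrogate loss `-log pi(a) * A`. All names (`gaussian_log_prob`, `pg_loss`, `pg_loss_grad`, `finite_diff_grad`) are hypothetical illustrations, not code from the repository discussed here. The advantage `A` is treated as a constant, so the gradient with respect to the policy parameters is the gradient of the log-probability, multiplied by `-A`. A finite-difference check confirms the analytic gradient.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log density of a 1-D Gaussian policy N(mean, exp(log_std)^2) at `action`."""
    std = math.exp(log_std)
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


def pg_loss(action, advantage, mean, log_std):
    """Policy-gradient surrogate loss -log pi(a) * A (advantage is a constant)."""
    return -gaussian_log_prob(action, mean, log_std) * advantage


def pg_loss_grad(action, advantage, mean, log_std):
    """Analytic gradient of pg_loss w.r.t. (mean, log_std).

    d logp / d mean    = (a - mean) / std^2
    d logp / d log_std = z^2 - 1,  where z = (a - mean) / std
    Both are multiplied by -advantage: the advantage only scales the
    gradient that flows through the log-probability.
    """
    std = math.exp(log_std)
    z = (action - mean) / std
    d_mean = (action - mean) / (std * std)
    d_log_std = z * z - 1.0
    return -advantage * d_mean, -advantage * d_log_std


def finite_diff_grad(action, advantage, mean, log_std, eps=1e-6):
    """Central-difference estimate of the same gradient, used as a check."""
    def f(m, s):
        return pg_loss(action, advantage, m, s)

    g_mean = (f(mean + eps, log_std) - f(mean - eps, log_std)) / (2 * eps)
    g_log_std = (f(mean, log_std + eps) - f(mean, log_std - eps)) / (2 * eps)
    return g_mean, g_log_std


if __name__ == "__main__":
    # Hypothetical sample: action 0.7, positive advantage 2.0.
    print("analytic:", pg_loss_grad(0.7, 2.0, 0.2, -0.5))
    print("numeric: ", finite_diff_grad(0.7, 2.0, 0.2, -0.5))
```

With a positive advantage, the gradient with respect to the mean is negative here, so a gradient-descent step moves the mean toward the sampled action. With a zero advantage, the gradient vanishes. In an autodiff framework you would only write the loss and let the framework produce these gradients, keeping the advantage out of the graph (e.g. detached or under a stop-gradient).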