danijar / dreamerv3

Mastering Diverse Domains through World Models
https://danijar.com/dreamerv3
MIT License

[Question] Confused about training actor for continuous actions #75

Closed: jadkins99 closed this 5 months ago

jadkins99 commented 1 year ago

I am somewhat confused about how the gradients of the actor are obtained in the loss for continuous actions. Without a log probability term multiplying the advantage, where do the gradients come from? Thanks for the great work!

danijar commented 5 months ago

Gradients go through the log-probability term and are scaled by the advantage. This is just the normal policy gradient (PG).
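To make the answer concrete, here is a minimal sketch in plain Python. It assumes a one-dimensional Gaussian policy with a learnable mean and standard deviation. All function and variable names are hypothetical, and this is not DreamerV3's actual actor loss, which includes further terms. The surrogate loss is -advantage * log pi(a). Both the advantage and the sampled action are held constant, so the only path for gradients into the policy parameters is the log-probability term. The sketch checks the analytic gradients against finite differences.

```python
import math


def gaussian_logprob(action, mean, std):
    """Log-density of a 1-D Gaussian policy N(mean, std^2) evaluated at `action`."""
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def surrogate_actor_loss(mean, std, action, advantage):
    """Score-function (REINFORCE) surrogate: -advantage * log pi(action).

    `action` and `advantage` are treated as constants (as if stop-gradient
    were applied), so gradients w.r.t. `mean` and `std` can only flow
    through the log-probability term.
    """
    return -advantage * gaussian_logprob(action, mean, std)


def analytic_grad_mean(mean, std, action, advantage):
    """d loss / d mean = -advantage * (action - mean) / std^2."""
    return -advantage * (action - mean) / std ** 2


def analytic_grad_std(mean, std, action, advantage):
    """d loss / d std = -advantage * ((action - mean)^2 / std^3 - 1 / std)."""
    return -advantage * ((action - mean) ** 2 / std ** 3 - 1.0 / std)


def finite_diff(f, x, eps=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Hypothetical example values: the sampled action turned out better than expected.
mean, std = 0.25, 0.5
action, advantage = 1.25, 1.5

g_mean = analytic_grad_mean(mean, std, action, advantage)
g_std = analytic_grad_std(mean, std, action, advantage)

# Numerical check that the analytic gradients match the surrogate loss.
g_mean_fd = finite_diff(lambda m: surrogate_actor_loss(m, std, action, advantage), mean)
g_std_fd = finite_diff(lambda s: surrogate_actor_loss(mean, s, action, advantage), std)

print("dL/dmean:", g_mean, "finite diff:", g_mean_fd)
print("dL/dstd: ", g_std, "finite diff:", g_std_fd)

# One gradient-descent step on the mean moves it toward the sampled action.
learning_rate = 0.01
new_mean = mean - learning_rate * g_mean
print("mean before:", mean, "after:", new_mean)
```

With a positive advantage, the gradient with respect to the mean is negative, so a gradient-descent step moves the mean toward the sampled action. A negative advantage pushes it away. In an autodiff framework, you get the same effect by wrapping the advantage in a stop-gradient (for example `jax.lax.stop_gradient`) and differentiating the surrogate loss. No reparameterized path through the sampled action is needed.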