MishaLaskin / rad

RAD: Reinforcement Learning with Augmented Data

Why is the encoder detached? #4

Closed miriaford closed 4 years ago

miriaford commented 4 years ago

I might have missed something simple, but could you kindly explain why you don't update the encoder part?

https://github.com/MishaLaskin/rad/blob/master/curl_sac.py#L411-L413

In other SAC implementations (e.g. rlkit), the gradient back-props through the entire policy network. Thanks!

MishaLaskin commented 4 years ago

You still backprop through the critic, so the encoder gets gradients from the Q-value estimation. The only thing detached is the gradient from the actor, which results in more stable policies.
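
A minimal, self-contained PyTorch sketch of that gradient flow (not the repo's actual code; the linear modules and placeholder losses here are illustrative stand-ins for the conv encoder and SAC objectives):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Linear(10, 50)          # stands in for the shared conv encoder
actor_head = nn.Linear(50, 4)        # policy head
critic_head = nn.Linear(50 + 4, 1)   # Q(s, a) head

obs = torch.randn(8, 10)
act = torch.randn(8, 4)

# Critic update: gradients flow through the encoder.
q = critic_head(torch.cat([encoder(obs), act], dim=-1))
critic_loss = ((q - torch.zeros(8, 1)) ** 2).mean()
critic_loss.backward()
print(encoder.weight.grad is not None)  # True: the critic trains the encoder

encoder.zero_grad(set_to_none=True)

# Actor update: the encoder output is detached, so the policy loss
# cannot push gradients into the shared encoder.
feat = encoder(obs).detach()
pi = actor_head(feat)
actor_loss = (pi ** 2).mean()  # placeholder policy objective
actor_loss.backward()
print(encoder.weight.grad)  # None: no gradient reached the encoder
```

So the encoder is trained only by the critic's TD loss; the actor treats the encoder features as fixed inputs during its own update.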

miriaford commented 4 years ago

Thanks! Is there any literature to back this up? Or is it purely empirical?

MishaLaskin commented 4 years ago

afaik empirical
