Why is the encoder detached?

miriaford commented 4 years ago

I might have missed something simple, but could you please kindly explain why don't you update the encoder part?

https://github.com/denisyarats/drq/blob/master/drq.py#L263-L264

In other SAC implementations (e.g. rlkit), the gradient back-props through the entire policy network. Thanks!

denisyarats commented 4 years ago

We based our implementation on SAC-AE (Yarats et al., 2019), where it's been shown to be beneficial for stable training, we drew the inspiration from the DeepMind Control Suite white paper (Tassa et al., 2018).

Note, that recent papers like CURL and RAD are built on our code from SAC-AE, so they just use the exact same trick.

We believe that this trick is crucial for SOTA performance and we have strong empirical evidence for this.

Here is our understanding on why this is important:

In SAC a policy is just a parametric projection of the Boltzamm distribution induced by the Q-function. Thus, we are not really losing any information by not propagating gradients from the policy down to the conv layers. Please note, that we also share the conv layers between the actor and critic: https://github.com/denisyarats/drq/blob/master/drq.py#L186, this means that only the critic can update the convnet.
In order to train the policy we have to employ the reparametrization trick, this estimator tends to have larger variance of gradients, which could lead to training instabilities. Thus preventing these gradient from updating the conv layers improves training stability considerably.

miriaford commented 4 years ago

This is by far the best response I get on Github from any paper author. Thank you so much for the explanation!

ZhaomingXie commented 4 years ago

Hi ,thanks for the insights! It makes sense to me to freeze the encoder for actor update to ensure learning stability.

However, I don't understand why not freeze the final linear layer of the encoder as well? Do you have any comment on this? Thanks!

denisyarats / drq

Why is the encoder detached? #2