denisyarats / drq

DrQ: Data regularized Q
https://sites.google.com/view/data-regularized-q
MIT License

Why is the encoder detached? #2

Closed · miriaford closed this issue 4 years ago

miriaford commented 4 years ago

I might have missed something simple, but could you kindly explain why you don't update the encoder part?

https://github.com/denisyarats/drq/blob/master/drq.py#L263-L264

In other SAC implementations (e.g. rlkit), the gradient back-props through the entire policy network. Thanks!

denisyarats commented 4 years ago

We based our implementation on SAC-AE (Yarats et al., 2019), where detaching the encoder was shown to be beneficial for stable training; we drew the inspiration from the DeepMind Control Suite white paper (Tassa et al., 2018).

Note that recent papers like CURL and RAD are built on our SAC-AE code, so they use the exact same trick.

We believe that this trick is crucial for SOTA performance and we have strong empirical evidence for this.

Here is our understanding of why this is important: the convolutional encoder is shared between the actor and the critic, and the actor's objective is simply to maximize the critic's Q-estimates. If the actor's gradients were allowed to flow into the encoder, the actor could change the shared representation in a way that inflates the predicted Q-values rather than genuinely improving the policy, which makes the critic's learning target non-stationary and destabilizes training. We therefore train the encoder only with the critic's loss and detach it during the actor update.
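For readers following along, here is a minimal, self-contained PyTorch sketch of the trick under discussion. It is not the actual drq.py code: the `Encoder`/`Actor`/`Critic` classes, the 84x84 input size, the action dimension, and the optimizer setup are illustrative assumptions. It only shows the gradient flow: the encoder is optimized through the critic loss, while the actor update detaches the convolutional features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, feat_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU())
        # assumes 84x84 inputs -> 39x39 conv feature maps
        self.fc = nn.Linear(32 * 39 * 39, feat_dim)

    def forward(self, obs, detach=False):
        h = self.conv(obs).flatten(start_dim=1)
        if detach:
            # stop gradients here: the actor loss never reaches the conv layers
            h = h.detach()
        return torch.tanh(self.fc(h))

class Critic(nn.Module):
    def __init__(self, feat_dim=50, act_dim=6):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat, action):
        return self.q(torch.cat([feat, action], dim=-1))

class Actor(nn.Module):
    def __init__(self, feat_dim=50, act_dim=6):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh())

    def forward(self, feat):
        return self.pi(feat)

encoder, actor, critic = Encoder(), Actor(), Critic()
# the encoder is optimized together with the critic only
critic_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(critic.parameters()), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(8, 3, 84, 84)  # dummy batch of image observations
action = torch.randn(8, 6)       # dummy actions
target_q = torch.randn(8, 1)     # dummy Bellman targets

# critic update: gradients flow through the encoder, so it learns from the TD error
critic_loss = F.mse_loss(critic(encoder(obs), action), target_q)
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# actor update: conv features are detached, so maximizing Q cannot reshape the
# shared representation; only the actor's parameters are stepped here (any
# gradients that land on critic/encoder fc params are never applied by actor_opt
# and are cleared by critic_opt.zero_grad() on the next critic update)
feat = encoder(obs, detach=True)
actor_loss = -critic(feat, actor(feat)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```

Note that in this sketch the gradient is stopped right after the conv stack, so the encoder's final linear layer still receives (unused) gradients from the actor loss; that detail is related to the follow-up question below.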

miriaford commented 4 years ago

This is by far the best response I have gotten on GitHub from any paper author. Thank you so much for the explanation!

ZhaomingXie commented 4 years ago

Hi, thanks for the insights! It makes sense to me to freeze the encoder for the actor update to ensure learning stability.

However, I don't understand why the final linear layer of the encoder isn't frozen as well. Do you have any comment on this? Thanks!