miriaford opened this issue; closed 4 years ago.
We based our implementation on SAC-AE (Yarats et al., 2019), where this was shown to be beneficial for stable training; we drew the inspiration from the DeepMind Control Suite white paper (Tassa et al., 2018).
Note that recent papers like CURL and RAD are built on our SAC-AE code, so they use the exact same trick.
We believe that this trick is crucial for SOTA performance and we have strong empirical evidence for this.
Here is our understanding of why this is important:
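Concretely, the trick being discussed is to stop the actor loss from training the shared conv encoder, so that only the critic loss updates it. A minimal PyTorch-style sketch, with hypothetical `encoder`, `actor`, and `critic` interfaces rather than the exact drq.py code:

```python
import torch

def update_actor(encoder, actor, critic, actor_optimizer, alpha, obs):
    """SAC actor step where the shared encoder is frozen w.r.t. the actor loss."""
    # Detach the features so actor_loss.backward() cannot reach the encoder.
    features = encoder(obs).detach()

    # Hypothetical actor interface: returns a reparameterized action and its log-prob.
    action, log_prob = actor(features)

    # Standard SAC actor objective; the critic is only evaluated here, not updated.
    q1, q2 = critic(features, action)
    actor_loss = (alpha * log_prob - torch.min(q1, q2)).mean()

    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()
```

The critic update, by contrast, leaves the encoder attached, so the shared conv weights are trained only through the critic loss.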
This is by far the best response I've gotten on GitHub from any paper author. Thank you so much for the explanation!
Hi, thanks for the insights! It makes sense to me to freeze the encoder during the actor update to ensure learning stability.
However, I don't understand why the final linear layer of the encoder isn't frozen as well. Do you have any comment on this? Thanks!
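For context, a sketch of the encoder layout this question refers to, assuming the SAC-AE-style `forward(obs, detach=...)` pattern where the graph is cut right after the conv trunk (illustrative code, not the repo's exact implementation):

```python
import torch
import torch.nn as nn

class PixelEncoder(nn.Module):
    """Illustrative SAC-AE-style encoder: conv trunk followed by a linear head."""

    def __init__(self, obs_shape, feature_dim=50, num_filters=32):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(obs_shape[0], num_filters, 3, stride=2), nn.ReLU(),
            nn.Conv2d(num_filters, num_filters, 3, stride=1), nn.ReLU(),
        )
        # The final linear layer + normalization the question asks about.
        self.head = nn.Sequential(
            nn.LazyLinear(feature_dim),  # lazily infers the flattened conv size
            nn.LayerNorm(feature_dim),
        )

    def forward(self, obs, detach=False):
        h = self.convs(obs).flatten(start_dim=1)
        if detach:
            # Gradient stops here: the conv trunk is frozen for this loss,
            # but the linear head below still receives gradients.
            h = h.detach()
        return torch.tanh(self.head(h))
```

With this layout, `detach=True` only freezes the conv trunk for a given loss; the linear head after the cut still gets gradients, which is the part the question is about.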
I might have missed something simple, but could you please explain why you don't update the encoder part?
https://github.com/denisyarats/drq/blob/master/drq.py#L263-L264
In other SAC implementations (e.g. rlkit), the gradient back-props through the entire policy network. Thanks!
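A toy check of the difference being asked about (hypothetical shapes and names, with a dummy squared loss just to drive `backward()`):

```python
import torch
import torch.nn as nn

# Toy stand-in for a shared pixel encoder and a policy head.
encoder = nn.Sequential(nn.Conv2d(9, 32, 3, stride=2), nn.ReLU(),
                        nn.Flatten(), nn.LazyLinear(50))
policy_head = nn.Linear(50, 6)
obs = torch.randn(4, 9, 84, 84)

# drq / SAC-AE style actor loss: the encoder output is detached,
# so backward() never reaches the encoder weights.
actor_loss = policy_head(encoder(obs).detach()).pow(2).mean()
actor_loss.backward()
print(all(p.grad is None for p in encoder.parameters()))       # True

# rlkit-style actor loss: gradients flow through the entire network,
# so the encoder is also updated by the actor objective.
actor_loss_full = policy_head(encoder(obs)).pow(2).mean()
actor_loss_full.backward()
print(all(p.grad is not None for p in encoder.parameters()))   # True
```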