Closed: miriaford closed this issue 4 years ago.
RAD and DrQ are concurrent work (published 2 days apart). The main differences:
In addition to data augmentation, DrQ modifies the underlying SAC algorithm by averaging both the Q function and the target Q over multiple augmented views. RAD does not modify the underlying algorithm at all: it achieves the same results with data augmentation alone and can be plugged into any RL algorithm (we also show that it works with PPO, achieving SOTA test-time generalization on ProcGen).
RAD also extensively ablates a variety of data augmentations and provides insight into why random crop works well.
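A minimal NumPy sketch of the algorithmic difference, assuming toy stand-ins: `target_v` is a hypothetical placeholder for SAC's soft target value (the minimum of the target Q networks minus the entropy term), and `q_fn` is a hypothetical placeholder for the online critic. Following the description in the DrQ paper, the TD target is averaged over K augmented views of the next observation, and the critic loss over M augmented views of the current observation. With K = M = 1, the update roughly reduces to a single augmented view, which is what plugging augmentation into unmodified SAC (as RAD does) amounts to.

```python
import numpy as np


def td_target(reward, next_obs_views, target_v, gamma=0.99, not_done=1.0):
    """TD target averaged over K augmented views of the next observation.

    K = len(next_obs_views). K = 1 gives a single-view target;
    K > 1 gives a DrQ-style averaged target.
    """
    values = np.array([target_v(view) for view in next_obs_views], dtype=float)
    return reward + gamma * not_done * values.mean()


def critic_loss(q_fn, obs_views, action, target):
    """Squared TD error averaged over M augmented views of the observation."""
    errors = [(q_fn(view, action) - target) ** 2 for view in obs_views]
    return float(np.mean(errors))


# Toy usage with hypothetical stand-ins for the target value and the critic.
target_v = lambda obs: float(obs.sum())
q_fn = lambda obs, action: float(obs.sum()) + action
next_views = [np.array([1.0]), np.array([3.0])]  # K = 2 "augmented" views
y_single = td_target(1.0, next_views[:1], target_v, gamma=0.5)  # 1 + 0.5 * 1
y_avg = td_target(1.0, next_views, target_v, gamma=0.5)         # 1 + 0.5 * 2
loss = critic_loss(q_fn, [np.array([0.0]), np.array([2.0])], 0.0, y_single)
```

Averaging over several views lowers the variance that augmentation adds to the target, at the cost of K extra target-network forward passes per update.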
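For reference, here is a minimal NumPy sketch of random crop as it is commonly applied to DeepMind Control frames: each frame in the batch gets its own random window (for example, an 84x84 crop from a 100x100 render). The function name and the (B, C, H, W) layout are assumptions for illustration, not RAD's actual code.

```python
import numpy as np


def random_crop(imgs, out_size=84, rng=None):
    """Take an independent random out_size x out_size crop of each image.

    imgs: array of shape (B, C, H, W), e.g. frame stacks rendered at 100x100.
    Returns an array of shape (B, C, out_size, out_size).
    """
    rng = np.random.default_rng() if rng is None else rng
    b, c, h, w = imgs.shape
    if h < out_size or w < out_size:
        raise ValueError("images are smaller than the crop size")
    # Top-left corners, sampled independently per batch element (high is exclusive).
    ys = rng.integers(0, h - out_size + 1, size=b)
    xs = rng.integers(0, w - out_size + 1, size=b)
    out = np.empty((b, c, out_size, out_size), dtype=imgs.dtype)
    for i, (y, x) in enumerate(zip(ys, xs)):
        out[i] = imgs[i, :, y:y + out_size, x:x + out_size]
    return out


batch = np.zeros((32, 9, 100, 100), dtype=np.uint8)  # 3 stacked RGB frames
print(random_crop(batch).shape)  # (32, 9, 84, 84)
```

Sampling a separate offset per sample, rather than one per batch, means each transition in a minibatch is seen from a different view.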
One of the authors of DrQ here.
1) In DrQ, we demonstrate that both data augmentation and Q-function regularization are helpful. In Figure 1 we show that data augmentation alone already achieves SOTA.
2) We then show in Figure 2 that our additional Q-function regularization provides a further boost.
3) Finally, our results are still better across the board (and we run 10 seeds, not 3 :)). Here is the table:
Data augmentation alone is not enough for SOTA performance on some harder tasks.
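The augmentation DrQ relies on is a random shift: as described in the DrQ paper, each side of the 84x84 frame is padded by 4 pixels by repeating boundary pixels, and a random 84x84 crop is then taken. Here is a minimal NumPy sketch following that description; the function name and the (B, C, H, W) layout are assumptions for illustration.

```python
import numpy as np


def random_shift(imgs, pad=4, rng=None):
    """Pad with replicated edge pixels, then crop back to the original size.

    imgs: array of shape (B, C, H, W). Each image is shifted by an independent
    random offset of up to `pad` pixels in each spatial direction.
    """
    rng = np.random.default_rng() if rng is None else rng
    b, c, h, w = imgs.shape
    # Pad only the two spatial axes; mode="edge" repeats the boundary pixels.
    padded = np.pad(imgs, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="edge")
    ys = rng.integers(0, 2 * pad + 1, size=b)
    xs = rng.integers(0, 2 * pad + 1, size=b)
    out = np.empty_like(imgs)
    for i, (y, x) in enumerate(zip(ys, xs)):
        out[i] = padded[i, :, y:y + h, x:x + w]
    return out
```

Unlike the RAD-style crop from a larger render, the output keeps the input resolution, so the encoder sees 84x84 frames during both training and evaluation.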
@denisyarats When comparing DrQ with RAD, do you convert the 'step' in RAD eval.log to 'environment step'?
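Suppose RAD's `eval.log` records policy (agent) steps, and each chosen action is repeated `action_repeat` times in the environment. Then converting to environment steps takes a single multiplication. This is a minimal sketch: whether the logged `step` is actually counted this way is an assumption to check against the run's config, and the helper names are hypothetical.

```python
def to_env_steps(agent_steps: int, action_repeat: int) -> int:
    """Convert policy (agent) steps to environment steps.

    Assumes each logged step is one policy decision whose action is
    repeated `action_repeat` times in the environment.
    """
    if action_repeat < 1:
        raise ValueError("action_repeat must be >= 1")
    return agent_steps * action_repeat


def to_agent_steps(env_steps: int, action_repeat: int) -> int:
    """Inverse conversion: environment steps back to policy steps."""
    if action_repeat < 1:
        raise ValueError("action_repeat must be >= 1")
    return env_steps // action_repeat
```

For example, with `action_repeat = 8`, 12,500 policy steps correspond to 100,000 environment steps. Curves plotted against raw policy steps would therefore look 8x more sample-efficient than curves plotted against environment steps.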
Thanks for sharing the code!
I wonder what the main algorithmic difference is between DrQ-SAC and RAD-SAC. You only mentioned DrQ in passing in the paper and didn't elaborate. Thanks!