MishaLaskin / rad

RAD: Reinforcement Learning with Augmented Data

Main difference from DrQ? #1

Closed · miriaford closed 4 years ago

miriaford commented 4 years ago

Thanks for sharing the code!

I wonder: what's the main algorithmic difference between DrQ-SAC and RAD-SAC? You mention DrQ only in passing in the paper but don't elaborate. Thanks!

MishaLaskin commented 4 years ago

RAD and DrQ are concurrent work (released two days apart). The main differences:

In addition to data augmentation, DrQ modifies the underlying SAC algorithm by averaging Q-values over multiple augmentations (both the Q estimate and the target Q). RAD does not modify the underlying algorithm at all: it achieves the same results with data augmentation alone, so it can be plugged into any RL algorithm (we also show it works with PPO, with SOTA test-time generalization on ProcGen).

RAD also extensively ablates a variety of data augmentations and provides insight into why random crop works well.
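For concreteness, here is a minimal sketch of the random-crop augmentation discussed above, assuming batched pixel observations of shape (B, C, H, W); the function name and sizes are illustrative, not the repo's exact API:

```python
import numpy as np

def random_crop(imgs, out_size):
    """Randomly crop a batch of images to (out_size, out_size).

    imgs: np.ndarray of shape (B, C, H, W) with H, W >= out_size.
    Each image in the batch gets an independent crop location.
    """
    b, c, h, w = imgs.shape
    tops = np.random.randint(0, h - out_size + 1, size=b)
    lefts = np.random.randint(0, w - out_size + 1, size=b)
    cropped = np.empty((b, c, out_size, out_size), dtype=imgs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        cropped[i] = imgs[i, :, t:t + out_size, l:l + out_size]
    return cropped

# e.g. crop 100x100 rendered frames down to an 84x84 encoder input
obs = np.random.randint(0, 256, size=(32, 9, 100, 100), dtype=np.uint8)
aug_obs = random_crop(obs, 84)
assert aug_obs.shape == (32, 9, 84, 84)
```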

denisyarats commented 4 years ago

One of the authors of DrQ here.

1) In DrQ, we demonstrate that both data augmentation and Q-function regularization are helpful. In Figure 1 we show that data augmentation alone can achieve SOTA.

2) We then show, in Figure 2, that our additional Q-function regularization provides a further boost.

3) Finally, our results are still better across the board (and we run 10 seeds, not 3 :)). Here is the table:

[results table from the DrQ paper]

Data augmentation alone is not enough for SOTA performance on some harder tasks.
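To make the Q-function regularization concrete, here is a minimal PyTorch-style sketch of a DrQ-style soft Q target averaged over K augmented copies of the next observation; `actor`, `critic_target`, and `aug` are placeholder callables, not DrQ's actual code (DrQ similarly averages the current Q estimate over M augmentations):

```python
import torch

def drq_target(reward, not_done, next_obs, actor, critic_target, aug,
               K=2, discount=0.99, alpha=0.1):
    """Soft Q target averaged over K random augmentations (DrQ-style).

    next_obs: (B, C, H, W) pixel observations; aug(x) returns an
    independently augmented copy of x on every call.
    """
    with torch.no_grad():
        targets = []
        for _ in range(K):
            next_obs_aug = aug(next_obs)
            # sample the next action and its log-prob from the SAC policy
            next_action, log_prob = actor(next_obs_aug)
            q1, q2 = critic_target(next_obs_aug, next_action)
            v = torch.min(q1, q2) - alpha * log_prob
            targets.append(reward + not_done * discount * v)
        # averaging the K targets gives a lower-variance, regularized target
        return torch.stack(targets, dim=0).mean(dim=0)
```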

TaoHuang13 commented 2 years ago


@denisyarats When comparing DrQ with RAD, do you convert the 'step' in RAD's eval.log to environment steps?
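For context, in the DeepMind Control benchmarks each logged agent step typically corresponds to `action_repeat` environment steps, so the conversion is a multiplication; the values below are illustrative (the per-task `action_repeat` settings are given in each paper's appendix):

```python
# illustrative conversion from logged agent steps to environment steps
action_repeat = 4        # task-dependent hyperparameter, e.g. 2/4/8 per task
agent_steps = 125_000    # hypothetical 'step' value read from eval.log
env_steps = agent_steps * action_repeat
print(env_steps)         # 500000 environment steps
```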