denisyarats / drq

DrQ: Data regularized Q
https://sites.google.com/view/data-regularized-q
MIT License

Replicating table 1 of the paper #14

Closed snailrowen1337 closed 4 years ago

snailrowen1337 commented 4 years ago

Dear Denis,

Thanks for open-sourcing this, the paper is really cool! I am trying to replicate Table 1 on the PlaNet benchmark and ran into some problems with the SAC-state baseline. I am using your implementation of SAC-state (github.com/denisyarats/pytorch_sac) but fail to reach the reported performance. Was action repeat applied to SAC-state in Table 1? For each environment, I am using frame_skip = action_repeat, where action_repeat comes from Table 2 in the paper. To stay within 500,000 environment steps, I set num_train_steps = 500,000 // action_repeat. Am I missing something here? Once I figure this out, I will replicate the DrQ experiments. Thanks!!
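
For reference, here is a minimal sketch of the step accounting I describe above (not code from either repo). It assumes the per-environment action repeats follow the standard PlaNet-benchmark values I read off Table 2; please double-check against the paper:

```python
# Minimal sketch of the step accounting described above.
# Assumption: action repeats follow the standard PlaNet benchmark
# (Table 2 of the DrQ paper); verify against the paper before use.
ACTION_REPEAT = {
    "cartpole_swingup": 8,
    "reacher_easy": 4,
    "cheetah_run": 4,
    "finger_spin": 2,
    "ball_in_cup_catch": 4,
    "walker_walk": 2,
}

ENV_STEP_BUDGET = 500_000  # total true environment steps allowed

for env_name, action_repeat in ACTION_REPEAT.items():
    # With frame_skip = action_repeat, each agent step consumes
    # `action_repeat` environment steps, so the number of agent
    # training steps shrinks accordingly.
    num_train_steps = ENV_STEP_BUDGET // action_repeat
    print(f"{env_name}: action_repeat={action_repeat}, "
          f"num_train_steps={num_train_steps}")
```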

denisyarats commented 4 years ago

Hi, for SAC states we don't use action repeat at all, as per common practice. To account for this difference between DrQ (and other methods that use action repeat) and SAC states, we compare performance in true environment steps (see Appendix B.3 of the paper).

Action repeat can be thought of as a hyperparameter of the learning algorithm, and for SAC states it happens to be equal to 1.
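
A minimal sketch of how an action-repeat wrapper relates agent steps to true environment steps (illustrative only, not the wrapper used in this repo; assumes the old gym-style step API returning `(obs, reward, done, info)`):

```python
import gym


class ActionRepeat(gym.Wrapper):
    """Repeat each agent action `k` times and count raw environment steps."""

    def __init__(self, env, k):
        super().__init__(env)
        self.k = k
        self.env_steps = 0  # true environment steps taken so far

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.k):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            self.env_steps += 1
            if done:
                break
        return obs, total_reward, done, info


# DrQ with action_repeat=4: 125k agent steps consume 500k environment steps.
# SAC states with action_repeat=1: agent steps and environment steps coincide,
# so learning curves are aligned on the environment-step axis for comparison.
```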

snailrowen1337 commented 4 years ago

Ah, that makes perfect sense. This must be the reason for the discrepancy. I will try removing the action repeat, thanks!!