MishaLaskin / curl

CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
MIT License

Sorry that I cannot reproduce some DMC results in the paper #15

fiberleif closed 3 years ago

fiberleif commented 3 years ago

Dear CURL authors,

Thanks for such impactful work and for releasing the code!

Following the hyperparameters from Table 3 in the Implementation Details section of the appendix, I ran each reported task with five seeds.
The results are:

Environment (score at 500K steps) | Our results | CURL paper
--- | --- | ---
Finger, Spin | 828 +/- 137 | 926 +/- 45
Cartpole, Swingup | 809 +/- 39 | 841 +/- 45
Reacher, Easy | 951 +/- 27 | 929 +/- 44
Cheetah, Run | 526 +/- 59 | 518 +/- 28
Walker, Walk | 892 +/- 49 | 902 +/- 43
Ball in Cup, Catch | 846 +/- 103 | 959 +/- 27

From the above results, we find that in some tasks (e.g., Finger, Spin and Ball in Cup, Catch) our mean score is lower than yours and our std is relatively high.

Besides, I find that the 100K and 500K results of Pixel SAC are almost the same in Table 1 of the paper.

Have you encountered these issues when running the current codebase? Thank you so much!

MishaLaskin commented 3 years ago

The hyperparameter table in the paper needs to be updated: I believe we used a batch size of 512 for the final runs. However, the results above are within the confidence intervals. It is also worth noting that small differences across averaged seeds are a known issue in RL as a field; see Figure 5 in https://arxiv.org/abs/1709.06560
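
As a rough way to sanity-check the "within the confidence intervals" remark, the numbers from the table in this thread can be compared directly. The sketch below is not from the CURL codebase. It assumes that every "+/-" value is a standard deviation across seeds, and that the paper's numbers also come from five seeds (`N_PAPER` is a guess, since the thread does not state it). The helper names `intervals_overlap` and `welch_t` are made up for illustration.

```python
import math

# (task, reproduced mean, reproduced std, paper mean, paper std), copied from the table in this thread
RESULTS = [
    ("Finger, Spin", 828, 137, 926, 45),
    ("Cartpole, Swingup", 809, 39, 841, 45),
    ("Reacher, Easy", 951, 27, 929, 44),
    ("Cheetah, Run", 526, 59, 518, 28),
    ("Walker, Walk", 892, 49, 902, 43),
    ("Ball in Cup, Catch", 846, 103, 959, 27),
]

N_OURS = 5   # five seeds, as stated in the thread
N_PAPER = 5  # ASSUMPTION: the seed count behind the paper's numbers is not given here


def intervals_overlap(m1, s1, m2, s2):
    """True if the intervals [m1 - s1, m1 + s1] and [m2 - s2, m2 + s2] overlap."""
    return abs(m1 - m2) <= s1 + s2


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic: mean difference in units of its standard error."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / standard_error


if __name__ == "__main__":
    for task, m_ours, s_ours, m_paper, s_paper in RESULTS:
        overlap = intervals_overlap(m_ours, s_ours, m_paper, s_paper)
        t = welch_t(m_ours, s_ours, N_OURS, m_paper, s_paper, N_PAPER)
        print(f"{task:20s} overlap={overlap} t={t:+.2f}")
```

Under these assumptions, the mean +/- std intervals overlap for all six tasks, and the largest gap relative to its standard error is on Ball in Cup, Catch. With only five seeds per side this is a coarse check, not a formal significance test.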

fiberleif commented 3 years ago

Thanks for your quick reply. Do you mean that a batch size of 512 was used for all six DMC environments?

Besides, I find that the 100K and 500K results of Pixel SAC are almost the same in Table 1 of the paper. Is that expected?

Thank you so much!