fiberleif closed this issue 3 years ago.
The hyperparameter table in the paper needs to be updated: I believe we used a batch size of 512 for the final runs. That said, the results above are within the confidence intervals. It is also worth noting that small differences between seed-averaged results are an issue for RL as a field; see Figure 5 in https://arxiv.org/abs/1709.06560
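To make "within the confidence intervals" concrete, here is a rough sketch that turns the mean +/- std figures from the reproduction table in this thread into 95% confidence intervals for each mean and checks whether the reproduced and paper intervals overlap. It assumes both columns report the sample standard deviation over seeds. The reproduction used five seeds; the paper-side seed count (`N_PAPER = 5`) is an assumption, not something stated in this thread, so adjust it (and make sure `T_CRIT_95` has the matching degrees of freedom) if the paper used a different number.

```python
import math

# Mean and std at 500K steps, copied from the reproduction table in this thread:
# (reproduced mean, reproduced std, paper mean, paper std)
RESULTS = {
    "Finger, Spin": (828, 137, 926, 45),
    "Cartpole, Swingup": (809, 39, 841, 45),
    "Reacher, Easy": (951, 27, 929, 44),
    "Cheetah, Run": (526, 59, 518, 28),
    "Walker, Walk": (892, 49, 902, 43),
    "Ball in Cup, Catch": (846, 103, 959, 27),
}

N_OURS = 5   # five seeds, as stated in the report
N_PAPER = 5  # ASSUMPTION: the paper's seed count is not stated in this thread

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {4: 2.776, 9: 2.262}


def ci95(mean, std, n):
    """95% confidence interval for a mean, assuming `std` is the sample std over n seeds."""
    half_width = T_CRIT_95[n - 1] * std / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def report():
    """Print both 95% CIs per environment and return {env: overlap?}."""
    overlaps = {}
    for env, (m_ours, s_ours, m_paper, s_paper) in RESULTS.items():
        ours = ci95(m_ours, s_ours, N_OURS)
        paper = ci95(m_paper, s_paper, N_PAPER)
        overlaps[env] = intervals_overlap(ours, paper)
        print(f"{env:<20} ours=({ours[0]:.0f}, {ours[1]:.0f}) "
              f"paper=({paper[0]:.0f}, {paper[1]:.0f}) overlap={overlaps[env]}")
    return overlaps


if __name__ == "__main__":
    report()
```

Under these assumptions all six pairs of intervals overlap. Keep in mind that overlapping 95% intervals is a looser check than a formal test: two means can differ significantly even when their intervals overlap.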
Thanks for your quick reply. Do you mean we should use a batch size of 512 for all six DMC environments?
Also, I noticed that the 100K and 500K results for Pixel SAC in Table 1 of the paper are almost the same. Is that expected?
Thank you so much!
Dear CURL authors,
Thanks for such high-impact work and for releasing the code!
Following the hyperparameters in Table 3 (Implementation Details) of the appendix, I ran each reported game with five seeds.
The results are:

| 500K steps score | Our results | CURL paper |
|---|---|---|
| Finger, Spin | 828 +/- 137 | 926 +/- 45 |
| Cartpole, Swingup | 809 +/- 39 | 841 +/- 45 |
| Reacher, Easy | 951 +/- 27 | 929 +/- 44 |
| Cheetah, Run | 526 +/- 59 | 518 +/- 28 |
| Walker, Walk | 892 +/- 49 | 902 +/- 43 |
| Ball in Cup, Catch | 846 +/- 103 | 959 +/- 27 |
From the results above, we find that in some games (e.g. Finger, Spin and Ball in Cup, Catch) our mean score is lower than yours and the standard deviation is relatively high.
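One rough way to judge whether the lower means for Finger, Spin and Ball in Cup, Catch are more than seed noise is Welch's t-test computed directly from the summary statistics in the table. The sketch below assumes each +/- figure is a sample standard deviation and, as a labeled assumption, that the paper's numbers also come from five seeds; change the sample sizes if that is not the case.

```python
import math

# (mean, sample std, number of seeds) for the two environments named above.
# The paper-side seed count of 5 is an ASSUMPTION, not taken from the paper.
COMPARISONS = {
    "Finger, Spin": ((828, 137, 5), (926, 45, 5)),
    "Ball in Cup, Catch": ((846, 103, 5), (959, 27, 5)),
}


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from summary statistics (mean, sample std, sample size)."""
    v1 = s1 ** 2 / n1  # squared standard error of mean 1
    v2 = s2 ** 2 / n2  # squared standard error of mean 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    for env, (ours, paper) in COMPARISONS.items():
        t, df = welch_t(*ours, *paper)
        print(f"{env}: t = {t:.2f}, df = {df:.1f}")
```

With these inputs both comparisons land at between 4 and 5 degrees of freedom, where the two-sided 5% critical value is at least 2.571, and neither |t| (about 1.52 and 2.37) exceeds it. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` computes the same statistic together with a p-value.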
Besides, I noticed that the 100K and 500K results for Pixel SAC in Table 1 of the paper are almost the same.
Have you run into these issues with the current codebase? Thank you so much!