Open lichuminglcm opened 3 years ago
Hmmm, it's a bit strange if you did not change anything in the code. But DRL tends to have large variance. Can you try running with 5 different random seeds and see what the median performance is?
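In case it helps, here is a rough sketch of how the median over seeds could be computed once the runs finish. It assumes each seed's evaluation returns were logged at the same env steps. The function name `median_return_per_step` and the numbers in `example` are made up for illustration; they are not from this repo or from any real run.

```python
from statistics import median


def median_return_per_step(returns_by_seed):
    """Return the per-checkpoint median across seeds.

    returns_by_seed maps a seed to its list of evaluation returns,
    assumed to be logged at the same env steps for every seed.
    Curves are truncated to the shortest run so indices line up.
    """
    curves = list(returns_by_seed.values())
    n = min(len(curve) for curve in curves)
    return [median(curve[i] for curve in curves) for i in range(n)]


# Placeholder numbers for illustration only, not real results.
example = {
    0: [100.0, 400.0, 900.0],
    1: [120.0, 700.0, 3000.0],
    2: [90.0, 500.0, 2800.0],
    3: [110.0, 650.0, 3100.0],
    4: [95.0, 450.0, 2900.0],
}

print(median_return_per_step(example))
```

The median is less sensitive than the mean to a single bad seed, which is why one low run does not necessarily indicate a problem.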
Hey buddy, could we connect on WeChat? I'm also working on model-based DRL, so we could exchange ideas.
Hi, I ran the Hopper experiment with the provided command, and the reward between env steps 65k and 68k is between 400 and 700, which is much lower than in the provided figure. Is there anything I might have missed?