Xingyu-Lin / mbpo_pytorch

A PyTorch replication of the model-based reinforcement learning algorithm MBPO
150 stars, 38 forks

cannot reproduce #4

Open lichuminglcm opened 3 years ago

lichuminglcm commented 3 years ago

Hi, I ran the Hopper experiment with the provided command. Between 65k and 68k environment steps, the reward is between 400 and 700, which is much lower than in the provided figure. [image] Is there anything I might have missed?

Xingyu-Lin commented 3 years ago

Hmm, that's a bit strange if you did not change anything in the code. But DRL tends to have large variance across runs. Can you try running with 5 different random seeds and see what the median performance is?
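
For anyone following along, here is a minimal sketch of how the median across seeds could be computed once the runs finish. It assumes you have already collected the evaluation returns of each seed into a list yourself; this thread does not say how the repo logs them, so `median_curve` and the `runs` dictionary are hypothetical names, and the numbers are placeholders for illustration, not measured results.

```python
from statistics import median


def median_curve(curves):
    """Return the per-step median return across seeds.

    curves: dict mapping seed -> list of evaluation returns, one entry
    per evaluation step. Runs of different length are truncated to the
    shortest one so every step is compared across all seeds.
    """
    if not curves:
        raise ValueError("need at least one run")
    horizon = min(len(c) for c in curves.values())
    return [median(c[t] for c in curves.values()) for t in range(horizon)]


# Placeholder numbers for illustration only (NOT results from this repo).
runs = {
    0: [10.0, 200.0, 650.0],
    1: [12.0, 350.0, 3000.0],
    2: [8.0, 300.0, 2800.0],
    3: [11.0, 150.0, 500.0],
    4: [9.0, 400.0, 3100.0],
}

print(median_curve(runs))
```

The median is less sensitive than the mean to one or two seeds that collapse, which is why it is a reasonable first check when a single run looks much worse than a published curve.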

lichuminglcm commented 3 years ago

Hey buddy, could we connect on WeChat? I'm also working on model-based DRL, so we could exchange ideas.