ShangtongZhang / DeepRL

Modularized Implementation of Deep RL Algorithms in PyTorch
MIT License
3.21k stars 684 forks

How to get averaged curve of PPO online performance on Mujoco? #91

Closed KarlXing closed 4 years ago

KarlXing commented 4 years ago

Hi, thank you for this awesome work. I have a question about how to reproduce the curves of PPO's online performance on the MuJoCo environments shown in the readme. What do you mean by "smoothed by a window of size 10"? The steps at which episodic returns are received vary across runs, so it is not clear to me how to average over them. I couldn't find an answer in other resources and would really appreciate a hint. Thanks!

KarlXing commented 4 years ago

Found the answer elsewhere. According to https://spinningup.openai.com/en/latest/spinningup/bench.html, performance for the on-policy algorithms is measured as the average trajectory return across the batch collected at each epoch.
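A minimal sketch of that measurement, using made-up trajectory returns (the numbers and variable names are illustrative, not from this repo): each epoch's reported performance is simply the mean return over the complete trajectories in that epoch's batch.

```python
import numpy as np

# Hypothetical batch collected during one PPO epoch: several complete
# trajectories, each with a total (undiscounted) episode return.
trajectory_returns = [120.0, 95.5, 103.2, 110.8]  # made-up values

# The per-epoch performance metric is the mean return across the batch.
epoch_performance = float(np.mean(trajectory_returns))
print(epoch_performance)  # 107.375
```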

ShangtongZhang commented 4 years ago

I use np.interp to align steps across different runs.
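A sketch of how that alignment could look, assuming each run logs `(step, episodic_return)` pairs at irregular steps (the data here is synthetic; only the use of `np.interp` to align runs, and the window size of 10, come from this thread):

```python
import numpy as np

# Hypothetical data: 3 runs, each logging episodic returns at
# irregular environment steps, so they can't be averaged directly.
rng = np.random.default_rng(0)
runs = []
for seed in range(3):
    steps = np.sort(rng.integers(0, 100_000, size=50))
    returns = np.cumsum(rng.normal(1.0, 5.0, size=50))  # fake learning curve
    runs.append((steps, returns))

# Common step grid shared by all runs.
grid = np.linspace(0, 100_000, 200)

# np.interp(x, xp, fp) resamples each run's irregular (step, return)
# samples onto the shared grid, so the runs line up step-for-step.
aligned = np.stack([np.interp(grid, s, r) for s, r in runs])

# Average across runs, then smooth with a moving window of size 10.
mean_curve = aligned.mean(axis=0)
window = 10
smoothed = np.convolve(mean_curve, np.ones(window) / window, mode="valid")
print(aligned.shape, smoothed.shape)  # (3, 200) (191,)
```

With `mode="valid"`, the smoothed curve loses `window - 1` points at the edges; `mode="same"` would keep the full length at the cost of edge artifacts.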