Closed: KarlXing closed this 4 years ago
Found the answer elsewhere. According to https://spinningup.openai.com/en/latest/spinningup/bench.html, performance for the on-policy algorithms is measured as the average trajectory return across the batch collected at each epoch.
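In case it helps anyone else, here is a minimal sketch of that metric as I understand it; `trajectory_returns` is my own placeholder name for the returns of the trajectories completed in one epoch's batch, not something from the repo:

```python
import numpy as np

def epoch_performance(trajectory_returns):
    """Average trajectory return across the batch collected in one epoch."""
    return float(np.mean(trajectory_returns))

# e.g. epoch_performance([12.0, 18.5, 15.2]) -> 15.23...
```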
I use np.interp to align steps across different runs.
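A rough sketch of what I mean, with made-up step/return arrays standing in for each run's logged data; the moving-average window of 10 matches the "smoothed by a window of size 10" description in the README:

```python
import numpy as np

# One (steps, returns) pair per run, where `steps` is the env-step count
# at which each episodic return was recorded. The step values differ
# across runs, so interpolate onto a common grid before averaging.
runs = [
    (np.array([900, 2100, 3000]), np.array([10.0, 25.0, 40.0])),
    (np.array([1100, 1900, 3200]), np.array([12.0, 22.0, 38.0])),
]

# Common evaluation grid covering the step range shared by all runs.
grid = np.linspace(1100, 3000, num=50)

# np.interp(x, xp, fp) linearly interpolates fp at the points x.
aligned = np.stack([np.interp(grid, steps, rets) for steps, rets in runs])

# Average over runs, then smooth with a moving average of window size 10.
mean_curve = aligned.mean(axis=0)
window = 10
smoothed = np.convolve(mean_curve, np.ones(window) / window, mode="valid")
```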
Hi, thank you for this awesome work. I have a question about how to produce the PPO online-performance curves on the MuJoCo environments shown in the README. What do you mean by "smoothed by a window of size 10"? The step at which an episodic return is recorded can vary across different runs, so it's not clear to me how to average over them. I didn't find an answer in other resources and would really appreciate a hint. Thanks!