Environment reset twice per episode in enjoy script

caburu commented 4 years ago

I've created a local evaluation function with the modification I proposed in this issue (https://github.com/hill-a/stable-baselines/issues/906), and I expected to get the same results using the enjoy script here.

But I realized that here, with a single environment, env reset is also called twice per episode. The first reset happens automatically because of VecEnv, and the second is the explicit call in the code below (line 154).

https://github.com/araffin/rl-baselines-zoo/blob/fd9d38862047d7fd4f67be8eb3f6736e093eac9f/enjoy.py#L148-L157

Was line 149 supposed to use and instead of or? Like below:

if done and infos[0].get('is_success', False):

caburu commented 4 years ago

Sorry, I realized that an and makes no sense there.

But, anyway, the reset call is not necessary, since VecEnv already resets automatically.
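
To make the double reset concrete, here is a minimal sketch that does not use stable-baselines at all. CountingEnv and AutoResetVecEnv are hypothetical stand-ins written only for this illustration; the one assumption is that a single-env VecEnv resets the wrapped env as soon as done is True, as described above.

```python
class CountingEnv:
    """Hypothetical toy env: each episode lasts `horizon` steps; counts reset() calls."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done, {}


class AutoResetVecEnv:
    """Stand-in for a single-env VecEnv (assumption: it resets the env when done)."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return [self.env.reset()]

    def step(self, actions):
        obs, reward, done, info = self.env.step(actions[0])
        if done:
            # Automatic reset: the returned observation starts the next episode.
            obs = self.env.reset()
        return [obs], [reward], [done], [info]


def count_resets(n_episodes, explicit_reset):
    """Run n_episodes and return how many resets happened after the initial one."""
    env = CountingEnv()
    vec_env = AutoResetVecEnv(env)
    vec_env.reset()
    initial = env.reset_calls
    episodes = 0
    while episodes < n_episodes:
        _, _, dones, _ = vec_env.step([0])
        if dones[0]:
            episodes += 1
            if explicit_reset:
                vec_env.reset()  # redundant second reset at the end of the episode
    return env.reset_calls - initial


print(count_resets(5, explicit_reset=True))   # 10: two resets per episode
print(count_resets(5, explicit_reset=False))  # 5: one reset per episode
```

With the explicit call the underlying env is reset twice per episode; without it, once.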

araffin commented 4 years ago

Hello, thanks for pointing out this issue. I think there were two reasons I did that:

With DQN, I think I was not using a VecEnv (to be confirmed); at least that was the case for training.
This piece of code is a bit of a hack to reset the env early when the goal is reached: with the robotic envs, the env is reset only after the maximum number of episode steps, and I did not want to wait. You can comment out the reset if needed ;)
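
The early-reset idea can be sketched roughly as follows. GoalEnv is a hypothetical toy environment invented for this illustration (it is not one of the robotic envs), and the goal_step and max_steps values are arbitrary assumptions; it reports is_success in info once the goal is reached, but only sets done at the step limit.

```python
class GoalEnv:
    """Hypothetical robotics-style env: the goal is reached at `goal_step`,
    but done only becomes True at `max_steps` (the time limit)."""

    def __init__(self, goal_step=4, max_steps=50):
        self.goal_step = goal_step
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        info = {"is_success": self.t >= self.goal_step}
        done = self.t >= self.max_steps
        return float(self.t), 0.0, done, info


def steps_until_reset(early_reset):
    """Count the steps taken before the evaluation loop resets the env."""
    env = GoalEnv()
    env.reset()
    steps = 0
    while True:
        _, _, done, info = env.step(0)
        steps += 1
        # Same shape as the condition discussed above: done OR success.
        if done or (early_reset and info.get("is_success", False)):
            env.reset()
            return steps


print(steps_until_reset(early_reset=True))   # 4: reset as soon as the goal is reached
print(steps_until_reset(early_reset=False))  # 50: reset only at the step limit
```

This also shows why or is the right operator for the hack: with and, the loop would still wait for done before resetting.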

caburu commented 4 years ago

Yes, I've commented out the line in my fork ;)

araffin commented 2 years ago

Should be fixed in SB3 and its zoo: https://github.com/DLR-RM/rl-baselines3-zoo

araffin / rl-baselines-zoo

Environment reset twice per episode in enjoy script #90