caburu closed this issue 2 years ago
Sorry, I realized it makes no sense for it to be an `and`.
But, anyway, the reset call is not necessary since VecEnv resets automatically.
Hello, thanks for pointing out this issue. I think there were two reasons I did that:
1. With DQN, I think I was not using a VecEnv (to be confirmed); at least that was the case for training.
2. This piece of code is a bit of a hack to reset the env early when the goal is reached. With the robotic envs, the env is reset only after the max episode steps, and I did not want to wait. You can comment out the reset if needed ;)
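For anyone reading later, here is a minimal sketch of the early-reset idea described above. It is not the zoo's code: `ToyGoalEnv`, its `goal_step` and `max_steps` values, and `steps_per_episode` are made up for illustration, and the `is_success` info key is an assumption about what the goal-based env reports.

```python
class ToyGoalEnv:
    """Toy stand-in for a goal-based env (hypothetical, not a real gym env).

    It reports success from step `goal_step` onward, but only signals
    `done` once `max_steps` is reached.
    """

    def __init__(self, goal_step=2, max_steps=10):
        self.goal_step = goal_step
        self.max_steps = max_steps
        self._t = 0

    def reset(self):
        self._t = 0
        return 0  # dummy observation

    def step(self, action):
        self._t += 1
        done = self._t >= self.max_steps
        # "is_success" is an assumed info key for this sketch
        info = {"is_success": self._t >= self.goal_step}
        return self._t, 0.0, done, info


def steps_per_episode(early_reset):
    """Run one episode and return how many steps were taken before resetting."""
    env = ToyGoalEnv()
    env.reset()
    steps = 0
    while True:
        _, _, done, info = env.step(0)
        steps += 1
        # Early reset: stop as soon as the goal is reached instead of
        # waiting for the max episode length.
        if done or (early_reset and info.get("is_success", False)):
            env.reset()
            return steps


print(steps_per_episode(early_reset=False))
print(steps_per_episode(early_reset=True))
```

Without the early reset the toy episode runs to the step limit; with it, the episode is cut as soon as success is reported. Note this only makes sense when nothing else is already resetting the env for you.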
Yes, I've commented the line in my fork ;)
Should be fixed in SB3 and its zoo: https://github.com/DLR-RM/rl-baselines3-zoo
I've created a local evaluation function with the modification I proposed in this issue (https://github.com/hill-a/stable-baselines/issues/906), and I expected to get the same results using the enjoy script here.
But I realized that here, using one environment, env reset is also called twice per episode. The reason is that reset is called a first time automatically by the VecEnv and then a second time in the code below (line 154).
https://github.com/araffin/rl-baselines-zoo/blob/fd9d38862047d7fd4f67be8eb3f6736e093eac9f/enjoy.py#L148-L157
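To illustrate the double reset, here is a small self-contained sketch. It does not use stable-baselines: `CountingEnv` and `AutoResetVecEnv` are hypothetical stand-ins, and the wrapper only mimics the reset-on-done behavior described above, under the assumption that this is all that matters for the count.

```python
class CountingEnv:
    """Toy env: each episode lasts `max_steps` steps; counts reset() calls."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.reset_calls = 0
        self._t = 0

    def reset(self):
        self.reset_calls += 1
        self._t = 0
        return 0  # dummy observation

    def step(self, action):
        self._t += 1
        done = self._t >= self.max_steps
        return self._t, 0.0, done, {}


class AutoResetVecEnv:
    """Hypothetical single-env wrapper that resets on done, like a VecEnv."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return [self.env.reset()]

    def step(self, actions):
        obs, reward, done, info = self.env.step(actions[0])
        if done:
            obs = self.env.reset()  # automatic reset at episode end
        return [obs], [reward], [done], [info]


def count_resets(n_episodes, explicit_reset):
    """Run `n_episodes` episodes and return the total number of reset() calls."""
    env = CountingEnv()
    venv = AutoResetVecEnv(env)
    obs = venv.reset()
    finished = 0
    while finished < n_episodes:
        obs, rewards, dones, infos = venv.step([0])
        if dones[0]:
            finished += 1
            if explicit_reset:
                obs = venv.reset()  # redundant: the wrapper already reset
    return env.reset_calls


print(count_resets(5, explicit_reset=False))
print(count_resets(5, explicit_reset=True))
```

Counting the initial reset, the loop without the explicit call resets once per episode, while the loop with it resets twice per episode. With a randomized env, that second reset would also discard the initial state the wrapper had already sampled.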
In line 149, was it supposed to be an `and` instead of `or`? Like below: