Closed — JakobThumm closed this issue 1 year ago
Hello, ~~how do you know it is doing more than 3 evaluation episodes?~~
```yaml
env_wrapper:
  - gym.wrappers.TimeLimit:
      max_episode_steps: 1000
```
Why are you adding a timelimit? If you do so, you need to add a monitor file afterward so it is taken into account. Otherwise the evaluation will only use the original termination (see https://github.com/DLR-RM/stable-baselines3/issues/181 for why we are doing that).
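To illustrate why the ordering matters, here is a minimal, dependency-free sketch (the classes below are simplified stand-ins for `gym.wrappers.TimeLimit` and `stable_baselines3.common.monitor.Monitor`, not their real implementations): only a wrapper placed *outside* the TimeLimit sees the `done` signal it raises, so the Monitor must wrap the TimeLimit-wrapped env for the truncation to be recorded.

```python
class BaseEnv:
    """Toy env that never terminates on its own."""
    def reset(self):
        return 0.0
    def step(self, action):
        return 0.0, 1.0, False, {}  # obs, reward, done, info

class TimeLimit:
    """Ends the episode after max_episode_steps, like gym's TimeLimit."""
    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
    def reset(self):
        self._elapsed = 0
        return self.env.reset()
    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True
            info["TimeLimit.truncated"] = True
        return obs, rew, done, info

class Monitor:
    """Records episode lengths; it only sees `done` from the env it wraps."""
    def __init__(self, env):
        self.env = env
        self.episode_lengths = []
    def reset(self):
        self._len = 0
        return self.env.reset()
    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self._len += 1
        if done:
            info["episode"] = {"l": self._len}
            self.episode_lengths.append(self._len)
        return obs, rew, done, info

# Correct order: Monitor(TimeLimit(env)) -> the monitor records the truncation.
env = Monitor(TimeLimit(BaseEnv(), max_episode_steps=5))
env.reset()
done = False
while not done:
    obs, rew, done, info = env.step(None)
print(env.episode_lengths)  # [5]
```

With the order reversed (TimeLimit outside the Monitor), the Monitor would never see `done=True` for a truncated episode and would never emit the `"episode"` info entry that the evaluation relies on.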
EDIT: to check the number of evaluations:

```python
import numpy as np

evaluations = np.load("logs/sac/BipedalWalkerHardcore-v3_12/evaluations.npz")
print(evaluations["ep_lengths"].shape)
```
> Why are you adding a timelimit?
In my custom environment, I would like to limit the episode length. Isn't the TimeLimit wrapper the way to go then?
> If you do so, you need to add a monitor file afterward so it is taken into account.
I added the basic `common.monitor.Monitor`, which fixed the issue.
I still don't fully understand why we need the monitor after reading the linked issue. However, if simply adding a monitor fixes the issue, I'm happy :) Thank you, Antonin
> I still don't fully understand why we need the monitor after reading the linked issue.
Best is to take a look at the code: https://github.com/DLR-RM/stable-baselines3/blob/52c29dc497fa2eb235d0476b067bed8ac488fe64/stable_baselines3/common/evaluation.py#L103-L114
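The gist of the linked lines, paraphrased as a toy function (this is a hedged sketch, not SB3's actual code): when the env is Monitor-wrapped, `evaluate_policy` only counts an episode as finished when the `"episode"` key shows up in the step's `info` dict, i.e. when the Monitor itself saw the `done`. If the TimeLimit sits outside (or in place of) the Monitor, the time-limit `done` carries no `"episode"` entry, the counter never advances, and evaluation keeps running.

```python
def count_finished_episodes(infos):
    """Count episodes the way a monitored evaluation would: an episode is
    only 'finished' if the Monitor attached its `"episode"` info entry."""
    return sum(1 for info in infos if "episode" in info)

# Per-step info dicts from a hypothetical monitored rollout:
infos = [{}, {}, {"episode": {"r": 3.0, "l": 3}}, {}, {"episode": {"r": 2.0, "l": 2}}]
print(count_finished_episodes(infos))  # 2
```

This is why episodes ended only by an unmonitored TimeLimit are invisible to the evaluation loop.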
This clarifies the matter, thanks :+1:
**Describe the bug**
The evaluation runs for more than `n_eval_episodes` episodes (>100 eval episodes, or even indefinitely).
**Code example**
For my custom env, the evaluation runs for >100 episodes, even though I set the number of eval episodes to 3. I was able to reproduce the error with a common environment:

sac.yml

Note that this issue occurs if and only if I change the `net_arch` from `[400, 300]` to `[256, 256]`. The issue also does not occur with seed 0, but it does happen with seed 42. Apparently, the evaluation is doing more than I expect. I would assume the evaluation just runs for the given number of episodes and then continues training.
**System Info**
Describe the characteristics of your environment:
**Additional Info**
I created a simple wrapper that prints a statement whenever a new episode begins, to debug this issue.
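One possible shape for such a debug wrapper (a hypothetical reconstruction; the author's actual wrapper is not shown in the thread) is a thin pass-through class that prints on every `reset()`:

```python
class EpisodeLogger:
    """Prints a line each time reset() starts a new episode; all other
    attributes are delegated to the wrapped env."""
    def __init__(self, env):
        self.env = env
        self.episode = 0
    def reset(self, **kwargs):
        self.episode += 1
        print(f"Starting episode {self.episode}")
        return self.env.reset(**kwargs)
    def step(self, action):
        return self.env.step(action)
    def __getattr__(self, name):
        return getattr(self.env, name)

# Toy stand-in env to show the wrapper in action:
class _ToyEnv:
    def reset(self, **kwargs):
        return 0
    def step(self, action):
        return 0, 0.0, True, {}

env = EpisodeLogger(_ToyEnv())
env.reset()  # prints "Starting episode 1"
env.reset()  # prints "Starting episode 2"
```

Counting these printed lines during evaluation makes it immediately visible when more episodes run than `n_eval_episodes` would suggest.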