Open ZJEast opened 7 months ago
This may be due to a learning rate that is too high, see https://github.com/DLR-RM/rl-baselines3-zoo/issues/156#issuecomment-910097343; do you use the default hyperparams?
Also related (and probably duplicate): https://github.com/DLR-RM/stable-baselines3/issues/1401 and https://github.com/DLR-RM/stable-baselines3/issues/1418
Yes, I use the default hyperparams. I will try a different learning rate later.
Hello, thanks for sharing the bug report. Does the NaN happen only for some runs or for all runs? Could you log and share a failed run using W&B? (that would allow us to take a look at all the logged data)
I also assume you are using the pybullet gymnasium repo?
I'll try to reproduce the issue in the meantime.
Also related: https://github.com/DLR-RM/stable-baselines3/issues/1372 (changing to AdamW might solve the problem too).
I have tried TD3, SAC and TQC on some pybullet envs. It only happens for the tasks I mentioned; the others are fine. I installed the pybullet envs with 'pip install -r ./requirements.txt'.
I can upload some log file.
sac-AntBulletEnv-v0.zip sac-HalfCheetahBulletEnv-v0.zip tqc-AntBulletEnv-v0.zip tqc-HalfCheetahBulletEnv-v0.zip
Thanks =)
Looking at the log, it seems to be due to an explosion of the std (and you are using a much larger budget than the one we were using by default).
So, setting use_expln=True (and maybe using AdamW) should solve your issue.
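To illustrate why use_expln can help against an exploding std, here is a minimal, simplified sketch of the two parameterizations. It assumes the "expln" function from the gSDE paper (exp below zero, log1p(x) + 1 above zero). The function names std_exp and std_expln are made up for this example, and this is not the actual SB3 implementation.

```python
import math


def std_exp(log_std: float) -> float:
    """Plain parameterization: std = exp(log_std), grows exponentially."""
    return math.exp(log_std)


def std_expln(log_std: float) -> float:
    """Simplified expln: exp(x) for x <= 0, log1p(x) + 1 for x > 0.

    Both branches equal 1 at x = 0, so the function is continuous,
    stays positive, and grows only logarithmically for large inputs.
    """
    if log_std <= 0.0:
        return math.exp(log_std)
    return math.log1p(log_std) + 1.0


if __name__ == "__main__":
    # Compare how fast the std grows as log_std drifts upward during training.
    for log_std in (-3.0, 0.0, 2.0, 10.0, 50.0):
        print(
            f"log_std={log_std:6.1f}  "
            f"exp={std_exp(log_std):.3e}  "
            f"expln={std_expln(log_std):.3e}"
        )
```

With the plain exponential, a moderately large log_std already gives a huge std, whereas the expln variant stays in a small range for the same input.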
I would appreciate a PR that adds this parameter =)
Hmm, for TD3 it is weird if it happens as it doesn't rely on any distribution.
EDIT: I guess the issue is similar to https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/146 by @qgallouedec
Bug already encountered in openrlbenchmark, ~I might have forgotten to report it~: https://wandb.ai/openrlbenchmark/sb3/runs/27cez5ua EDIT: I did report it, you're right @araffin ;)
For TD3, I only found two runs where you have an explosion of the losses, but this didn't lead to the bug: https://wandb.ai/openrlbenchmark/sb3/runs/2qdjqemd (Walker2DBulletEnv-v0) https://wandb.ai/openrlbenchmark/sb3/runs/ffc7kx3m (BipedalWalkerHardcore-v0) What a wonderful tool openrlbenchmark is, ping @vwxyzjn ;)
After I changed the hyperparams from
policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300])"
to
policy_kwargs: "dict(log_std_init=-3, net_arch=[400, 300], use_expln=True)"
this problem never happened again, so let's close this issue.
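For anyone using SB3 directly from Python instead of the zoo config, the same change might look roughly like the sketch below. The helper make_sac is hypothetical. It assumes use_sde=True (as far as I understand, use_expln only affects the gSDE noise distribution), and the AdamW switch is optional.

```python
# Zoo-style policy_kwargs before and after the change reported above.
old_policy_kwargs = dict(log_std_init=-3, net_arch=[400, 300])
new_policy_kwargs = dict(old_policy_kwargs, use_expln=True)


def make_sac(env, use_adamw: bool = False):
    """Build a SAC model with gSDE and use_expln enabled (hypothetical helper)."""
    # Imported lazily so the dicts above can be inspected without SB3 installed.
    from stable_baselines3 import SAC

    policy_kwargs = dict(new_policy_kwargs)
    if use_adamw:
        import torch

        # Optional: swap the default optimizer for AdamW, as suggested in the thread.
        policy_kwargs["optimizer_class"] = torch.optim.AdamW
    # Assumption: use_sde=True, since use_expln is only used by the gSDE distribution.
    return SAC("MlpPolicy", env, use_sde=True, policy_kwargs=policy_kwargs)
```

Here env can be an environment instance or a registered env id; the pybullet envs need to be installed and registered separately.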
Thanks for trying it out =) I'm reopening as we need to change the defaults (we would welcome a PR).
🐛 Bug
Hello. I am trying to reproduce some algorithms and experiments to record some data, but something unexpected happens: NaN values are generated for unknown reasons. Any advice on how to solve this?
To Reproduce
Relevant log output / Error message
System Info
Checklist