DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License
2.11k stars 516 forks

[Bug]: Missing default value for noise_type (for ddpg/td3) leads to unexpected behaviours #348

Open qihuazhong opened 1 year ago

qihuazhong commented 1 year ago

🐛 Bug

Using TD3 as an example: if noise_type is not specified for a custom environment in td3.yml, the following unexpected behavior occurs:

The logic that determines n_actions is skipped, so n_actions remains None (in exp_manager.py). That None is then passed down to the noise constructor, e.g. NormalActionNoise(mean=np.zeros(trial.n_actions), sigma=noise_std * np.ones(trial.n_actions))

Depending on the value of n_envs, the program either raises an error or silently produces an unintended result.

The unintended behaviour also depends on the environment's action space, but you get the idea.
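For illustration, here is a minimal sketch of the kind of guard that would avoid passing None to NumPy. The helper name `noise_params` and its arguments are hypothetical and not part of the zoo; it only assumes that the number of actions can be read from the first entry of the action space shape.

```python
import numpy as np


def noise_params(action_shape, noise_std, n_actions=None):
    """Return (mean, sigma) arrays for Gaussian action noise.

    Hypothetical helper, not taken from exp_manager.py: when n_actions
    was never set, fall back to the action space shape instead of
    handing None to np.zeros / np.ones.
    """
    if n_actions is None:
        if not action_shape:
            # Nothing to infer from, so fail loudly rather than silently.
            raise ValueError("Cannot infer n_actions from an empty action shape")
        n_actions = int(action_shape[0])
    mean = np.zeros(n_actions)
    sigma = noise_std * np.ones(n_actions)
    return mean, sigma


# Example: an environment with a 3-dimensional continuous action space.
mean, sigma = noise_params(action_shape=(3,), noise_std=0.1)
```

With a guard like this, a missing n_actions either resolves to the action dimension or fails with an explicit error, instead of depending on how NumPy treats a None shape.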

==========================

I think people expect that when a parameter is not specified in td3.yml but is present in the parameter sampler (e.g. sample_td3_params() in hyperparams_opt.py), the program will simply use a sampled value and work as intended.

To Reproduce

python train.py --algo td3 --env "CustomEnv-v0" -optimize --n-trials 100 --sampler tpe --pruner median

Relevant log output / Error message

No response

System Info

No response


araffin commented 1 year ago

Hello,

thanks for reporting the issue, it is indeed a bug. We should also use a VecActionNoise when using n_envs > 1.

I would be happy to receive a PR that fixes this issue =)
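To illustrate the n_envs > 1 point above: with several environments, the noise should have one independent sample per environment, i.e. shape (n_envs, n_actions), rather than a single vector shared or broadcast across all of them. The sketch below uses plain NumPy only; `sample_vectorized_noise` is a hypothetical name and not the library's implementation (Stable Baselines3 has its own wrapper for this in stable_baselines3.common.noise).

```python
import numpy as np


def sample_vectorized_noise(rng, n_envs, n_actions, noise_std):
    """Draw independent zero-mean Gaussian noise for each environment.

    Hypothetical helper for illustration: returns an array of shape
    (n_envs, n_actions), one row of action noise per environment.
    """
    return rng.normal(loc=0.0, scale=noise_std, size=(n_envs, n_actions))


rng = np.random.default_rng(0)
noise = sample_vectorized_noise(rng, n_envs=4, n_actions=2, noise_std=0.1)
```

Each row of `noise` would then be added to the action of the corresponding environment.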