araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

[question] In train.py, why is gamma in VecNormalize not updated per trial? #91

Open liyan2015 opened 4 years ago

liyan2015 commented 4 years ago

Hi, according to this issue, VecNormalize's gamma should match the RL algorithm's gamma (e.g., gamma=0.99 should be the same in both PPO2 and VecNormalize) to keep the discounted-return sliding window consistent. However, it seems the normalization arguments used in create_env are always the defaults read from the .yml file (i.e., gamma=0.99): https://github.com/araffin/rl-baselines-zoo/blob/fd9d38862047d7fd4f67be8eb3f6736e093eac9f/train.py#L269

even though gamma is sampled from several candidate values in hyperparams_opt.py: https://github.com/araffin/rl-baselines-zoo/blob/fd9d38862047d7fd4f67be8eb3f6736e093eac9f/utils/hyperparams_opt.py#L188

The same applies to rl-baselines3-zoo. Is this a bug? Should create_env take the sampled gamma into account when initializing VecNormalize for each trial? Below is a rough sketch of what I mean. Please give me a hint if I missed anything, thank you!
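For illustration only, here is a minimal sketch of the idea, not the actual train.py code; `create_env_for_trial` and `sampled_hyperparams` are hypothetical names for this example:

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize


def create_env_for_trial(env_id, sampled_hyperparams, normalize=True):
    """Build a (possibly normalized) vec env for one optimization trial.

    `sampled_hyperparams` is assumed to hold the hyperparameters drawn for
    this trial (e.g., the gamma sampled in hyperparams_opt.py).
    """
    env = DummyVecEnv([lambda: gym.make(env_id)])
    if normalize:
        # Forward the trial's gamma instead of relying on VecNormalize's
        # default of 0.99, so the return normalization uses the same
        # discount factor as the RL algorithm being trained.
        env = VecNormalize(env, gamma=sampled_hyperparams.get("gamma", 0.99))
    return env
```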

araffin commented 4 years ago

Good point.

Overall, it should not make a big difference as the main point is to normalize the reward magnitude. But for consistency, I agree that gamma should be updated.

Related: https://github.com/hill-a/stable-baselines/issues/698