Open liyan2015 opened 4 years ago
Good point.
Overall, it should not make a big difference as the main point is to normalize the reward magnitude. But for consistency, I agree that gamma should be updated.
Related: https://github.com/hill-a/stable-baselines/issues/698
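For context on why `gamma` matters here at all, below is a simplified sketch of the reward-scaling idea: rewards are divided by the running standard deviation of a discounted return, and `gamma` controls how far back that return looks. This is not the library's code. The function name `normalize_rewards` is hypothetical, and the sketch ignores episode resets and the library's exact running-statistics initialization.

```python
import math


def normalize_rewards(rewards, gamma, epsilon=1e-8, clip=10.0):
    """Scale rewards by the running std of the discounted return (simplified)."""
    ret = 0.0                        # running discounted return
    count, mean, m2 = 0, 0.0, 0.0    # Welford accumulators over the returns
    out = []
    for r in rewards:
        ret = ret * gamma + r        # gamma sets the effective window size
        count += 1
        delta = ret - mean
        mean += delta / count
        m2 += delta * (ret - mean)
        var = m2 / count             # population variance of returns so far
        scaled = r / math.sqrt(var + epsilon)
        out.append(max(-clip, min(clip, scaled)))  # clip extreme values
    return out


rewards = [1.0, 0.5, 2.0, 1.5]
print(normalize_rewards(rewards, gamma=0.9))
print(normalize_rewards(rewards, gamma=0.99))
```

The two calls give different scalings for the same rewards, but both bring them to a comparable magnitude, which is the main purpose of the wrapper.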
Hi, according to this issue, `VecNormalize`'s `gamma` should match the `gamma` of the RL algorithm (e.g., `gamma=0.99` should be used in both `PPO2` and `VecNormalize`) to ensure a consistent sliding window size. However, it seems the normalization arguments used in `create_env` are always the defaults read from the `.yml` file (i.e., `gamma=0.99` by default): https://github.com/araffin/rl-baselines-zoo/blob/fd9d38862047d7fd4f67be8eb3f6736e093eac9f/train.py#L269
although `gamma` has different candidates in `hyperparams_opt.py`: https://github.com/araffin/rl-baselines-zoo/blob/fd9d38862047d7fd4f67be8eb3f6736e093eac9f/utils/hyperparams_opt.py#L188
The same applies to rl-baselines3-zoo. Is this a bug? Should `create_env` take the sampled `gamma` into account when initializing `VecNormalize` for each trial? Please give me a hint if I missed anything, thank you!
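To make the question concrete, here is a minimal sketch of what per-trial syncing could look like. The helper `normalize_kwargs_for_trial` and the example dictionaries are hypothetical and not part of the zoo. The only assumption about the library is that `VecNormalize` accepts `gamma`, `norm_obs` and `norm_reward` as keyword arguments.

```python
def normalize_kwargs_for_trial(default_kwargs, sampled_hyperparams):
    """Return VecNormalize kwargs whose gamma follows the sampled gamma."""
    kwargs = dict(default_kwargs)  # copy so the .yml defaults stay untouched
    if "gamma" in sampled_hyperparams:
        kwargs["gamma"] = sampled_hyperparams["gamma"]
    return kwargs


# Hypothetical values: defaults as read from the .yml file,
# and hyperparameters sampled for one optimization trial.
defaults = {"norm_obs": True, "norm_reward": True}
trial_params = {"gamma": 0.95, "n_steps": 128}

kwargs = normalize_kwargs_for_trial(defaults, trial_params)
print(kwargs)
# The result would then be forwarded as VecNormalize(env, **kwargs).
```

Only `gamma` is copied over, so other sampled hyperparameters such as `n_steps` do not leak into the wrapper's arguments.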