Closed: xkianteb closed this issue 3 years ago.
I can't figure out whether you are normalizing the rewards and the state (observation) space for continuous control problems.
pytorch-a2c-ppo-acktr does it -- (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/1951751a03b78307bb60ba542019756ebcb5200c/a2c_ppo_acktr/envs.py#L99)
stable-baselines does it -- (https://github.com/araffin/rl-baselines-zoo/blob/ff84f398a1fae65e18819490bb4e41a201322759/hyperparams/a2c.yml#L54)
and Deep Reinforcement Learning that Matters (https://arxiv.org/pdf/1709.06560.pdf) reports that it helps.
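For reference, here is a minimal sketch of the kind of normalization being asked about. It is written in plain Python (no NumPy) so that it is self-contained, and it is only loosely modeled on the running mean/std approach used by wrappers such as `VecNormalize` in OpenAI baselines. The classes `RunningStats` and `Normalizer` and their parameters (`gamma`, `clip`, `eps`) are hypothetical and are not any library's API. Observations are standardized per dimension using running statistics and then clipped. Rewards are divided by the running standard deviation of the discounted return, without mean-centering.

```python
import math
from typing import List


def _clip(x: float, c: float) -> float:
    return max(-c, min(c, x))


class RunningStats:
    """Hypothetical helper: running per-dimension mean/variance (Chan et al. merge)."""

    def __init__(self, dim: int, epsilon: float = 1e-4):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.count = epsilon  # small prior count so early divisions are safe

    def update(self, x: List[float]) -> None:
        # Merge a single-sample "batch" (count 1, variance 0) into the running stats.
        total = self.count + 1.0
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            m2 = self.var[i] * self.count + delta * delta * self.count / total
            self.mean[i] += delta / total
            self.var[i] = m2 / total
        self.count = total


class Normalizer:
    """Hypothetical observation and reward normalizer for a single environment."""

    def __init__(self, obs_dim: int, gamma: float = 0.99,
                 clip: float = 10.0, eps: float = 1e-8):
        self.obs_stats = RunningStats(obs_dim)
        self.ret_stats = RunningStats(1)
        self.ret = 0.0  # running discounted return used to scale rewards
        self.gamma, self.clip, self.eps = gamma, clip, eps

    def normalize_obs(self, obs: List[float]) -> List[float]:
        # Update the statistics, then standardize and clip each dimension.
        self.obs_stats.update(obs)
        return [
            _clip((o - m) / math.sqrt(v + self.eps), self.clip)
            for o, m, v in zip(obs, self.obs_stats.mean, self.obs_stats.var)
        ]

    def normalize_reward(self, reward: float, done: bool) -> float:
        # Scale by the std of the discounted return; the mean is not subtracted.
        self.ret = self.ret * self.gamma + reward
        self.ret_stats.update([self.ret])
        scaled = reward / math.sqrt(self.ret_stats.var[0] + self.eps)
        if done:
            self.ret = 0.0  # reset the return at episode boundaries
        return _clip(scaled, self.clip)
```

At evaluation time, one would typically stop updating the statistics and reuse the ones collected during training. Otherwise the agent sees inputs on a scale that drifts away from what it was trained on.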