alexis-jacq / Pytorch-DPPO

Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286
MIT License

on advantages #5

Open cn3c3p opened 6 years ago

cn3c3p commented 6 years ago

After testing your PPO and comparing it with another implementation, I think your advantages need to be normalized: `(advantages - advantages.mean()) / advantages.std()`. For your reference.
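A minimal sketch of the normalization being suggested, standardizing the advantage estimates to zero mean and unit variance before they are used in the PPO surrogate loss. The helper name and the `eps` term (added to avoid division by zero on near-constant batches) are my own, not from the repo:

```python
import torch

def normalize_advantages(advantages, eps=1e-8):
    # Standardize to zero mean / unit variance across the batch.
    # eps (not in the original suggestion) guards against a zero std.
    return (advantages - advantages.mean()) / (advantages.std() + eps)

adv = torch.tensor([1.0, 2.0, 3.0, 4.0])
norm = normalize_advantages(adv)
print(norm.mean().item(), norm.std().item())  # ~0.0, ~1.0
```

This keeps the gradient scale of the policy loss roughly constant across batches, which is why many PPO implementations apply it.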

alexis-jacq commented 6 years ago

Thanks for the heads-up, I will try this normalization. Can I ask which implementation you compared against?