alexis-jacq / Pytorch-DPPO

Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286
MIT License
180 stars 40 forks

clamp ratio #3

Open cswhjiang opened 7 years ago

cswhjiang commented 7 years ago

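It seems that you should clamp ratio, not surr1: in the PPO clipped objective, the probability ratio is clipped to [1 - epsilon, 1 + epsilon] before it is multiplied by the advantage.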

https://github.com/alexis-jacq/Pytorch-DPPO/blob/master/ppo.py#L145
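For illustration, here is a minimal scalar sketch of the difference, in plain Python so it runs without PyTorch. The function names, the `eps=0.2` default, and the "flagged" variant are assumptions made for this example; the actual line in ppo.py is not reproduced here. With tensors, the same clamp would typically be written with `torch.clamp(ratio, 1 - eps, 1 + eps)`.

```python
def clip(x, lo, hi):
    """Clamp a scalar x into the interval [lo, hi]."""
    return max(lo, min(x, hi))


def ppo_clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample (a quantity to maximize).

    ratio     : pi_new(a|s) / pi_old(a|s)
    advantage : advantage estimate for the same sample
    eps       : clip range (hypothetical default chosen for this sketch)
    """
    surr1 = ratio * advantage
    # Clamp the ratio first, then scale by the advantage.
    surr2 = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(surr1, surr2)


def surrogate_clamping_surr1(ratio, advantage, eps=0.2):
    """Hypothetical variant of the mistake described in this issue:
    the clamp is applied to surr1 (ratio * advantage), not to ratio."""
    surr1 = ratio * advantage
    # surr1 already carries the advantage's scale, so forcing it into
    # [1 - eps, 1 + eps] discards that scale (and its sign).
    surr2 = clip(surr1, 1.0 - eps, 1.0 + eps)
    return min(surr1, surr2)


if __name__ == "__main__":
    # Compare both variants on one sample with a large positive advantage.
    print(ppo_clipped_surrogate(1.5, 2.0))
    print(surrogate_clamping_surr1(1.5, 2.0))
```

Clamping the ratio keeps the objective proportional to the advantage, while clamping surr1 forces the second term into a narrow band around 1 regardless of how large or negative the advantage is.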

alexis-jacq commented 7 years ago

Thanks a lot! I did not see this typo!