Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License
1.09k stars 186 forks

Few Runtime errors #10

Closed sandeepnRES closed 5 years ago

sandeepnRES commented 5 years ago

I received runtime errors (invalid value in reduce). I think it would be better to use BCEWithLogitsLoss as the loss criterion for the discriminator, since it is numerically stable.
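To illustrate the stability point, here is a minimal pure-Python sketch rather than the repository's actual code. It compares a hand-written sigmoid-then-log loss with the log-sum-exp form that the PyTorch documentation describes for BCEWithLogitsLoss. The function names `naive_bce` and `stable_bce_with_logits` and the sample logit of 40 are assumptions made up for this example.

```python
import math

def naive_bce(logit, target):
    """Sigmoid followed by log loss (hypothetical helper): breaks once the sigmoid saturates."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stable_bce_with_logits(logit, target):
    """Same loss computed directly from the logit (hypothetical helper).

    Uses max(x, 0) - x * y + log(1 + exp(-|x|)), so exp() never overflows
    and log() never receives 0.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits the two forms agree.
print(naive_bce(2.0, 1.0), stable_bce_with_logits(2.0, 1.0))

# For a confident but wrong prediction the sigmoid rounds to exactly 1.0,
# so the naive form evaluates log(0) and fails.
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive form failed:", err)

# The logit-based form stays finite.
print(stable_bce_with_logits(40.0, 0.0))
```

In PyTorch itself, torch.nn.BCEWithLogitsLoss expects raw logits, so switching to it would presumably also mean dropping any final sigmoid from the discriminator's output; whether that applies here depends on how the discriminator is defined.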

sandeepnRES commented 5 years ago