Advantage computation fix - Chapter 8 models.py

PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt

MIT License


Sycor4x closed 5 years ago

Sycor4x commented 5 years ago

Fixes a bug that computes the advantage term as a single average across the whole minibatch instead of one advantage value per sample. See: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/issues/6
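The diff itself is not reproduced in this thread, so the following is only a minimal sketch of the kind of change described. It assumes a dueling-style head that combines a state value with mean-centred per-action advantages. NumPy stands in for the framework used in models.py, and the function names and sample numbers are hypothetical.

```python
import numpy as np


def combine_batch_mean(val, adv):
    # Variant described as the bug: adv.mean() with no axis collapses
    # both the batch axis and the action axis into one scalar, so every
    # sample is centred by the same minibatch-wide average.
    return val + adv - adv.mean()


def combine_per_sample(val, adv):
    # Variant described as the fix: average over the action axis only,
    # keeping one mean per sample so it broadcasts back over actions.
    return val + adv - adv.mean(axis=1, keepdims=True)


# Hypothetical minibatch: 2 samples, 2 actions each.
val = np.array([[1.0], [2.0]])                # shape (batch, 1)
adv = np.array([[0.0, 2.0], [10.0, 30.0]])    # shape (batch, actions)

q_batch_mean = combine_batch_mean(val, adv)
q_per_sample = combine_per_sample(val, adv)

# With per-sample centring, the mean of each row of Q equals that
# sample's state value; with the batch-wide mean it generally does not.
print(q_batch_mean)
print(q_per_sample)
```

With per-sample centring, one sample's Q-values no longer depend on the advantages of the other samples in the minibatch. In PyTorch, the analogous reduction would be `adv.mean(dim=1, keepdim=True)`.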

Shmuma commented 5 years ago

Thanks!