Sycor4x closed 5 years ago
Fixes a bug where the advantage was computed as a single average across the minibatch instead of as one advantage value per sample. See: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/issues/6
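To show why the distinction matters, here is a minimal sketch in plain Python. It is not the code from this repository: the function names and the numbers are hypothetical, and it assumes the usual definition of the advantage as return minus the critic's value estimate for each sample.

```python
# Hypothetical illustration; names and data are not taken from the repository.

def per_sample_advantages(returns, values):
    """Intended behaviour: one advantage per sample, A_i = R_i - V(s_i)."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    return [r - v for r, v in zip(returns, values)]


def batch_mean_advantage(returns, values):
    """Buggy behaviour: a single scalar averaged over the whole minibatch."""
    advantages = per_sample_advantages(returns, values)
    return sum(advantages) / len(advantages)


returns = [1.0, 0.0, 2.0, -1.0]  # made-up discounted returns
values = [0.5, 0.5, 0.5, 0.5]    # made-up critic estimates

per_sample = per_sample_advantages(returns, values)
mean_adv = batch_mean_advantage(returns, values)

print(per_sample)  # [0.5, -0.5, 1.5, -1.5]
print(mean_adv)    # 0.0
```

In this made-up batch the averaged advantage is exactly zero, so a policy-gradient term weighted by it would give no signal at all, while the per-sample values correctly distinguish better-than-expected actions from worse ones.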
Thanks!