LuEE-C / PPO-Keras

My implementation of the Proximal Policy Optimization algorithm using Keras as a backend

loss function error #4

Closed SonuDixit closed 5 years ago

SonuDixit commented 5 years ago

Hi, Thanks a lot for sharing your code.

I think there is an error in loss calculation.

entropy = - sum( prob * log(prob) )

We need to maximize entropy.

In the loss function, line 48, it should be - prob * log(prob), inside -mean( ).
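
For reference, the entropy definition quoted above can be checked numerically. This is only a minimal NumPy sketch, not code from the repository; the helper name `entropy` and the `eps` guard against log(0) are assumptions made for illustration.

```python
import numpy as np

def entropy(prob, eps=1e-10):
    """Shannon entropy, -sum(p * log(p)), taken over the last axis.

    `eps` is an illustrative guard against log(0); it is not from the issue.
    """
    prob = np.asarray(prob, dtype=np.float64)
    return -np.sum(prob * np.log(prob + eps), axis=-1)

# A uniform policy over four actions versus a nearly deterministic one.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

print("uniform:", entropy(uniform))
print("peaked: ", entropy(peaked))
```

The uniform distribution gives the larger value, which is why an entropy bonus that is maximized pushes the policy toward exploration.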

LuEE-C commented 5 years ago

The loss is minimized, so we have -mean(... + Entropy) that we minimize. We essentially minimize -Entropy, which is equivalent to maximizing the entropy.

SonuDixit commented 5 years ago

Entropy = - prob * log (prob)

The term used inside the loss function, prob * log(prob), is the negative of entropy. We should have entropy.
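
The effect of the sign can be illustrated with a small NumPy sketch. It assumes the loss has the general shape -mean(surrogate + coef * entropy_term) described in this thread; the function name `ppo_style_loss`, the `wrong_sign` flag, and the coefficient value are hypothetical and not taken from the repository.

```python
import numpy as np

def ppo_style_loss(surrogate, prob, entropy_coef=0.01, wrong_sign=False, eps=1e-10):
    """Assumed loss shape: -mean(surrogate + entropy_coef * entropy_term).

    `prob` is the per-sample probability of the taken action.
    With wrong_sign=True the term is prob * log(prob), i.e. the negative
    of the entropy term, which is the bug discussed in the issue.
    """
    surrogate = np.asarray(surrogate, dtype=np.float64)
    prob = np.asarray(prob, dtype=np.float64)
    p_log_p = prob * np.log(prob + eps)  # <= 0 for probabilities in (0, 1]
    entropy_term = p_log_p if wrong_sign else -p_log_p
    return -np.mean(surrogate + entropy_coef * entropy_term)

# Hold the surrogate objective at zero so only the entropy term matters.
zeros = np.zeros(3)
uncertain = np.full(3, 0.5)   # larger -p*log(p)
confident = np.full(3, 0.99)  # smaller -p*log(p)

correct_uncertain = ppo_style_loss(zeros, uncertain)
correct_confident = ppo_style_loss(zeros, confident)
wrong_uncertain = ppo_style_loss(zeros, uncertain, wrong_sign=True)
wrong_confident = ppo_style_loss(zeros, confident, wrong_sign=True)

print("correct sign:", correct_uncertain, correct_confident)
print("wrong sign:  ", wrong_uncertain, wrong_confident)
```

With the correct sign, the less certain prediction yields the lower loss, so minimizing rewards a larger entropy term. With prob * log(prob) inside -mean( ), the ordering flips and the optimizer is pushed toward confident predictions instead.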

LuEE-C commented 5 years ago

Ah yes, you're right. Thanks for pointing it out; it's fixed now.