Simple Policy Faulty Loss Function

awjuliani / DeepRL-Agents

A set of Deep Reinforcement Learning Agents implemented in Tensorflow.

MIT License


Status: Open. wert23239 opened this issue 7 years ago.

wert23239 commented 7 years ago

Your loss function for the simple policy doesn't really make sense:

Loss = -log(pi) * A
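To make the discussion concrete, here is a minimal sketch of the formula above as a plain Python function. It assumes the natural log (which is what TensorFlow's `tf.math.log` computes), treats the "weight" as pi and the reward as A, and the name `policy_loss` is hypothetical, not taken from the repository.

```python
import math

def policy_loss(weight: float, advantage: float) -> float:
    """Per-step loss Loss = -log(pi) * A, with pi = weight and A = advantage.

    Uses the natural log, as TensorFlow's tf.math.log does.
    The weight must be strictly positive for the log to be defined.
    """
    if weight <= 0:
        raise ValueError("log is undefined for non-positive weights")
    return -math.log(weight) * advantage

print(round(policy_loss(0.9, 1), 4))  # 0.1054
```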

If you have a weight of .9 and a reward of 1, your loss is about .046 (taking the log in base 10).

But if you have a weight of .9 and your reward is 3, your loss triples to about .137.
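The arithmetic can be checked with a short script. This sketch assumes the loss is exactly `-log(weight) * reward` and prints it for both a base-10 log (which matches the .046 figure) and the natural log; the helper name `loss` is hypothetical. In both bases the loss scales linearly with the reward.

```python
import math

def loss(weight, reward, base=math.e):
    # -log_base(weight) * reward, i.e. Loss = -log(pi) * A with A = reward
    return -math.log(weight, base) * reward

for base, name in [(10, "log10"), (math.e, "ln")]:
    l1 = loss(0.9, 1, base)
    l3 = loss(0.9, 3, base)
    # The ratio is always 3.0: the loss is proportional to the reward
    print(f"{name}: reward 1 -> {l1:.3f}, reward 3 -> {l3:.3f}, ratio {l3 / l1:.1f}")
```

This prints `log10: reward 1 -> 0.046, reward 3 -> 0.137, ratio 3.0` and `ln: reward 1 -> 0.105, reward 3 -> 0.316, ratio 3.0`.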

So the only reason your function works at all is that you only assign a single amount of reward.