Open wert23239 opened 7 years ago
Your loss function for the simple policy doesn't really make sense:
"Loss=-Log(pi)*A"
If you have a weight of .9 and a reward of 1, your loss is about .046 (taking the log in base 10).
But if you have a weight of .9 and a reward of 3, your loss increases to about .137, three times as large.
So the only reason your function works at all is that you only ever assign a single reward value.
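To make the arithmetic easy to check, here is a minimal sketch of those numbers. It assumes that "weight" means the probability pi the policy gives the chosen action and that the reward is used directly as A. The helper name `policy_loss` is made up for this example and is not taken from the repo. Both log bases are shown, since a loss of roughly .045 only comes out with a base-10 log.

```python
import math


def policy_loss(pi, advantage, log=math.log):
    """Hypothetical helper: Loss = -log(pi) * A for one action with probability pi."""
    return -log(pi) * advantage


# Same action probability (0.9), two different rewards.
for reward in (1, 3):
    base10 = policy_loss(0.9, reward, log=math.log10)  # base-10 log
    natural = policy_loss(0.9, reward)                 # natural log
    print(f"reward={reward}: base-10 loss={base10:.3f}, natural-log loss={natural:.3f}")
```

In either base, the loss for the same action probability scales linearly with the reward, so tripling the reward triples the loss.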