chingyaoc / pytorch-REINFORCE

PyTorch Implementation of REINFORCE for both discrete & continuous control

A little confusion from a newbie. How to get the loss function of REINFORCE? #1

Closed natpagle204 closed 7 years ago

natpagle204 commented 7 years ago

Hi, thanks for your code; it is well written and very helpful to a newbie like me!!

But there is one line that I don't understand; I'd appreciate it if you could explain it to me.

```python
loss = loss - (log_probs[i] * (Variable(R).expand_as(log_probs[i])).cuda()).sum() \
            - (0.0001 * entropies[i].cuda()).sum()
loss = loss / len(rewards)
```


I wonder how one derives the loss function of the REINFORCE algorithm, and why the entropy term appears. This doesn't seem straightforward to me.

chingyaoc commented 7 years ago

Hi, this is the update rule for the policy gradient method. I recommend you check the paper by Sutton (a.k.a. the godfather of RL). The entropy term is a common trick that encourages the policy to be less deterministic, which helps exploration. Hope this information helps you :)
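
For anyone landing here later, here is a sketch of where that loss comes from, assuming the standard REINFORCE setup from Sutton et al., "Policy Gradient Methods for Reinforcement Learning with Function Approximation" (1999). The policy gradient theorem gives

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R_t\right],$$

where $R_t = \sum_{k \ge t} \gamma^{k-t} r_k$ is the discounted return. Doing gradient *descent* on

$$L(\theta) = -\frac{1}{T}\sum_t \Big(\log \pi_\theta(a_t \mid s_t)\, R_t + \beta\, H\big(\pi_\theta(\cdot \mid s_t)\big)\Big)$$

is therefore gradient *ascent* on $J(\theta)$ plus an entropy bonus $H$. With $\beta = 0.0001$ this matches the quoted line, and the final `loss / len(rewards)` supplies the $1/T$ factor.

Below is a minimal self-contained sketch of the same computation in modern PyTorch (so no `Variable` or `.cuda()` calls); the function name and signature are mine for illustration, not the repo's API:

```python
import torch

# Hypothetical minimal sketch (names are mine, not the repo's API): the
# REINFORCE loss with an entropy bonus, in modern PyTorch.
def reinforce_loss(log_probs, entropies, rewards, gamma=0.99, beta=0.0001):
    R = 0.0
    loss = 0.0
    # Walk the episode backwards, accumulating the discounted return R_t.
    for i in reversed(range(len(rewards))):
        R = gamma * R + rewards[i]
        # -log pi(a_t|s_t) * R_t pushes probability toward high-return actions;
        # -beta * H(pi) keeps the policy from collapsing to a deterministic one.
        loss = loss - (log_probs[i] * R).sum() - (beta * entropies[i]).sum()
    return loss / len(rewards)

# Toy usage: a three-step episode with scalar log-probs and entropies.
lp = [torch.randn(1, requires_grad=True) for _ in range(3)]
ent = [torch.rand(1) for _ in range(3)]
reinforce_loss(lp, ent, rewards=[1.0, 0.0, 1.0]).backward()
```

In a real training loop, `log_probs` and `entropies` would come from the policy network's action distribution at each step, as they do in the repo's code.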