LantaoYu / SeqGAN

Implementation of Sequence Generative Adversarial Nets with Policy Gradient

How should I understand the RL loss function? #64

Closed yanghoonkim closed 4 years ago

yanghoonkim commented 5 years ago

The loss function for the policy gradient is:

```python
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
        ), 1
    ) * tf.reshape(self.rewards, [-1])
)
```
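For reference, here is a minimal NumPy sketch of what I understand this surrogate loss to compute for a single sequence (the function and argument names are mine, not the repo's): the one-hot mask picks out log π(y_t) for the sampled token at each step, and each term is weighted by that step's rollout reward.

```python
import numpy as np

def reinforce_loss(g_predictions, x, rewards):
    """Illustrative REINFORCE surrogate loss for one sequence.

    g_predictions: (T, V) per-step softmax distribution over the vocabulary
    x:             (T,)   token ids actually sampled by the generator
    rewards:       (T,)   per-step rewards (e.g. from Monte Carlo rollouts)
    """
    T = x.shape[0]
    # Selecting g_predictions[t, x[t]] is equivalent to the
    # tf.one_hot(...) * tf.log(...) product followed by reduce_sum over axis 1.
    log_probs = np.log(np.clip(g_predictions[np.arange(T), x], 1e-20, 1.0))
    # Negative sum of reward-weighted log-probabilities:
    #   -sum_t R_t * log pi(y_t | y_1..t-1)
    return -np.sum(log_probs * rewards)
```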

I can't see how this g_loss term corresponds to the objective function in the paper, or how it relates to the policy gradient approximation given there.