loss function for the policy gradient is:
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0)
        * tf.log(tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)),
        1
    ) * tf.reshape(self.rewards, [-1])
)
I can't see how this g_loss term corresponds to the objective function in the paper, or how it relates to the policy-gradient approximation given there.
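For context, here is my reading of what the expression computes, sketched in NumPy with made-up values (the token ids, probabilities, and rewards below are hypothetical, not from the repo): the one-hot mask picks out the log-probability of each token that was actually sampled, and multiplying by the per-token reward gives the usual REINFORCE-style weighted log-likelihood.

```python
import numpy as np

# Hypothetical tiny example: T=3 generated tokens over a vocabulary of
# num_emb=4, with one reward per token (made-up values for illustration).
num_emb = 4
x = np.array([2, 0, 3])                      # token ids actually sampled
g_predictions = np.array([                   # softmax output, one row per step
    [0.1, 0.2, 0.6, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.5],
])
rewards = np.array([0.9, 0.5, 0.7])          # per-token rewards

# What the TF expression computes: one-hot selection of log-probabilities,
# each weighted by its reward, summed and negated.
one_hot = np.eye(num_emb)[x]
log_probs = np.log(np.clip(g_predictions, 1e-20, 1.0))
g_loss = -np.sum(np.sum(one_hot * log_probs, axis=1) * rewards)

# Equivalent direct form: -sum_t reward_t * log p(x_t).
g_loss_direct = -np.sum(rewards * np.log(g_predictions[np.arange(3), x]))

assert np.isclose(g_loss, g_loss_direct)
print(g_loss)
```

So, if my reading is right, the two forms are numerically identical and the one-hot is just an indexing trick; my question is how this weighted log-likelihood ties back to the paper's stated objective.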