LantaoYu / SeqGAN

Implementation of Sequence Generative Adversarial Nets with Policy Gradient

gradient descent implementation #62

Open o20021106 opened 5 years ago

o20021106 commented 5 years ago

In your paper, the gradient is given by this equation:

*(equation image from the paper)*
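For readers who cannot see the image: as I read Eq. (8) of the paper, the gradient is

```math
\nabla_\theta J(\theta) \simeq \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})} \left[ \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \right]
```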

In your code, you first compute the loss and then use `tf.gradients` to obtain the gradient:

```python
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
        ), 1) * tf.reshape(self.rewards, [-1])
)
```

```python
g_opt = self.g_optimizer(self.learning_rate)

self.g_grad, _ = tf.clip_by_global_norm(tf.gradients(self.g_loss, self.g_params), self.grad_clip)
self.g_updates = g_opt.apply_gradients(zip(self.g_grad, self.g_params))
```

My understanding of your code is that `self.g_loss` is the negative sum, over every timestep in the batch, of the log probability of the generated word given the previous words, with each log probability weighted by its corresponding reward.
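To make sure I am reading it correctly, here is a small NumPy sketch of the same computation; the names mirror the repo's tensors, but the shapes and values are made up:

```python
# Toy sketch (hypothetical shapes, not the repo's tensors) of what self.g_loss
# computes: the negative sum over all positions of
# reward * log P(generated word | previous words).
import numpy as np

batch_size, seq_len, num_emb = 2, 3, 5            # made-up sizes
rng = np.random.default_rng(0)

x = rng.integers(0, num_emb, size=(batch_size, seq_len))        # generated tokens
g_predictions = rng.dirichlet(np.ones(num_emb),
                              size=(batch_size, seq_len))       # softmax outputs per step
rewards = rng.random((batch_size, seq_len))                     # rewards per step

flat_probs = g_predictions.reshape(-1, num_emb)                 # [batch*seq, vocab]
flat_x = x.reshape(-1)                                          # [batch*seq]
# probability of the actually generated token, clipped as in the repo, then logged
log_p = np.log(np.clip(flat_probs[np.arange(len(flat_x)), flat_x], 1e-20, 1.0))

g_loss = -np.sum(log_p * rewards.reshape(-1))
print(g_loss)
```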

From this loss, you then compute the gradient `self.g_grad` with the `tf.gradients` op.

However, the paper computes the gradient differently: there the gradient itself is defined as the sum over timesteps of the reward multiplied by the gradient of the log probability. Your implementation seems to skip that explicit gradient, treat the reward-weighted log probability as a loss, and then differentiate that loss with `tf.gradients`. Are the two actually equivalent?
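To make the comparison concrete, here is a toy sketch of the two formulations I am comparing, using a made-up three-logit "generator" rather than the repo's LSTM (TensorFlow 1.x ops, as in the repo):

```python
# Toy sketch (hypothetical model, not the repo's generator) comparing
# (a) differentiating the reward-weighted log-likelihood "surrogate loss" with
# (b) weighting the gradient of the log-likelihood by the reward.
# The reward enters through a placeholder, so it contributes no gradient itself.
import numpy as np
import tensorflow as tf

theta = tf.Variable([0.2, -0.1, 0.5], dtype=tf.float32)  # toy parameters = logits over a 3-word vocab
log_probs = tf.nn.log_softmax(theta)

y = tf.placeholder(tf.int32, [])      # sampled token
r = tf.placeholder(tf.float32, [])    # reward for that token (e.g. from rollouts)

# (a) surrogate loss, as in the repo, then tf.gradients
surrogate_loss = -r * tf.gather(log_probs, y)
grad_from_loss = tf.gradients(surrogate_loss, [theta])[0]

# (b) the paper's form: reward times the gradient of the log probability
grad_direct = -r * tf.gradients(tf.gather(log_probs, y), [theta])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g_a, g_b = sess.run([grad_from_loss, grad_direct], {y: 1, r: 0.7})
    print(np.allclose(g_a, g_b))  # checks whether the two gradients match
```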

Could you please correct me if I am wrong? Thank you.