LantaoYu / SeqGAN

Implementation of Sequence Generative Adversarial Nets with Policy Gradient

gradient descent implementation #62

Open o20021106 opened 5 years ago

o20021106 commented 5 years ago

In your paper, the gradient is given by this equation:

*(equation image from the paper)*
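For readers who cannot see the image: as I read Eq. (8) of the paper, the gradient is

```math
\nabla_\theta J(\theta) \simeq \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})} \left[ \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \right]
```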

In your code, you first compute the loss and then use `tf.gradients` to obtain the gradient:

```python
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
        ), 1) * tf.reshape(self.rewards, [-1])
)
```

```python
g_opt = self.g_optimizer(self.learning_rate)

self.g_grad, _ = tf.clip_by_global_norm(tf.gradients(self.g_loss, self.g_params), self.grad_clip)
self.g_updates = g_opt.apply_gradients(zip(self.g_grad, self.g_params))
```

My understanding of your code is that `self.g_loss` is the negative sum, over every timestep in the batch, of the log probability of the generated word given the previous words, with each log probability weighted by its corresponding reward.
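To make sure I am reading it correctly, here is a small NumPy sketch of the same computation; the names mirror the repo's tensors, but the shapes and values are made up:

```python
# Toy sketch (hypothetical shapes, not the repo's tensors) of what self.g_loss
# computes: the negative sum over all positions of
# reward * log P(generated word | previous words).
import numpy as np

batch_size, seq_len, num_emb = 2, 3, 5            # made-up sizes
rng = np.random.default_rng(0)

x = rng.integers(0, num_emb, size=(batch_size, seq_len))        # generated tokens
g_predictions = rng.dirichlet(np.ones(num_emb),
                              size=(batch_size, seq_len))       # softmax outputs per step
rewards = rng.random((batch_size, seq_len))                     # rewards per step

flat_probs = g_predictions.reshape(-1, num_emb)                 # [batch*seq, vocab]
flat_x = x.reshape(-1)                                          # [batch*seq]
# probability of the actually generated token, clipped as in the repo, then logged
log_p = np.log(np.clip(flat_probs[np.arange(len(flat_x)), flat_x], 1e-20, 1.0))

g_loss = -np.sum(log_p * rewards.reshape(-1))
print(g_loss)
```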

From this loss, you then compute the gradient `self.g_grad` with the `tf.gradients` op.

However, the paper computes the gradient differently: there the gradient itself is defined as the sum over timesteps of the reward multiplied by the gradient of the log probability. Your implementation seems to skip that explicit gradient, treat the reward-weighted log probability as a loss, and then differentiate that loss with `tf.gradients`. Are the two actually equivalent?
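To make the comparison concrete, here is a toy sketch of the two formulations I am comparing, using a made-up three-logit "generator" rather than the repo's LSTM (TensorFlow 1.x ops, as in the repo):

```python
# Toy sketch (hypothetical model, not the repo's generator) comparing
# (a) differentiating the reward-weighted log-likelihood "surrogate loss" with
# (b) weighting the gradient of the log-likelihood by the reward.
# The reward enters through a placeholder, so it contributes no gradient itself.
import numpy as np
import tensorflow as tf

theta = tf.Variable([0.2, -0.1, 0.5], dtype=tf.float32)  # toy parameters = logits over a 3-word vocab
log_probs = tf.nn.log_softmax(theta)

y = tf.placeholder(tf.int32, [])      # sampled token
r = tf.placeholder(tf.float32, [])    # reward for that token (e.g. from rollouts)

# (a) surrogate loss, as in the repo, then tf.gradients
surrogate_loss = -r * tf.gather(log_probs, y)
grad_from_loss = tf.gradients(surrogate_loss, [theta])[0]

# (b) the paper's form: reward times the gradient of the log probability
grad_direct = -r * tf.gradients(tf.gather(log_probs, y), [theta])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g_a, g_b = sess.run([grad_from_loss, grad_direct], {y: 1, r: 0.7})
    print(np.allclose(g_a, g_b))  # checks whether the two gradients match
```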

Could you please correct me if I am wrong? Thank you.