Implementation of the methods proposed in **Adversarial Training Methods for Semi-Supervised Text Classification** on IMDB dataset (without pre-training)
Hi, I have some questions about the gradient in VAT.
The function `get_v_adv_loss(self, ul_batch, p_mult, power_iterations=1)` in paper_network.ipynb has one statement:
Is this used for calculating $g=\nabla_{s+d} KL[p(·|s;\hat{\theta})||p(·|s+d;\hat{\theta})]$ defined in Eq.(7) in the original paper?
But in the original paper, the author computes the gradient of the KL divergence with respect to $s+d$, whereas your code takes the gradient with respect to $d$ only. Why? Are the two gradients the same? Thanks!
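For what it's worth, here is a minimal numerical check (not from this repository; the toy softmax model and finite-difference gradients are my own illustration) of whether the two gradients coincide. Since $s$ is held fixed while $d$ varies, perturbing $d$ and perturbing $s+d$ are the same operation, so $\nabla_{s+d} KL = \nabla_{d} KL$:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence between two discrete distributions
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(0)
s = rng.normal(size=5)          # stands in for the clean input s
d = 1e-2 * rng.normal(size=5)   # stands in for the perturbation d
p = softmax(s)                  # fixed reference distribution p(.|s)

def objective(x):
    # KL[p(.|s) || p(.|x)], where x plays the role of s+d
    return kl(p, softmax(x))

eps = 1e-6
# Finite-difference gradient w.r.t. (s+d): perturb the combined input.
g_sd = np.array([
    (objective(s + d + eps * e) - objective(s + d - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
# Finite-difference gradient w.r.t. d, with s held constant: perturb d only.
g_d = np.array([
    (objective(s + (d + eps * e)) - objective(s + (d - eps * e))) / (2 * eps)
    for e in np.eye(5)
])
print(np.allclose(g_sd, g_d))
```

The two loops evaluate the objective at identical points, which is exactly the point: with $s$ constant, differentiating with respect to $d$ or with respect to $s+d$ gives the same vector, so taking `tf.gradients` of the KL term with respect to the perturbation tensor alone should be equivalent to Eq. (7).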