christiancosgrove / pytorch-spectral-normalization-gan

Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-
MIT License
676 stars 110 forks

How do _u and _v update? #13

Open luhaofang opened 5 years ago

luhaofang commented 5 years ago

Thanks for your clear implementation. I have a question about the _u and _v update policy. I've noticed that in your implementation, _u is updated before the op's inference phase; does _u also need to be updated by backpropagating the gradient? Another question: should the gradient computed for w_bar be applied directly to the original weight? I saw you mention this point, but updating w_bar itself seems more reasonable to me, with w_bar then serving as the original weight in the next iteration. Am I right?

christiancosgrove commented 5 years ago

Q1: In the nondifferentiable spectral normalization layer (`spectral_normalization_nondiff.py`) we do not backpropagate the gradients. This is because during inference, we overwrite the current weights.
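Roughly, a minimal sketch of that approach (simplified; the class and helper names below are illustrative, not the exact code in this repo): `u` lives in a buffer, is refined by power iteration, and the stored weight is divided by the estimated spectral norm in place, entirely outside the autograd graph.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralNormNonDiff(nn.Module):
    """Sketch: normalize the wrapped module's weight in place before each
    forward pass, so the normalization itself is invisible to autograd."""

    def __init__(self, module, power_iterations=1):
        super().__init__()
        self.module = module
        self.power_iterations = power_iterations
        # Persistent power-iteration vector; a buffer, not a Parameter,
        # so it receives no gradient.
        height = module.weight.size(0)
        self.register_buffer("u", F.normalize(torch.randn(height), dim=0))

    def forward(self, *args):
        w = self.module.weight
        w_mat = w.view(w.size(0), -1)
        with torch.no_grad():
            u = self.u
            for _ in range(self.power_iterations):
                v = F.normalize(w_mat.t() @ u, dim=0)  # one power-iteration step
                u = F.normalize(w_mat @ v, dim=0)
            self.u.copy_(u)
            sigma = torch.dot(u, w_mat @ v)  # estimate of the largest singular value
            w.div_(sigma)                    # overwrite the stored weight in place
        return self.module(*args)
```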

In the differentiable layer (`spectral_normalization.py`), we modify the computation graph by creating new parameters *_u, *_v, and *_bar for every weight w. The parameter w is replaced with w_bar, which allows gradients to flow to *_u (and to the original weight w) during backpropagation.
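Again as a simplified sketch (illustrative names, not the repo's exact class): the raw weight is re-registered as a parameter `weight_bar`, and the effective weight `w_bar / sigma` is rebuilt inside the graph on every forward pass, so gradients flow through `sigma` to `weight_bar`, `u`, and `v`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralNormDiff(nn.Module):
    """Sketch: keep the raw weight as a parameter `weight_bar` and recompute
    the effective weight weight_bar / sigma inside the computation graph."""

    def __init__(self, module, power_iterations=1):
        super().__init__()
        self.module = module
        self.power_iterations = power_iterations
        w = module.weight
        height = w.size(0)
        width = w.view(height, -1).size(1)
        self.module.weight_bar = nn.Parameter(w.data)
        self.module.weight_u = nn.Parameter(F.normalize(torch.randn(height), dim=0))
        self.module.weight_v = nn.Parameter(F.normalize(torch.randn(width), dim=0))
        # Remove the original parameter; it is recomputed in forward().
        del self.module._parameters["weight"]

    def forward(self, *args):
        w_bar = self.module.weight_bar
        u, v = self.module.weight_u, self.module.weight_v
        w_mat = w_bar.view(w_bar.size(0), -1)
        # The power-iteration updates touch .data only, so they are untracked...
        for _ in range(self.power_iterations):
            v.data = F.normalize(w_mat.data.t() @ u.data, dim=0)
            u.data = F.normalize(w_mat.data @ v.data, dim=0)
        # ...but sigma is computed through u, v, and w_bar, so it stays in the graph.
        sigma = torch.dot(u, w_mat @ v)
        self.module.weight = w_bar / sigma  # effective (normalized) weight
        return self.module(*args)
```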

It's worth noting that in my experiments, I didn't find a major difference between these two implementations.

Q2:

In the differentiable implementation, the gradient with respect to w_bar is used to compute the gradients with respect to w, _u, and _v. In the nondifferentiable implementation, the gradient updates w directly, and w is then re-normalized during the next inference step.
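To make that concrete, here is how a training step would look under the two sketches above (SpectralNormNonDiff and SpectralNormDiff are the hypothetical wrappers from my earlier comments, not this repo's exact classes): the loop is identical; the only difference is which tensors `parameters()` hands to the optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical wrappers from the sketches above, applied to identical layers.
layer_nd = SpectralNormNonDiff(nn.Linear(64, 64))
layer_d = SpectralNormDiff(nn.Linear(64, 64))

x = torch.randn(8, 64)
for layer in (layer_nd, layer_d):
    # Nondifferentiable: parameters() yields the raw weight, which the optimizer
    # updates directly; power iteration re-normalizes it on the next forward.
    # Differentiable: parameters() yields weight_bar (plus u and v), and the
    # normalized weight w_bar / sigma is rebuilt inside the graph each forward.
    opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
    loss = layer(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```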