christiancosgrove / pytorch-spectral-normalization-gan

Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-
MIT License
677 stars 109 forks

self.u can't be updated? #1

Open WellYoungIOE opened 6 years ago

WellYoungIOE commented 6 years ago

The self.u in SpectralNorm never seems to be updated during training, and u in spectral_norm is reinitialized with u = Variable(W.data.new((W.data.shape[1]))).data.normal_(0,1) on every epoch. self.u stays None the whole time. Any way to fix this? Thanks!
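
For reference, a minimal sketch of the kind of per-call helper being described, in the pre-0.4 PyTorch style used in this thread (the function name and details are illustrative, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def spectral_norm(W, u=None):
    # W: weight tensor; flatten it to a (height, width) matrix.
    height = W.size(0)
    W_mat = W.view(height, -1).data
    if u is None:
        # If self.u is never written back, u is None on every call, so this
        # branch re-randomizes u each time and the power iteration never
        # refines a persistent estimate of the leading singular vector.
        u = W.data.new(height).normal_(0, 1)
    v = F.normalize(torch.mv(torch.t(W_mat), u), p=2, dim=0)
    u = F.normalize(torch.mv(W_mat, v), p=2, dim=0)
    sigma = torch.dot(u, torch.mv(W_mat, v))
    return sigma, u  # the caller needs to store u (e.g. self.u = u) for reuse
```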

christiancosgrove commented 6 years ago

Thanks for pointing this out. spectral_norm wasn't properly initializing self.u, so I've moved the initialization into SpectralNorm's constructor.
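
A rough sketch of that fix, using the spectral_norm helper sketched above, with u created once in the constructor so the same vector is refined on every forward pass (illustrative, not the exact repository code):

```python
import torch
import torch.nn as nn

class SpectralNorm(nn.Module):
    def __init__(self, module, name='weight'):
        super(SpectralNorm, self).__init__()
        self.module = module
        self.name = name
        w = getattr(module, name)
        # Create the power-iteration vector once, here, instead of re-creating
        # it inside spectral_norm on every call; it now persists across forwards.
        self.u = w.data.new(w.data.size(0)).normal_(0, 1)

    def forward(self, *args):
        w = getattr(self.module, self.name)
        sigma, self.u = spectral_norm(w, self.u)  # reuse u and write it back
        # ...scale the weight seen by the wrapped module by 1 / sigma...
        return self.module(*args)
```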

WellYoungIOE commented 6 years ago

For the u_n in v_n = F.normalize(torch.mv(torch.t(W.view(height,-1).data), u_n), p=2, dim=0), u_n = F.normalize(torch.mv(W.view(height,-1).data, v_n), p=2, dim=0), and singular_value = torch.dot(u_n, prod), I think you mean u_n.data. self.u is now updated on every forward pass, but this also makes the code unable to run on multiple GPUs.

christiancosgrove commented 6 years ago

Wouldn't it be sufficient to make self.u a Tensor rather than a Variable, since (as described in the OpenReview responses) its gradients don't have to be computed? If that's the case, I don't think u_n.data would be right. What makes this implementation unable to run on multiple GPUs?

WellYoungIOE commented 6 years ago

torch.mv only supports operations between two Tensors. It's fine to make self.u a Variable or even a Parameter, but the u_n here has to be a Tensor, or it raises a TypeError: torch.mv received an invalid combination of arguments - got (torch.cuda.FloatTensor, !Variable!), but expected (torch.cuda.FloatTensor source, torch.cuda.FloatTensor vec). With DataParallel, the weights of the discriminator are sent to every GPU, but self.u stays on gpu0 during training, so the computation of v_n fails with RuntimeError: arguments are located on different GPUs.

christiancosgrove commented 6 years ago

I see. Thanks for explaining this issue.

Perhaps it would be better to make everything a Variable (in fact, make self.u a Parameter and register it in the list of parameters so that DataParallel puts it on the right device). Then, instead of using the nondifferentiable torch.mv operation, use autograd-friendly matrix-vector multiplication.

The benefit of using autograd Variables is that backpropagation should then also pick up the gradient regularization term (equation 12 of https://openreview.net/pdf?id=B1QRgziT-), which I was overlooking before.

I'll try to get this approach working.
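
A minimal sketch of that approach, in the pre-0.4-era API used in this thread: u and v registered as Parameters with requires_grad=False (so DataParallel broadcasts them to the right device), power-iteration updates done on .data, and sigma computed through autograd so the normalization itself is differentiated. Method names like _make_params and _update_u_v match ones mentioned later in this thread, but the details here are illustrative rather than the repository's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class SpectralNorm(nn.Module):
    def __init__(self, module, name='weight', power_iterations=1):
        super(SpectralNorm, self).__init__()
        self.module = module
        self.name = name
        self.power_iterations = power_iterations
        self._make_params()

    def _make_params(self):
        w = getattr(self.module, self.name)
        height = w.data.shape[0]
        width = w.view(height, -1).data.shape[1]
        # u and v are Parameters with requires_grad=False, so DataParallel
        # moves them to each device along with the rest of the module.
        u = Parameter(w.data.new(height).normal_(0, 1), requires_grad=False)
        v = Parameter(w.data.new(width).normal_(0, 1), requires_grad=False)
        w_bar = Parameter(w.data)
        # Replace the original weight with the unnormalized copy w_bar.
        del self.module._parameters[self.name]
        self.module.register_parameter(self.name + '_u', u)
        self.module.register_parameter(self.name + '_v', v)
        self.module.register_parameter(self.name + '_bar', w_bar)

    def _update_u_v(self):
        u = getattr(self.module, self.name + '_u')
        v = getattr(self.module, self.name + '_v')
        w = getattr(self.module, self.name + '_bar')
        height = w.data.shape[0]
        w_mat = w.view(height, -1)
        for _ in range(self.power_iterations):
            # Power iteration on .data: no gradients are needed for u and v.
            v.data = F.normalize(torch.mv(torch.t(w_mat.data), u.data), p=2, dim=0)
            u.data = F.normalize(torch.mv(w_mat.data, v.data), p=2, dim=0)
        # sigma is computed through autograd, so backprop also differentiates
        # through the normalization (the regularization term discussed above).
        sigma = u.dot(w_mat.mv(v))
        setattr(self.module, self.name, w / sigma)

    def forward(self, *args):
        self._update_u_v()
        return self.module(*args)
```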

christiancosgrove commented 6 years ago

I've made these changes. Let me know if there are further problems. Thanks

WellYoungIOE commented 6 years ago

It works fine with one GPU, but u and v cannot be updated with multiple GPUs.

There are still problems with the discriminator when I use multiple GPUs. The u and v in _update_u_v seem to be copies of the originals, and they are turned into Variables. On every forward pass, u and v are reinitialized to the original values calculated in _make_params.