WellYoungIOE opened this issue 6 years ago
Thanks for pointing this out. `spectral_norm` wasn't properly initializing `self.u`, so I've moved the initialization into `SpectralNorm`'s constructor.
The `u_n` in `v_n = F.normalize(torch.mv(torch.t(W.view(height,-1).data), u_n), p=2, dim=0)`, `u_n = F.normalize(torch.mv(W.view(height,-1).data, v_n), p=2, dim=0)` and `singular_value = torch.dot(u_n, prod)`: I think you mean `u_n.data`.
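For reference, a self-contained sketch of the power-iteration step these lines come from, with the suggested `.data` applied; `height` and `prod` are reconstructed from context and the function name is my own, so treat this as an illustration rather than the repo's actual code:

```python
import torch
import torch.nn.functional as F

def power_iteration_step(W, u_var):
    """One power-iteration update of the leading singular vectors of W (pre-0.4 style).

    W:     the layer's weight Parameter (a Variable in pre-0.4 PyTorch)
    u_var: the running left singular vector, also a Variable (e.g. self.u)
    """
    height = W.size(0)
    W_mat = W.view(height, -1).data          # raw 2-D Tensor, no autograd graph
    u_n = u_var.data                         # the suggested .data: keep everything a Tensor
    v_n = F.normalize(torch.mv(torch.t(W_mat), u_n), p=2, dim=0)
    u_n = F.normalize(torch.mv(W_mat, v_n), p=2, dim=0)
    prod = torch.mv(W_mat, v_n)
    singular_value = torch.dot(u_n, prod)    # sigma ~ u^T W v
    return u_n, v_n, singular_value
```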
Now `self.u` is updating in every forward, but it also makes the code unable to run on multiple GPUs.

Wouldn't it be sufficient to make `self.u` a `Tensor` and not a `Variable`, since (as described in the OpenReview responses) its gradients don't have to be computed? If this is the case, I don't think `u_n.data` would be right.
What makes this implementation unable to run on multiple GPUs?
`torch.mv` only supports operands that are both `Tensor`s. It's fine to make `self.u` a `Variable` or even a `Parameter`, but the `u_n` here has to be a `Tensor`, or it raises a `TypeError`: `torch.mv received an invalid combination of arguments - got (torch.cuda.FloatTensor, !Variable!), but expected (torch.cuda.FloatTensor source, torch.cuda.FloatTensor vec)`.
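A tiny illustration of that mismatch, assuming pre-0.4 PyTorch where `Tensor` and `Variable` were still distinct types (on 0.4+ the two are merged and this particular error no longer exists):

```python
import torch
from torch.autograd import Variable

W_mat = torch.randn(4, 3)              # raw Tensor, i.e. W.view(height, -1).data
u = Variable(torch.randn(4))           # Variable, e.g. a Parameter-like self.u
# torch.mv(torch.t(W_mat), u)          # pre-0.4: TypeError (Tensor and Variable mixed)
v = torch.mv(torch.t(W_mat), u.data)   # OK: both operands are plain Tensors
```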
With `DataParallel`, the weights of the discriminator are sent to every GPU, but `self.u` always stays on `gpu0` during training. The computation of `v_n` then fails with `RuntimeError: arguments are located on different GPUs`.
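A hypothetical minimal reproduction of that failure, written in post-0.4 style for brevity and assuming `self.u` is stored as a plain attribute rather than a registered parameter or buffer, so `DataParallel`'s replication copies the weights to every device but leaves `u` on the first GPU (the module and names below are illustrative, not the repo's code):

```python
import torch
import torch.nn as nn

class ToyDisc(nn.Module):
    def __init__(self, n):
        super(ToyDisc, self).__init__()
        self.fc = nn.Linear(n, n)
        # Plain attribute: .cuda() and DataParallel's replicate() only handle
        # registered parameters/buffers, so every replica keeps pointing at the
        # single copy of u that lives on the first device.
        self.u = torch.randn(n)

    def forward(self, x):
        W = self.fc.weight                                  # on this replica's GPU
        v = torch.mv(W.view(W.size(0), -1).t(), self.u)     # mixes devices on GPU 1
        return x + v.sum()

# disc = ToyDisc(16).cuda()
# disc.u = disc.u.cuda(0)                 # u ends up pinned to gpu0
# disc = nn.DataParallel(disc, device_ids=[0, 1])
# disc(torch.randn(8, 16).cuda())         # RuntimeError: arguments are located on different GPUs
```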
I see. Thanks for explaining this issue.
Perhaps it would be better to make everything a `Variable` (in fact, make `self.u` a `Parameter` and register it in the list of parameters so that `DataParallel` puts it on the right device). Then, instead of using the nondifferentiable `torch.mv` operation, use autograd-friendly matrix-vector multiplication. The benefit of using autograd `Variable`s is that it should compute the gradient regularization term (equation 12 of https://openreview.net/pdf?id=B1QRgziT-), which I was overlooking before. I'll try to get this approach working.
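A minimal sketch of that proposal, written against the post-0.4 API for brevity (where `Tensor` and `Variable` are merged) and using an `nn.Linear` subclass as a stand-in for the actual `SpectralNorm` wrapper; the class and its details are assumptions, not the commit itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SNLinear(nn.Linear):
    """Hypothetical spectrally normalized Linear layer sketching the proposal above."""

    def __init__(self, in_features, out_features, power_iterations=1):
        super(SNLinear, self).__init__(in_features, out_features)
        self.power_iterations = power_iterations
        # Registered as a Parameter (requires_grad=False): DataParallel replicates
        # it onto each device together with the weights, while the optimizer never
        # trains it; it only carries the power-iteration state.
        self.u = nn.Parameter(torch.randn(out_features), requires_grad=False)

    def forward(self, x):
        W = self.weight.view(self.weight.size(0), -1)
        u = self.u
        for _ in range(self.power_iterations):
            # autograd-friendly matrix-vector products instead of torch.mv on .data
            v = F.normalize(torch.matmul(W.t(), u), p=2, dim=0)
            u = F.normalize(torch.matmul(W, v), p=2, dim=0)
        # sigma stays inside the autograd graph, so dividing by it also yields the
        # gradient regularization term (equation 12 of the paper) automatically.
        sigma = torch.dot(u, torch.matmul(W, v))
        with torch.no_grad():
            self.u.copy_(u)              # persist the power-iteration state
        return F.linear(x, self.weight / sigma, self.bias)
```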
I've made these changes. Let me know if there are further problems. Thanks
It works fine with one GPU, but `u` and `v` cannot be updated with multiple GPUs.

There are still problems with the discriminator when I use multiple GPUs. The `u` and `v` in `_update_u_v` seem to be copies of the originals, turned into `Variable`s. In every forward pass, `u` and `v` are reset to the original values computed in `_make_params`.
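A rough illustration of that mechanism (hedged, since the exact replication details vary across PyTorch versions): `DataParallel` rebuilds the replicas from the original module on every call, so state that a replica's forward assigns to itself never makes it back to `model.module`:

```python
import torch
import torch.nn as nn

class PowerIterState(nn.Module):
    """Stand-in for a layer that refreshes its power-iteration vector in forward."""

    def __init__(self, n):
        super(PowerIterState, self).__init__()
        self.register_buffer('u', torch.randn(n))

    def forward(self, x):
        # This assignment lands on the throwaway replica that DataParallel built
        # for this call, so the original module's u is left untouched.
        self.u = self.u / self.u.norm()
        return x

# wrapped = nn.DataParallel(PowerIterState(8).cuda(), device_ids=[0, 1])
# before = wrapped.module.u.clone()
# wrapped(torch.randn(4, 8).cuda())
# torch.equal(before, wrapped.module.u)   # True: the refreshed vector existed only on the replicas
```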
The `self.u` in `SpectralNorm` never seems to update during training, and the `u` in `spectral_norm` is reinitialized with `u = Variable(W.data.new((W.data.shape[1]))).data.normal_(0,1)` in every epoch. `self.u` is `None` all the time. Any way to figure this out? Thanks!