christiancosgrove / pytorch-spectral-normalization-gan

Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-
MIT License

Large D/G losses? #6

Open christopher-beckham opened 6 years ago

christopher-beckham commented 6 years ago

Hi,

I'm using the recently released PyTorch 0.4 (not sure if that's what's causing the funky numbers), and I'm getting the following with python main.py --model resnet --loss wasserstein:

disc loss tensor(-0.2347, device='cuda:0') gen loss tensor(-0.6066, device='cuda:0')
disc loss tensor(-120.7743, device='cuda:0') gen loss tensor(-1614.5465, device='cuda:0')
disc loss tensor(-121.0873, device='cuda:0') gen loss tensor(-1225.4401, device='cuda:0')
disc loss tensor(-56.4558, device='cuda:0') gen loss tensor(-2320.5115, device='cuda:0')
disc loss tensor(-45.6140, device='cuda:0') gen loss tensor(-2665.3479, device='cuda:0')
disc loss tensor(-46.4297, device='cuda:0') gen loss tensor(-3849.7197, device='cuda:0')
disc loss tensor(-39.8169, device='cuda:0') gen loss tensor(-4879.6089, device='cuda:0')
disc loss tensor(-56.9688, device='cuda:0') gen loss tensor(-5421.9688, device='cuda:0')
disc loss tensor(-3.2100, device='cuda:0') gen loss tensor(-4737.8677, device='cuda:0')
disc loss tensor(-36.7729, device='cuda:0') gen loss tensor(-4344.2520, device='cuda:0')
disc loss tensor(-55.6719, device='cuda:0') gen loss tensor(-6263.5303, device='cuda:0')
disc loss tensor(-62.0518, device='cuda:0') gen loss tensor(-7915.4751, device='cuda:0')
disc loss tensor(-0.5933, device='cuda:0') gen loss tensor(-7315.9282, device='cuda:0')
disc loss tensor(-26.8652, device='cuda:0') gen loss tensor(-10451.8770, device='cuda:0')
disc loss tensor(-48.6777, device='cuda:0') gen loss tensor(-8293.3584, device='cuda:0')

Is this meant to happen?

Thanks!

f90 commented 6 years ago

I can confirm that with a completely different, self-made TensorFlow implementation, the estimated Wasserstein distances also get very large. I don't really know what is causing it either... Normally, values are in the range of 0 to 10 or 20 when using WGAN-GP.
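
For reference, by WGAN-GP I mean the standard gradient penalty from Gulrajani et al., roughly like the sketch below (names such as critic, real, and fake are illustrative, not my exact code):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Standard WGAN-GP penalty: push the critic's gradient norm
    toward 1 along random real/fake interpolates."""
    batch_size = real.size(0)
    # Random interpolation points between real and fake image batches (NCHW)
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    out = critic(interp)
    # Gradient of the critic output w.r.t. the interpolates
    grads = torch.autograd.grad(outputs=out, inputs=interp,
                                grad_outputs=torch.ones_like(out),
                                create_graph=True)[0]
    grads = grads.view(batch_size, -1)
    # Penalize deviation of the gradient norm from 1 (soft Lipschitz constraint)
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```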

christopher-beckham commented 6 years ago

Using GP together with spectral norm would be counterintuitive, though, since the spectral norm is meant to be a (computationally cheaper) replacement for it. But thanks for reporting what you're seeing on your side.

f90 commented 6 years ago

Yes, I did not use GP and spectral norm at the same time. Rather, I have used WGAN-GP a lot, and my experience there was that the estimated Wasserstein distance was usually between 0 and 20. Then I removed the GP term and replaced it with spectral normalization, keeping everything else the same (including the Wasserstein loss), and now the estimated Wasserstein distances are all over the place, in the millions, etc.
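
For concreteness, here's a minimal sketch of the spectral normalization I swapped in, following Miyato et al.: one power-iteration step estimates the largest singular value sigma of the reshaped weight, and the forward pass uses W / sigma (names here are illustrative, not this repo's exact code):

```python
import torch
import torch.nn.functional as F

def spectrally_normalized_weight(w, u, n_power_iterations=1, eps=1e-12):
    """Return w / sigma(w), with sigma estimated by power iteration.
    w: layer weight (conv weights are reshaped to 2-D);
    u: persistent left-singular-vector estimate of shape (w.size(0),)."""
    w_mat = w.view(w.size(0), -1)
    for _ in range(n_power_iterations):
        v = F.normalize(w_mat.t() @ u, dim=0, eps=eps)   # right singular vector
        u = F.normalize(w_mat @ v, dim=0, eps=eps)       # left singular vector
    sigma = torch.dot(u, w_mat @ v)                      # largest singular value
    return w / sigma, u                                  # divide, never multiply
```

PyTorch's built-in torch.nn.utils.spectral_norm does this bookkeeping for you, so wrapping each critic layer with it is an easy way to rule out an implementation bug.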

HiddenMachine3 commented 6 months ago

> Yes, I did not use GP and spectral norm at the same time. Rather, I have used WGAN-GP a lot, and my experience there was that the estimated Wasserstein distance was usually between 0 and 20. Then I removed the GP term and replaced it with spectral normalization, keeping everything else the same (including the Wasserstein loss), and now the estimated Wasserstein distances are all over the place, in the millions, etc.

Are you sure you're dividing the weights of the convolution layer by the spectral norm correctly? If it's implemented correctly, the distance shouldn't reach such high numbers.

I had a similar issue at the start where I was multiplying by the spectral norm instead of dividing by it, so that could be why yours is reaching the millions.
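
A quick way to check which case you're in (an illustrative sketch, not anyone's actual code): dividing by the spectral norm pins the largest singular value at 1, while multiplying inflates it quadratically.

```python
import torch

w = torch.randn(64, 128)
sigma = torch.linalg.matrix_norm(w, ord=2)          # largest singular value

print(torch.linalg.matrix_norm(w / sigma, ord=2))   # ~1.0       (correct: divide)
print(torch.linalg.matrix_norm(w * sigma, ord=2))   # ~sigma**2  (the bug: multiply)
```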