CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

Discriminator Loss Bug #137

Open · kaihe opened this issue 2 years ago

kaihe commented 2 years ago

As referred to in #93, the discriminator is the key to getting sharp images. But in my experiments, `aeloss` sometimes takes negative values. [screenshot: aeloss going negative during training] I think a negative loss should be avoided in any network design, because it could go to negative infinity.

The negative term comes from `g_loss`, as `g_loss = -torch.mean(logits_fake)`, where `logits_fake` is the raw output of a convolution, with no sigmoid to bound the logit values. When the generator successfully generates a correct patch, such as pure white space, the discriminator will find a way to exploit that patch by pushing its logit towards infinity, which encourages the generator to draw more patches like it and completely ignore the other loss terms.
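For illustration, here is a minimal sketch (the logit values are made up) of why that term is unbounded below when the logits come straight out of a convolution:

```python
import torch

# Raw PatchGAN outputs for generated patches: no sigmoid, so each logit
# can be any real number (made-up values for illustration).
logits_fake = torch.tensor([[0.3, 5.0], [12.0, 40.0]])

# Generator term as computed in taming/modules/losses/vqperceptual.py:
g_loss = -torch.mean(logits_fake)
print(g_loss)  # tensor(-14.3250) -- grows arbitrarily negative as logits grow
```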

After checking the original code in the CycleGAN repo: [screenshot of the GANLoss code] there does exist an outer loss term that limits the patch logits; the raw logits are used only when `gan_mode` is in `['wgangp']`. I did not see any fancy 'wgangp' handling here, so this should be a bug.
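For reference, the branch in question in the CycleGAN/pix2pix `GANLoss` class looks roughly like this (a paraphrase from memory, not verbatim code; `GANLossSketch` is my own name):

```python
import torch
import torch.nn as nn

class GANLossSketch(nn.Module):
    """Rough paraphrase of GANLoss from the pytorch-CycleGAN-and-pix2pix repo."""

    def __init__(self, gan_mode: str):
        super().__init__()
        self.gan_mode = gan_mode
        if gan_mode == 'lsgan':
            self.loss = nn.MSELoss()            # logits compared to 0/1 targets
        elif gan_mode == 'vanilla':
            self.loss = nn.BCEWithLogitsLoss()  # a sigmoid bounds the logits' effect
        elif gan_mode == 'wgangp':
            self.loss = None                    # raw logits are used directly

    def forward(self, prediction: torch.Tensor, target_is_real: bool) -> torch.Tensor:
        if self.gan_mode == 'wgangp':
            # Only this mode feeds raw, unbounded logits into the loss.
            return -prediction.mean() if target_is_real else prediction.mean()
        target = torch.ones_like(prediction) if target_is_real else torch.zeros_like(prediction)
        return self.loss(prediction, target)
```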

thuangb commented 2 years ago

This is not a bug. The loss used in the paper is the hinge GAN loss, whose `g_loss` can take negative values. If training is stable and correct, the loss won't go to negative infinity.
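For context, the hinge formulation in `taming/modules/losses/vqperceptual.py` looks like this (reproduced from memory; `hinge_g_loss` is a name added here for clarity, as the repo computes that line inline):

```python
import torch
import torch.nn.functional as F

def hinge_d_loss(logits_real, logits_fake):
    # Discriminator hinge loss: each relu(.) term is >= 0,
    # so d_loss is bounded below by 0.
    loss_real = torch.mean(F.relu(1. - logits_real))
    loss_fake = torch.mean(F.relu(1. + logits_fake))
    return 0.5 * (loss_real + loss_fake)

def hinge_g_loss(logits_fake):
    # Generator side: plain negated mean of the fake logits. This term is
    # unbounded below, so aeloss (which adds it to the reconstruction and
    # perceptual terms) can legitimately go negative during stable training.
    return -torch.mean(logits_fake)
```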

SuwoongHeo commented 1 year ago

Hi, I'm curious about how the `g_loss` is derived without using a sigmoid or softplus function. After searching for several hours, I could not find any reference for using the logits directly as in this implementation: https://github.com/CompVis/taming-transformers/blob/1bbc027acb6a47e4eb348d611f9af53f1038ffee/taming/modules/losses/vqperceptual.py#L98

function2-llx commented 1 year ago

> Hi, I'm curious about how the `g_loss` is derived without using a sigmoid or softplus function. After searching for several hours, I could not find any reference for using the logits directly as in this implementation:
>
> https://github.com/CompVis/taming-transformers/blob/1bbc027acb6a47e4eb348d611f9af53f1038ffee/taming/modules/losses/vqperceptual.py#L98

@SuwoongHeo That seems to be the Wasserstein loss. I found an implementation here, from the official repository of the paper Improved Training of Wasserstein GANs.

The description in the VQGAN paper seems inaccurate: it claims to use a binary cross-entropy loss following PatchGAN.

[screenshot of the loss description from the VQGAN paper]
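To make the mismatch concrete, here is a side-by-side sketch (my own summary, not code from either repository) of the two generator objectives applied to the same raw logits:

```python
import torch
import torch.nn.functional as F

logits_fake = torch.randn(4, 1, 30, 30)  # raw PatchGAN outputs, no sigmoid

# What the paper's text describes: binary cross-entropy against "real" targets,
# with the sigmoid folded in. Always >= 0.
bce_g_loss = F.binary_cross_entropy_with_logits(
    logits_fake, torch.ones_like(logits_fake))

# What the released code computes: negated mean of the raw logits. This matches
# the WGAN (and hinge-GAN) generator loss and can be negative.
wgan_g_loss = -torch.mean(logits_fake)
```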

whchan05 commented 1 year ago

> @SuwoongHeo That seems to be the Wasserstein loss. I found an implementation here, from the official repository of the paper Improved Training of Wasserstein GANs.
>
> The description in the VQGAN paper seems inaccurate: it claims to use a binary cross-entropy loss following PatchGAN.

I wonder why they didn't simply stick with what's described in the paper? I spent a few hours trying to figure this out; I was lucky to stumble upon your comment, but if I had started a few weeks earlier it would have been a nightmare.