FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License
1.33k stars · 55 forks

Discriminator is not training properly? #27

Closed ThisisBillhe closed 5 months ago

ThisisBillhe commented 5 months ago

Hi peize, I try to train VQGAN with your default config, where the discriminator start training after 20k iterations. However, I notice that logits_real and logits_fake are very close all the time. For example: Beginning epoch 7... (Generator) rec_loss: 0.0441, perceptual_loss: 0.2432, vq_loss: 0.0107, commit_loss: 0.0027, entropy_loss: -0.0000, codebook_usage: 0.9761, generator_adv_loss: 0.0698, disc_adaptive_weight: 1.0000, disc_weight: 0.5000 (Discriminator) discriminator_adv_loss: 0.4980, disc_weight: 0.5000, logits_real: -0.1318, logits_fake: -0.1396

For a good discriminator, logits_real should be close to 1 while logits_fake should be close to -1, right? Can you share your training log for these logits?
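For reference, the logged values above are consistent with the standard VQGAN-style hinge losses (as in taming-transformers). The sketch below is an assumption about the loss formulation, not code quoted from this repo; note that plugging in the logged logits and the logged disc_weight of 0.5 reproduces both logged adversarial losses.

```python
import torch
import torch.nn.functional as F

def hinge_d_loss(logits_real: torch.Tensor, logits_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator hinge loss: push real logits above +1 and fake logits below -1."""
    loss_real = F.relu(1.0 - logits_real).mean()
    loss_fake = F.relu(1.0 + logits_fake).mean()
    return 0.5 * (loss_real + loss_fake)

def generator_adv_loss(logits_fake: torch.Tensor) -> torch.Tensor:
    """Non-saturating generator loss: raise the discriminator's score on fakes."""
    return -logits_fake.mean()

disc_weight = 0.5  # the logged disc_weight
logits_real = torch.tensor(-0.1318)
logits_fake = torch.tensor(-0.1396)

# Both logged losses appear to include the disc_weight factor:
print(disc_weight * hinge_d_loss(logits_real, logits_fake))   # ≈ 0.4980, as logged
print(disc_weight * generator_adv_loss(logits_fake))          # ≈ 0.0698, as logged
```

With logits inside the hinge margin, the discriminator loss sits near 0.5 * (1 - logits_real + 1 + logits_fake) / 2, i.e. close to 0.5 when both logits hover around zero, which is exactly what the log shows.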

PeizeSun commented 5 months ago

Hi~ This is exactly how the discriminator should behave.

If logits_real were very close to 1 and logits_fake very close to -1, the adversarial process would be dominated by the discriminator, and the generator couldn't benefit from it.

ThisisBillhe commented 5 months ago

Hi Peize,

Thanks for your reply! I understand that the discriminator would be too strong if logits_real were close to 1 and logits_fake close to -1. But shouldn't one be positive and the other negative? If they have very similar values (like logits_real: -0.1318, logits_fake: -0.1396), then to my understanding the discriminator cannot discriminate at all?

PeizeSun commented 5 months ago

The beauty of the adversarial process is that logits_fake and logits_real become similar. I highly recommend running other GAN codebases to see this phenomenon for yourself.
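A small sketch of why similar logits don't mean the discriminator has stopped learning: under the hinge loss (assumed here; this is a hypothetical illustration, not the repo's code), any logit inside the margin |logit| < 1 still receives full gradient, so the discriminator keeps pushing real and fake logits apart while the generator keeps closing the gap. The similar values are the equilibrium of that tug-of-war, not a dead discriminator.

```python
import torch
import torch.nn.functional as F

# The logged logits, marked as leaves so we can inspect their gradients.
logits_real = torch.tensor(-0.1318, requires_grad=True)
logits_fake = torch.tensor(-0.1396, requires_grad=True)

# Standard hinge discriminator loss (assumed formulation).
d_loss = 0.5 * (F.relu(1.0 - logits_real).mean() + F.relu(1.0 + logits_fake).mean())
d_loss.backward()

# Both logits are inside the hinge margin, so both gradients are nonzero:
# the loss still pushes logits_real up and logits_fake down.
print(logits_real.grad.item())  # -0.5
print(logits_fake.grad.item())  # 0.5
```

In other words, gradients only vanish once logits_real > 1 and logits_fake < -1, i.e. once the discriminator has saturated; near-zero, near-equal logits are the regime where both networks still learn.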

ThisisBillhe commented 5 months ago

Thanks for your help!