aitorzip / PyTorch-CycleGAN

A clean and readable Pytorch implementation of CycleGAN
https://arxiv.org/abs/1703.10593
GNU General Public License v3.0

PatchGAN loss calculated incorrectly (perhaps) #26

Open alexander-soare opened 4 years ago

alexander-soare commented 4 years ago

Thanks for the great work @aitorzip I think I've spotted a mistake in the way the GAN loss is calculated.

In the forward of your Discriminator class you average the activations first and then calculate the loss from the resulting value. I think you should instead compute the per-neuron loss first and then average the result. If you do the math both ways, you will find the losses are not the same.
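To illustrate (a hypothetical sketch, not the repo's exact code): with an MSE/LSGAN objective, averaging the patch activations before the loss gives a different value than taking the per-patch loss and averaging, because squaring is convex.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-in for a PatchGAN discriminator output: one score per patch
# (batch of 2, a 4x4 grid of patches)
patch_scores = torch.rand(2, 1, 4, 4)
target = torch.ones_like(patch_scores)  # "real" label is 1.0

# Current approach: average the patch activations first, then MSE
avg_first = F.mse_loss(patch_scores.mean(dim=(1, 2, 3)),
                       target.mean(dim=(1, 2, 3)))

# Proposed fix: per-patch MSE first, then average
loss_first = F.mse_loss(patch_scores, target)

# By Jensen's inequality, averaging first always gives a value <= the
# per-patch loss, with equality only if every patch score is identical
print(avg_first.item(), loss_first.item())
```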

I noticed you haven't committed since 2017 but if you see this and agree I'd be happy to make a PR.

naga-karthik commented 3 years ago

@alexander-soare I was wondering whether it makes much of a difference in the results? Have you tried comparing the two?

alexander-soare commented 3 years ago

@naga-karthik it's been a while, but I think I remember determining that the impact was equivalent to a scalar multiple of the gradient, so it only affects the learning rate. Don't take my word for it though, as it's been a while.

jbeck9 commented 2 years ago

> @naga-karthik it's been a while since but I think I remember determining that the impact was equivalent to a scalar multiple of the gradient, therefore it only affects the learning rate parameter. Don't take me word for it though as it's been a while.

Late to the party here, but you are correct. This really needs to be changed; the current implementation misses the point of the PatchGAN paper entirely. If you average the patches before computing the MSE, you might as well add a dense layer and make a global discriminator, because you've obscured all the local information.

Unfortunately, this is not just a matter of scaling the learning rate. It definitely affects gradient propagation.
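A quick check makes this concrete (a hypothetical sketch with made-up patch scores): the average-first loss gives every patch the same gradient, while the per-patch loss gives each patch its own gradient, so the two are not related by a single scalar and the difference can't be absorbed into the learning rate.

```python
import torch

torch.manual_seed(0)
patches = torch.rand(8, requires_grad=True)  # stand-in for 8 patch outputs

# Average activations first, then squared error against the real label 1.0
loss_avg_first = (patches.mean() - 1.0) ** 2
grad_avg_first, = torch.autograd.grad(loss_avg_first, patches)

# Per-patch squared error first, then average (the PatchGAN formulation)
loss_mse_first = ((patches - 1.0) ** 2).mean()
grad_mse_first, = torch.autograd.grad(loss_mse_first, patches)

# avg-first: gradient 2*(mean - 1)/N is identical for every patch
# mse-first: gradient 2*(p_i - 1)/N varies per patch
ratio = grad_mse_first / grad_avg_first
print(grad_avg_first)  # constant across patches
print(ratio)           # varies per patch -> not a pure learning-rate rescale
```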