You're right actually, how did I not notice this before >_<. That's what happens when you write a paper solo, I guess. I'll have to redo the analyses with the correct version; it might take a few weeks, my GPU is not great.
So basically, RaHingeGAN is correct but RaSGAN and RaLSGAN are not. They still make sense at least; they are not fundamentally wrong, it's just that they always compare multiple fakes with a single real.
Hi, just wondering if the results in the paper have been updated according to the corrections, especially RaLSGAN, since from the paper it seems to be a generally superior loss function and I'm considering using it, but I couldn't be sure after seeing this issue. Thanks.
The erroneous version corresponded to RalfGANs (scaled by 2) from https://arxiv.org/pdf/1901.02474.pdf. There's no significant difference between RaGANs and RalfGANs; I tested both. I also report the lack of difference here: https://wordpress.com/post/ajolicoeur.wordpress.com/267.
So feel free to use either since both work equally well, but the official one I use is RaGAN (the updated one).
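In case it helps, here is roughly what the corrected RaSGAN losses look like. This is a minimal sketch, assuming y_pred and y_pred_fake are the critic's raw (pre-sigmoid) outputs on equal-sized real and fake batches, and the helper name rasgan_losses is just for illustration; the exact code in the repo may differ in small details:

```python
import torch

# Minimal sketch of the corrected RaSGAN losses (assumption: y_pred = C(x_real) and
# y_pred_fake = C(x_fake) are raw pre-sigmoid critic outputs on equal-sized batches).
BCE_stable = torch.nn.BCEWithLogitsLoss()

def rasgan_losses(y_pred, y_pred_fake):
    # Discriminator: real scores should exceed the average fake score,
    # and fake scores should fall below the average real score.
    errD = (BCE_stable(y_pred - torch.mean(y_pred_fake), torch.ones_like(y_pred)) +
            BCE_stable(y_pred_fake - torch.mean(y_pred), torch.zeros_like(y_pred_fake))) / 2
    # Generator: the same two terms with the labels swapped.
    errG = (BCE_stable(y_pred - torch.mean(y_pred_fake), torch.zeros_like(y_pred)) +
            BCE_stable(y_pred_fake - torch.mean(y_pred), torch.ones_like(y_pred_fake))) / 2
    return errD, errG
```

In practice you compute errD on one forward pass to update D, and errG on a fresh pass to update G.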
Hi Alexia, congrats on getting a solo paper submission accepted, and thank you for this repo! I have a question about your hinge loss implementation that I think fits into this issue-thread:
How do you get to a generator loss of
errG = (torch.mean(torch.nn.ReLU()(1.0 + (y_pred - torch.mean(y_pred_fake)))) + torch.mean(torch.nn.ReLU()(1.0 - (y_pred_fake - torch.mean(y_pred)))))/2
?
From what I've read, the standard generator hinge loss is just
- torch.mean(y_pred_fake)
so the relativistic version should be
- torch.mean(y_pred_fake-torch.mean(y_pred))
Is there some mathematical equivalence to your version that I'm missing? Would be great if you could clarify! Thx
Hi @stefsietz,
I don't know the logic behind the torch.mean(y_pred_fake) in the standard hinge loss, but what I do know is that changing the loss function of the generator generally doesn't change much (something I first observed, but did not fully understand, in my first paper https://arxiv.org/pdf/1809.02145.pdf). Now that I know more, I can say that for non-relativistic GANs, the discriminator always estimates a function of the density ratio q(x)/p(x) (see the f-GAN paper). Basically, all that matters is that the generator loss pushes the generator to increase q(x)/p(x); the exact loss of G, or the divergence it induces, doesn't really matter. With relativistic GANs there is no density ratio and no closed form, but the logic that the choice of loss for G doesn't matter much should still apply.
I prefer to stay in the relativistic framework and use the loss
errG = (torch.mean(torch.nn.ReLU()(1.0 + (y_pred - torch.mean(y_pred_fake)))) + torch.mean(torch.nn.ReLU()(1.0 - (y_pred_fake - torch.mean(y_pred)))))/2
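For completeness, the matching discriminator loss in this framework just flips the signs inside the hinges (same sketch conventions, with y_pred and y_pred_fake as above):

```python
# Matching RaHingeGAN discriminator loss (sketch): reals should beat the average fake
# by a margin of 1, and fakes should fall below the average real by a margin of 1.
errD = (torch.mean(torch.nn.ReLU()(1.0 - (y_pred - torch.mean(y_pred_fake)))) +
        torch.mean(torch.nn.ReLU()(1.0 + (y_pred_fake - torch.mean(y_pred))))) / 2
```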
Btw, - torch.mean(y_pred_fake - torch.mean(y_pred))
is equivalent to - torch.mean(y_pred_fake)
because y_pred comes from real data, so its gradient w.r.t. G is zero; subtracting its mean only shifts the loss by a constant.
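You can check this quickly with autograd (a toy check, not code from the repo):

```python
import torch

# Toy check: y_pred comes from real data, so it is a constant w.r.t. the generator;
# subtracting its mean shifts the loss by a constant but leaves the gradient unchanged.
y_pred_fake = torch.randn(8, requires_grad=True)  # stand-in for D(G(z))
y_pred = torch.randn(8)                           # stand-in for D(x_real)

grad_a = torch.autograd.grad(-torch.mean(y_pred_fake), y_pred_fake)[0]
grad_b = torch.autograd.grad(-torch.mean(y_pred_fake - torch.mean(y_pred)), y_pred_fake)[0]
print(torch.allclose(grad_a, grad_b))  # True
```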
Ok, thanks for the quick reply! I just recently started training GANs and honestly I don't have a good theoretical understanding of the different loss functions yet, so your explanation is really helpful.
Take RaGAN for example: according to Algorithm 2 of the paper, the loss of RaGAN should be:
while yours is:
Why?