DarthSid95 / RumiGANs

Code accompanying the NeurIPS 2020 submission "Teaching a GAN What Not to Learn."
MIT License

LSGAN implementation #3

Open kiddj opened 3 years ago

kiddj commented 3 years ago

Thank you for your great efforts!

I tried to reproduce your results for LSGAN; however, there is a mismatch between the paper and this repo. In lsgan.py, the loss function of LSGAN_RumiGAN is implemented as:

D_real_pos_loss = self.alphap * mse(self.label_bp*tf.ones_like(self.real_pos_output), self.real_pos_output)
D_real_neg_loss = self.alphan * mse(self.label_bn*tf.ones_like(self.real_neg_output), self.real_neg_output)
D_fake_loss = mse(self.label_a*tf.ones_like(self.fake_output), self.fake_output)
self.D_loss = D_real_pos_loss + D_real_neg_loss + D_fake_loss 

G_fake_loss = mse(self.label_c*tf.ones_like(self.fake_output), self.fake_output)
G_real_neg_loss = mse(self.label_c*tf.ones_like(self.real_neg_output), self.real_neg_output)
self.G_loss = G_fake_loss + G_real_neg_loss + D_real_pos_loss

However, the generator's loss on real_pos_output should clearly differ from D_real_pos_loss (it should use label_c instead of label_bp). Also, alphap and alphan are only applied in D_loss, even though beta^+ and beta^- are introduced for both G and D in the original paper.

Are these just bugs? How can I reproduce the LSGAN results from the paper?

DarthSid95 commented 3 years ago

You're technically right. The D_real_pos_loss term in G_loss should be using label_c. However, in practice, that loss is computed on target images, not on images from the generator, so from a back-propagation standpoint it makes no difference: that term contributes no gradients to the generator weights. To reproduce the results, you could even comment out all "real" terms and use only:

self.G_loss = G_fake_loss

and still have the same effect in training. (All of this is true for the G_real_neg_loss term too, actually... maybe I just forgot to create a G_real_pos_loss.) Nonetheless, I will fix it to use label_c and re-commit, to make it consistent with the loss function presented in the paper.
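For concreteness, a rough sketch of what the corrected generator terms would look like, using the same mse, *_output tensors, and label_* attributes as the snippet above (the G_real_pos_loss name is just the missing counterpart mentioned earlier):

# Generator pushes fakes (and, formally, both real batches) towards label_c
G_fake_loss = mse(self.label_c*tf.ones_like(self.fake_output), self.fake_output)
G_real_neg_loss = mse(self.label_c*tf.ones_like(self.real_neg_output), self.real_neg_output)
G_real_pos_loss = mse(self.label_c*tf.ones_like(self.real_pos_output), self.real_pos_output)

# The two real terms contribute no gradient to the generator weights,
# so self.G_loss = G_fake_loss trains identically in practice
self.G_loss = G_fake_loss + G_real_neg_loss + G_real_pos_loss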

Coming to the alphas, I was merely being lazy and reused the same flag variables. The intention was to load the alpha_p and alpha_n flags with the \beta^+ and \beta^- values when using LSGAN, but I can add dedicated beta flags if that improves readability.

To reproduce the Rumi-LSGAN as in the paper, incorporating the loss function and data-parsing pipelines into your own code should be sufficient. For the paper's Supplementary results, I set (a, b^-, c, b^+) to (0, 0.5, 1, 2), but I found a variety of other combinations to be satisfactory as well. For the above labels, beta^+ needed to be 1/3 and beta^- could be anything. The scaling of the betas affects \int_{\mathcal{X}} p_g^*(x) dx in theory; in practice, however, you can rescale them without much effect (essentially, one could argue that the effect is absorbed into the learning rates) and set beta^+ to 1 and beta^- to 0.25. These have worked for me, but you could definitely try playing around with the labels and betas.
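As an illustration of those settings (a sketch only; the repo actually populates these through the alpha flags rather than setting attributes directly):

# Labels (a, b^-, c, b^+) = (0, 0.5, 1, 2), as used for the Supplementary results
self.label_a = 0.0    # target for fake samples in D_loss
self.label_bn = 0.5   # target for negative-class reals in D_loss
self.label_c = 1.0    # target the generator pushes fakes towards
self.label_bp = 2.0   # target for positive-class reals in D_loss

# beta weights, loaded into the existing alpha flags; rescaling them
# largely folds into the learning rate in practice
self.alphap = 1.0     # beta^+
self.alphan = 0.25    # beta^-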