BUPTLdy / MARTA-GAN

MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification
https://arxiv.org/abs/1612.08879v3
MIT License

Differences in implementation compared to paper #1

Closed · ZahlGraf closed 6 years ago

ZahlGraf commented 6 years ago

Hi

I read your paper on MARTA-GAN and found some differences between the paper and the current implementation:

In chapter II, section B you explain the loss of the generator, which is a combination of the perceptual loss and the feature matching loss. The feature matching loss is described by formula 4.

It reads:

Loss = || E[f(x)] + E[f(G(z))] ||²

First of all, I think you wrote + where it should be -, since you want the expected values to match, not to add them up. This is also reflected in your code, where the loss is calculated like this:

g_loss2 = tf.reduce_mean(tf.nn.l2_loss(feature_real - feature_fake)) / (FLAGS.image_size * FLAGS.image_size)

Furthermore, you do not calculate the expected values of the real and fake features; instead you calculate the mean of the L2 loss between the real and fake activations. The mean over the batch is only an estimate of the expected value, so mathematically this is not the same. Right?
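To make the difference concrete, here is a minimal NumPy sketch of my own (not code from the repository) contrasting the two quantities; the array shapes and random data are arbitrary assumptions, and the division by image_size² from the repo is omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_real = rng.normal(size=(64, 128))  # assumed batch of real feature vectors
feature_fake = rng.normal(size=(64, 128))  # assumed batch of fake feature vectors

# Formula 4 with the corrected sign: squared L2 norm of the
# difference between the batch-mean features.
paper_loss = np.sum((feature_real.mean(axis=0) - feature_fake.mean(axis=0)) ** 2)

# What the repository line computes: tf.nn.l2_loss sums the squared
# differences over the whole batch and halves the sum (it already returns
# a scalar, so the surrounding tf.reduce_mean leaves it unchanged).
code_loss = np.sum((feature_real - feature_fake) ** 2) / 2

print(paper_loss, code_loss)  # the two values differ, as expected
```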

Furthermore, you divide the mean value by the squared image size. I think you did this to make the loss independent of the image size, which makes sense, since the perceptual loss is also independent of the image size.

But why did you choose exactly this value? Did you try different values to weight g_loss2 differently relative to g_loss1? Maybe you would get a better representation by increasing the weight of g_loss2?
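For illustration, here is a hedged sketch of how such a weight could be exposed. `feat_weight` is a hypothetical parameter of mine, not a flag in the repository, and I am assuming a standard DCGAN-style sigmoid cross-entropy term for g_loss1:

```python
import tensorflow as tf

def generator_loss(d_logits_fake, feature_real, feature_fake,
                   image_size, feat_weight=1.0):
    # g_loss1: adversarial (perceptual) term -- the generator tries to make
    # the discriminator output "real" for generated images.
    g_loss1 = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.ones_like(d_logits_fake), logits=d_logits_fake))
    # g_loss2: feature-matching term, normalized by the squared image size
    # as in the repository code.
    g_loss2 = tf.nn.l2_loss(feature_real - feature_fake) / (image_size * image_size)
    # feat_weight (hypothetical) controls how strongly feature matching
    # influences the total generator loss relative to the adversarial term.
    return g_loss1 + feat_weight * g_loss2
```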

Have you also tried metrics other than the mean of the activations? For example, the standard deviation of the activations? Maybe you would get a better representation if not only the mean of the fake features matched, but also their standard deviation?
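As a sketch of what I have in mind (purely hypothetical, not something from the paper or the repository), a loss that matches both the first and second moments of the features could look like this:

```python
import tensorflow as tf

def moment_matching_loss(feature_real, feature_fake):
    # Match the per-feature mean of the activations across the batch...
    mean_real = tf.reduce_mean(feature_real, axis=0)
    mean_fake = tf.reduce_mean(feature_fake, axis=0)
    mean_term = tf.reduce_sum(tf.square(mean_real - mean_fake))
    # ...and additionally their per-feature standard deviation.
    std_real = tf.math.reduce_std(feature_real, axis=0)
    std_fake = tf.math.reduce_std(feature_fake, axis=0)
    std_term = tf.reduce_sum(tf.square(std_real - std_fake))
    return mean_term + std_term
```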

In general: Thank you for your paper; it was fun to read and it has inspired my own work at university.

BUPTLdy commented 6 years ago

Sorry for the late reply, and thank you for the careful check. Yes, we made a huge mistake in formula 4: it should be '-' instead of '+'. We will update it on arXiv.

As for the mean over the batch, you may find an answer here: https://stats.stackexchange.com/questions/201452/is-it-common-practice-to-minimize-the-mean-loss-over-the-batches-instead-of-the

As for dividing the mean value by the squared image size and the weight of g_loss2: these are actually the same thing, a scalar that controls how much g_loss2 affects g_loss. We have tried different weights for g_loss2 relative to g_loss1, but the accuracy does not vary much. Actually, as Figure 4 shows, adding g_loss2 only adds a little accuracy compared to using g_loss1 alone.

Thanks again for your careful check.

ZahlGraf commented 6 years ago

Thanks for the link and for updating the paper.