affinelayer / pix2pix-tensorflow

Tensorflow port of Image-to-Image Translation with Conditional Adversarial Nets https://phillipi.github.io/pix2pix/
MIT License

Losses explanation #99

Open ghost opened 6 years ago

ghost commented 6 years ago

Hi, could someone explain to me what the three different losses mean? And which should I care about most in order to obtain good-quality pictures?

In particular: discrim_loss, gen_loss_GAN, and gen_loss_L1.

Thank you

julien2512 commented 6 years ago

Sure.

gen_loss_L1 is simply the difference between your outputs and your targets; more precisely, it is the mean of the absolute differences. It gives good results easily.

gen_loss_L1 = tf.reduce_mean(tf.abs(targets - outputs))

discrim_loss and gen_loss_GAN are fighting against each other: this is the adversarial part of the setup. discrim_loss measures how well the training identifies real targets as real and generated outputs as fakes. That measure is used to train what we call the discriminator.

discrim_loss = tf.reduce_mean(-(tf.log(predict_real + EPS) + tf.log(1 - predict_fake + EPS)))

gen_loss_GAN, on the other hand, measures how well the training gets the generated outputs identified as real.

gen_loss_GAN = tf.reduce_mean(-tf.log(predict_fake + EPS))

Both gen_loss_GAN and gen_loss_L1 are combined to train what we call the generator. Deep learning is somewhat like finding a needle in a haystack: gen_loss_L1 gives a simple objective that converges fast, but the GAN loss can help recover image details that gen_loss_L1 would hardly discover.

The combination used by default in affinelayer/pix2pix.py is 100*gen_loss_L1 + gen_loss_GAN. In a first phase, gen_loss_L1 will decrease and gen_loss_GAN will probably increase. But then, once the two terms are comparable, you will probably see gen_loss_GAN start decreasing (and discrim_loss increasing accordingly). That means the generator is starting to win against the discriminator. So be patient.
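For reference, this is roughly how pix2pix.py assembles the generator objective; a.l1_weight defaults to 100.0 and a.gan_weight to 1.0, which gives the 100*gen_loss_L1 + gen_loss_GAN mentioned above (predict_fake, targets, outputs, and EPS are defined elsewhere in pix2pix.py):

```python
# Combined generator objective (TF1 API, as in pix2pix.py):
gen_loss_GAN = tf.reduce_mean(-tf.log(predict_fake + EPS))  # fool the discriminator
gen_loss_L1 = tf.reduce_mean(tf.abs(targets - outputs))     # stay close to the target
gen_loss = gen_loss_GAN * a.gan_weight + gen_loss_L1 * a.l1_weight
```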

Have a look at my tensorboard: http://163.172.41.53:6006/. I am near epoch 400 with a sample size of 2683.

Good luck.

groot-1313 commented 6 years ago

@julien2512 I notice from your tensorboard some blurring in your outputs, i.e., the edges aren't very sharp even after so many iterations, especially in the keyboard region. Do you know of a way to improve that?

julien2512 commented 6 years ago

I still have so many other optimizers and gradient formulas to try! Up to now I was trying to show whether, and how far, information can be used by the convolutional network, and that is working pretty well. Given infinite time you can always find the neural network you are aiming at. But I think I am fooling myself if I keep believing I can zoom out an image that way. Image-to-image may be the starting point, but the key is understanding how the information has to be transformed.

To answer the question more directly: I do not know right now, and I don't think time or money alone are the answer.

Lienes commented 6 years ago

julien2512, your explanation of the earlier query was really helpful. Could you also explain the list below? And what are the assumptions behind the default values of those parameters?

```python
parser.add_argument("--lr", type=float, default=0.0002, help="initial learning rate for adam")
parser.add_argument("--beta1", type=float, default=0.5, help="momentum term of adam")
parser.add_argument("--l1_weight", type=float, default=100.0, help="weight on L1 term for generator gradient")
parser.add_argument("--gan_weight", type=float, default=1.0, help="weight on GAN term for generator gradient")
parser.add_argument("--ngf", type=int, default=64, help="number of generator filters in first conv layer")
parser.add_argument("--ndf", type=int, default=64, help="number of discriminator filters in first conv layer")
```

My main goal is to get more accurate generator results. Currently my main idea is to change the max-epochs number and maybe decrease the initial learning rate.

Could you please suggest something?

Thanks!

julien2512 commented 6 years ago

Hi,

My main goal is to get more accurate generator results. Currently my main idea is to change the max-epochs number and maybe decrease the initial learning rate.

When you reach a plateau, increasing max-epochs won't really help you.

The Adam optimizer paper says it is better to start with a higher initial learning rate and then lower it. It's like running back to the area where you lost something, then moving very slowly to find it.

parser.add_argument("--lr", type=float, default=0.0002, help="initial learning rate for adam") parser.add_argument("--beta1", type=float, default=0.5, help="momentum term of adam")

lr and beta1 are parameters of the Adam optimizer (a variant of the standard gradient descent procedures; see https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/).

The Adam paper used 0.9 instead of 0.5 for beta1; the authors said it worked better for their datasets.
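For context, the two flags are handed straight to TF1's Adam optimizer; pix2pix.py builds its optimizers roughly like this (a is the parsed argument namespace):

```python
# Defaults: a.lr = 0.0002, a.beta1 = 0.5 (the pix2pix paper's settings)
discrim_optim = tf.train.AdamOptimizer(a.lr, a.beta1)
gen_optim = tf.train.AdamOptimizer(a.lr, a.beta1)
```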

You could also try other optimizers; some may work better than Adam.

Editor's note: I need to experiment more myself rather than telling others what to do :) :) :)

parser.add_argument("--l1_weight", type=float, default=100.0, help="weight on L1 term for generator gradient") parser.add_argument("--gan_weight", type=float, default=1.0, help="weight on GAN term for generator gradient")

By default, the loss applied to the generator gradient is

100*gen_loss_L1 + gen_loss_GAN

but l1_weight and gan_weight let you reshape it as

l1_weight*gen_loss_L1 + gan_weight*gen_loss_GAN

so that you can adjust it to your data.
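For example, to push training toward the GAN term you could lower l1_weight and raise gan_weight when launching training (the flag names come from the parser above; the paths and values here are only illustrative):

python pix2pix.py --mode train --input_dir facades/train --output_dir facades_train --which_direction BtoA --l1_weight 50.0 --gan_weight 2.0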

The GAN term is better for details.

L1 is a very simple way to compare images; there may well be better ways for your own dataset. The GAN term is something like learning a network to compare those images...

parser.add_argument("--ngf", type=int, default=64, help="number of generator filters in first conv layer") parser.add_argument("--ndf", type=int, default=64, help="number of discriminator filters in first conv layer")

ngf and ndf set the number of filters in the first conv layer of the generator and the discriminator; the widths of the deeper conv and deconv layers scale up from them. Conv layers aim to identify things in a scene; deconv layers aim to build a scene from the identified features. Google says: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/ "The more filters, the greater the depth of the activation map, and the more information we have about the input volume."
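Concretely, ngf only sets the width of the first layer; the deeper encoder layers are fixed multiples of it. A minimal sketch of the generator encoder widths (the exact list lives in create_generator in pix2pix.py; this is illustrative):

```python
# Encoder widths as multiples of ngf (illustrative; see create_generator):
ngf = 64
encoder_filters = [ngf] + [ngf * 2, ngf * 4] + [ngf * 8] * 5
print(encoder_filters)  # [64, 128, 256, 512, 512, 512, 512, 512]
```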

Regards,

P.S.: I have to say, I learn a bit more every time somebody asks a question. ;)

mszeto715 commented 6 years ago

Hi, why are the calculations for discrim_loss and gen_loss_GAN not the same as in the original paper (https://arxiv.org/pdf/1611.07004.pdf)?

Specifically, in discrim_loss = tf.reduce_mean(-(tf.log(predict_real + EPS) + tf.log(1 - predict_fake + EPS))), why is the first term subtracted and the second one added? The Apple paper on Learning from Simulated and Unsupervised Images through Adversarial Training (https://arxiv.org/abs/1612.07828) involves the minimization of a similar cost function for its discriminator network, and both terms are subtracted. In the original GANs paper (https://arxiv.org/abs/1406.2661), the cost function of the discriminator network is a sum of the two.

Then, for gen_loss_GAN, the code says:

```python
# predict_fake => 1
# abs(targets - outputs) => 0
gen_loss_GAN = tf.reduce_mean(-tf.log(predict_fake + EPS))
```

Why does predict_fake go towards 1 now? EDIT: Oh! I guess you'd expect predict_fake to go to 1 as the discriminator approaches failing to discriminate between what's fake and what's real. Still, if someone could explain this better, that'd be great.

-Thank you for your time. -Mimi

julien2512 commented 6 years ago

Specifically, in discrim_loss = tf.reduce_mean(-(tf.log(predict_real + EPS) + tf.log(1 - predict_fake + EPS))), why is the first term subtracted and the second one added?

You are wrong; look at the parentheses. Both terms are added inside, and the whole sum is negated. We expect each logarithm to be negative because predict_real and 1 - predict_fake are below 1, hence the leading minus sign.

To follow the paper literally we would write tf.reduce_mean(-tf.log(predict_real + EPS) - tf.log(1 - predict_fake + EPS)), which leads to the same result.
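A quick numerical sanity check of that equivalence, with arbitrary discriminator outputs:

```python
import numpy as np

EPS = 1e-12
predict_real, predict_fake = 0.8, 0.3  # arbitrary values in (0, 1)

loss_a = -(np.log(predict_real + EPS) + np.log(1 - predict_fake + EPS))
loss_b = -np.log(predict_real + EPS) - np.log(1 - predict_fake + EPS)
assert np.isclose(loss_a, loss_b)  # -(a + b) == -a - b
```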

mszeto715 commented 6 years ago

Oops. Thanks for writing. Makes sense now.

Lienes commented 6 years ago

If it's the same, then why did it need to be written the other way?

julien2512 commented 6 years ago

@Lienes I don't know about your last question, but to answer a previous one:

My main goal is to get more accurate generator results. Currently my main idea is to change the max-epochs number and maybe decrease the initial learning rate.

If you think you are missing the optimum because the step size is too high, then yes, decreasing the initial learning rate would be a good idea.

There are many things you could do to improve your learning.

As we know, a convolutional network is best at classifying images; there may also be network architectures specific to learning your own data.

Lienes commented 6 years ago

Thanks for the fast response. Some of the steps you mentioned I had already done, and they help a little. :) To get even better results, I need a stronger knowledge base in machine learning and convnets.

ghost commented 6 years ago

Hi @julien2512, thank you again for helping us. Can you suggest which parameters I should change to reach these particular goals:

1. I would like to obtain more realistic images.
2. I don't care about the similarity (difference) between a single output image and its corresponding target image (gen_loss_L1).

For example, in the DAY-to-NIGHT scene transformation, I don't care if the network adds a car that doesn't exist, or changes the color of a house, or anything else, but the result must look realistic to a human eye. So I would like a more "imaginative" network, one that can get the scene wrong while keeping a realistic appeal.

Thank you

julien2512 commented 6 years ago

Hi @fabio-C

How low an L1 have you reached so far?

Parameters may not be the only thing to change in order to get better results! You probably need to experiment. Once you reach that level, you will get help from reading case studies like https://arxiv.org/pdf/1804.07723.pdf

Reading this one helped me a lot to understand: http://colah.github.io/posts/2014-10-Visualizing-MNIST/

At my level of understanding, my best advice is to use something like 50k images for your experiments, and aim for L1 = 0.06 or less on a 10% test split. You should compare several models and parameter settings.

And if you make something better than Nvidia (https://www.digitaltrends.com/cool-tech/nvidia-ai-winter-summer-car/), share it with the world!

Regards, Julien.

leeeeeeo commented 5 years ago

@julien2512, thank you for your detailed and patient explanation, and thank you for sharing your tensorboard. After looking at it I found something interesting that I cannot explain. Could you please help me out?

  1. Comparing your tensorboard scalars with docs/tensorboard-scalar.png, I notice that "discriminator_loss_1" in docs/tensorboard-scalar.png decreased from the beginning, whereas yours increased from a very small value (similar to mine, shown in Fig. 1; mine is stranger still, it starts from 0.00) and then decreased. Why the difference?
  2. Your curves are much noisier, while docs/tensorboard-scalar.png is smoother. Why is that?
  3. I ran a small test, training with and without ema = tf.train.ExponentialMovingAverage(decay=0.99). The results are completely different: with tf.train.ExponentialMovingAverage (Fig. 1), "gen_loss_l1" increases from zero, while without it (Fig. 2), "gen_loss_l1" decreases from a non-zero value. These results confuse me about ExponentialMovingAverage. Could you please explain the reason?

I'd appreciate it if you could give some hints. Thank you!

(screenshot) Fig. 1

(screenshot) Fig. 2

julien2512 commented 5 years ago

Hi @leeeeeeo

  1. It may be from one of my previous runs (I don't remember), but my current run looks similar to @christopherhesse's. Have a look one more time: http://163.172.41.53:6006/. Yes, maybe the discriminator loss does lose a bit sometimes; I think that comes from my data, and I am working on another set.

  2. It is possible to reduce all the summary, display, trace, progress, and save frequencies, which changes how noisy the curves look.

  3. Good, I ran that test too! A moving average smooths out local disparities, but "local" does not apply to the beginning!
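For context on point 3: pix2pix.py smooths the logged losses with an exponential moving average, roughly as below. The shadow value for a plain tensor starts at 0, so the smoothed curve ramps up from zero at the beginning, which is what Fig. 1 shows (discrim_loss, gen_loss_GAN, and gen_loss_L1 are the raw loss tensors):

```python
# shadow = decay * shadow + (1 - decay) * value; the shadow starts at 0
ema = tf.train.ExponentialMovingAverage(decay=0.99)
update_losses = ema.apply([discrim_loss, gen_loss_GAN, gen_loss_L1])
smoothed_l1 = ema.average(gen_loss_L1)  # the smoothed value is what gets logged
```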

Regards

ichae commented 5 years ago

Isn't it correct to calculate the L1 distance (gen_loss_L1) as follows? gen_loss_L1 = tf.reduce_sum(tf.abs(targets - outputs))

tf.reduce_mean -> tf.reduce_sum

julien2512 commented 5 years ago

@ichae if you divide the sum by your tensor's size, you get the mean: reduce_mean is just reduce_sum divided by the number of elements.
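A one-line check of that relation in NumPy (shapes arbitrary):

```python
import numpy as np

targets = np.random.rand(8, 8)
outputs = np.random.rand(8, 8)

l1_sum = np.sum(np.abs(targets - outputs))
l1_mean = np.mean(np.abs(targets - outputs))
assert np.isclose(l1_mean, l1_sum / targets.size)  # mean = sum / element count
```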

elahehamini commented 4 years ago

Which one of these losses is better? @julien2512 (screenshots: loss, loss1, loss2)

chuangyu-robotics commented 3 years ago

Which one of these losses is better? @julien2512 (screenshots: loss, loss1, loss2)

I think none of them is good, as D is so strong and G is weak.

julien2512 commented 3 years ago

A lot of water has passed under the bridge since 2018! Have a look: https://www.tensorflow.org/tutorials/generative/pix2pix