ghost opened this issue 6 years ago
Sure.
gen_loss_L1 is simply the difference between your actual output and your target. More precisely, it is the mean of the absolute differences. It gives good results easily.
gen_loss_L1 = tf.reduce_mean(tf.abs(targets - outputs))
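As a sanity check, here is a minimal sketch in plain Python (toy pixel values, not from the repo) of what that line computes:

```python
# Toy "images" flattened to lists of pixel values in [-1, 1]
targets = [0.0, 1.0, -1.0, 0.5]
outputs = [0.5, 1.0, -0.5, 0.0]

# tf.reduce_mean(tf.abs(targets - outputs)) is the mean absolute error
# over every pixel and channel:
gen_loss_L1 = sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)
print(gen_loss_L1)  # (0.5 + 0.0 + 0.5 + 0.5) / 4 = 0.375
```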
discrim_loss and gen_loss_GAN fight against each other; that is by design. discrim_loss measures how well the network learns to identify outputs as fakes. That measure is used to train what we call the discriminator.
discrim_loss = tf.reduce_mean(-(tf.log(predict_real + EPS) + tf.log(1 - predict_fake + EPS)))
gen_loss_GAN, on the other hand, measures how well the generator gets its outputs identified as real.
gen_loss_GAN = tf.reduce_mean(-tf.log(predict_fake + EPS))
Both gen_loss_GAN and gen_loss_L1 are combined to train what we call the generator. Deep learning is a bit like finding a needle in a haystack: gen_loss_L1 gives a simple objective that converges fast, but the GAN term can recover image details that gen_loss_L1 alone would hardly discover.
The combination used by default in affinelayer/pix2pix.py is roughly 100*gen_loss_L1 + gan. That means in a first phase gen_loss_L1 will decrease, while gan will probably increase. Then, once gen_loss_L1 ~ gan, you will probably see gan start decreasing (and discrim_loss increase accordingly): the generator is starting to win over the discriminator. So be patient.
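A rough numeric sketch of those two phases in plain Python (the weighting matches the default 100*gen_loss_L1 + gan, but the predict_fake and L1 values are invented purely for illustration):

```python
import math

EPS = 1e-12

def gen_loss(predict_fake, gen_loss_L1, l1_weight=100.0, gan_weight=1.0):
    # Combined generator objective: gan_weight * gan + l1_weight * L1.
    # predict_fake is the discriminator's average score on the fakes.
    gan = -math.log(predict_fake + EPS)
    return gan_weight * gan + l1_weight * gen_loss_L1

# First phase: L1 dominates (100 * 0.3 = 30 versus a GAN term around 2.3),
# so training mostly drives gen_loss_L1 down.
early = gen_loss(predict_fake=0.1, gen_loss_L1=0.3)
# Later: once L1 is small, the GAN term carries more of the gradient.
late = gen_loss(predict_fake=0.4, gen_loss_L1=0.02)
print(round(early, 2), round(late, 2))  # 32.3 2.92
```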
Have a look at my tensorboard: http://163.172.41.53:6006/. I am near epoch 400 with a sample size of 2683.
Good luck.
@julien2512 I notice from your tensorboard some blurring in your outputs, i.e. the edges aren't very sharp even after so many iterations, especially in the keyboard region. Do you know of a way to improve that?
I have so many other optimisers and gradient formulas to try! Up to now I was trying to show whether, and how far, information can be used in the convolutional network, and that's working pretty well. Given infinite time you can always find the neural network you are aiming at. But I think I am totally blind if I keep believing I can zoom out an image that way. Image-to-image may be the starting point, but the key is understanding how the information is to be transformed.
To answer the question more simply: I do not know right now, and I think neither time nor money is the right answer.
julien2512, your explanation to the query was really helpful. Could you also explain the list below? And what were the assumptions behind the default values of those parameters? List:
parser.add_argument("--lr", type=float, default=0.0002, help="initial learning rate for adam")
parser.add_argument("--beta1", type=float, default=0.5, help="momentum term of adam")
parser.add_argument("--l1_weight", type=float, default=100.0, help="weight on L1 term for generator gradient")
parser.add_argument("--gan_weight", type=float, default=1.0, help="weight on GAN term for generator gradient")
parser.add_argument("--ngf", type=int, default=64, help="number of generator filters in first conv layer")
parser.add_argument("--ndf", type=int, default=64, help="number of discriminator filters in first conv layer")
My main goal is to get more accurate generator results. Currently my plan is to change the max-epoch number and maybe decrease the initial learning rate.
Could you please suggest something?
Thanks!
Hi,
My main goal is to get more accurate generator results. Currently my plan is to change the max-epoch number and maybe decrease the initial learning rate.
Once you reach a plateau, increasing max-epoch won't really help you.
The Adam optimizer paper says it is better to start with a higher initial learning rate and then lower it. It is as if you run over the place where you lost something, then slow right down to find it.
parser.add_argument("--lr", type=float, default=0.0002, help="initial learning rate for adam")
parser.add_argument("--beta1", type=float, default=0.5, help="momentum term of adam")
lr and beta1 are parameters of the Adam optimizer (a variant of the standard gradient descent procedure; Google says: https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/).
The Adam paper used 0.9 for beta1 instead of 0.5. They said it works better for their dataset.
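For intuition, here is a minimal single-parameter Adam step in plain Python (a sketch of the update rule from the Adam paper, not code from this repo), showing where lr and beta1 enter:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    # One Adam update: beta1 is the momentum on the gradient (pix2pix
    # defaults to 0.5, the Adam paper to 0.9); lr scales the step size.
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
print(theta)  # first step moves by roughly lr: about 1.0 - 0.0002 = 0.9998
```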
You could also try other optimisers that may work better than Adam.
NDR: I need to experiment more myself rather than telling others what to do :) :) :)
parser.add_argument("--l1_weight", type=float, default=100.0, help="weight on L1 term for generator gradient")
parser.add_argument("--gan_weight", type=float, default=1.0, help="weight on GAN term for generator gradient")
By default, the loss used for the generator gradient is
100*gen_loss_L1 + gan
but l1_weight and gan_weight let you change that combination into
l1_weight*gen_loss_L1 + gan_weight*gan
so you can adjust it to your data.
gan is better for details.
L1 is a very easy way to compare images; there may be better ways for your own dataset. gan is something like training a network to compare those images ...
parser.add_argument("--ngf", type=int, default=64, help="number of generator filters in first conv layer")
parser.add_argument("--ndf", type=int, default=64, help="number of discriminator filters in first conv layer")
ngf and ndf set, kind of, the number of filters in the conv and deconv layers (precisely: in the first conv layer of the generator and the discriminator, with deeper layers scaled from it). Conv layers aim to identify things in a scene; deconv layers aim to rebuild a scene from the identified features. Google says: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/ "The more filters, the greater the depth of the activation map, and the more information we have about the input volume."
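A small shape sketch of what ngf means (assuming the usual pix2pix encoder layout of stride-2 convolutions; the numbers are illustrative):

```python
# ngf sets the channel count of the FIRST generator conv layer; deeper
# layers are scaled from it (64 -> 128 -> 256 -> 512 in pix2pix).
ngf = 64
H, W, in_channels = 256, 256, 3

# A stride-2 conv halves the spatial size and outputs ngf feature maps,
# so "more filters" means a deeper activation map at each location:
out_shape = (H // 2, W // 2, ngf)
print(out_shape)  # (128, 128, 64)
```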
Regards,
P.S.: I have to say, I learn a bit more every time somebody asks a question. ;)
Hi, why are the calculations for discrim_loss and gen_loss_GAN not the same as in the original paper (https://arxiv.org/pdf/1611.07004.pdf)?
Specifically, in discrim_loss = tf.reduce_mean(-(tf.log(predict_real + EPS) + tf.log(1 - predict_fake + EPS))), why is the first term subtracted and the second one added? The Apple paper on Learning from Simulated and Unsupervised Images through Adversarial Training (https://arxiv.org/abs/1612.07828) involves the minimization of a similar cost function for its discriminator network, and both terms are subtracted. In the original GANs paper (https://arxiv.org/abs/1406.2661), the cost function of the discriminator network is a sum of the two.
Then, for gen_loss_GAN, the code says:
# predict_fake => 1
gen_loss_GAN = tf.reduce_mean(-tf.log(predict_fake + EPS))
Why is predict_fake going towards 1 now? EDIT: Oh! I guess you'd expect predict_fake to go to 1 as the discriminator approaches failing to discriminate between what's fake and what's real. Still, if someone could explain this better, that'd be great.
Thank you for your time. -Mimi
Specifically, in discrim_loss = tf.reduce_mean(-(tf.log(predict_real + EPS) + tf.log(1 - predict_fake + EPS))), why is the first term subtracted and the second one added?
You misread it; see the parentheses. We expect each logarithm to be negative because predict_real and 1 - predict_fake are below 1, hence the single leading minus sign.
To follow the paper literally we would write
tf.reduce_mean(-tf.log(predict_real + EPS) - tf.log(1 - predict_fake + EPS))
which leads to the same result.
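A quick numeric check of that equivalence in plain Python (toy discriminator scores, invented for illustration):

```python
import math

EPS = 1e-12
predict_real = [0.8, 0.9]
predict_fake = [0.2, 0.3]

def mean(xs):
    return sum(xs) / len(xs)

# Form used in the code: one leading minus applied to the sum of the logs
loss_code = mean([-(math.log(r + EPS) + math.log(1 - f + EPS))
                  for r, f in zip(predict_real, predict_fake)])
# Form written in the papers: each log term negated separately
loss_paper = mean([-math.log(r + EPS) - math.log(1 - f + EPS)
                   for r, f in zip(predict_real, predict_fake)])

print(abs(loss_code - loss_paper) < 1e-12)  # True: -(a + b) == -a - b
```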
Oops. Thanks for writing. Makes sense now.
If it's the same, then why was it necessary to write it the other way round?
@Lienes I don't know about your last question, but to answer a previous one :
My main goal is to get more accurate generator results. Currently my main intention is changing max-epoch number and maybe decreasing initial learning rate ?
If you think you are missing the result because the step is too high, then yes, decreasing the initial learning rate would be a good idea.
There might be many things to do to improve your learning :
Just as a convolutional network is best at classifying images, there may also be network architectures better suited to learning your own data.
Thanks for the fast response. Some of the steps you mentioned had already been done, and it helps a little. :) To get even better results I need more background in machine learning and convnets.
Hi @julien2512, thank you again for helping us. Can you suggest which parameters I should change to reach these particular goals:
1 - I would like to obtain more realistic images. 2 - I don't care about the similarity (difference) between a single output image and its corresponding target image (gen_loss_L1).
For example, in the DAY-to-NIGHT scene transformation I don't care if the network adds a car that doesn't exist, changes the color of a house, or anything else, but the result must look realistic to a human eye. So I would like a more "imaginative" network, one that can be wrong in recreating a scene but keeps a realistic appeal.
Thank you
Hi @fabio-C
How low an L1 did you get so far?
Parameters may not be the only thing to change in order to get better results! You probably need to experiment. Once you reach that level, you will find it helpful to read case studies like https://arxiv.org/pdf/1804.07723.pdf
Reading this one helped me a lot to understand: http://colah.github.io/posts/2014-10-Visualizing-MNIST/
At my level of understanding, my best advice is to use something like 50k images for your experiments, and to aim for L1 = 0.06 or less on a 10% test split. You should compare several models and parameter settings.
And if you make something better than Nvidia (https://www.digitaltrends.com/cool-tech/nvidia-ai-winter-summer-car/), share it with the world!
Regards, Julien.
@julien2512, thank you for your detailed and patient explanation, and for sharing your tensorboard. After looking at it I found something interesting that I cannot explain. Could you please help me out?
The results are completely different with and without the line
ema = tf.train.ExponentialMovingAverage(decay=0.99)
The result using tf.train.ExponentialMovingAverage is shown in Fig. 1, and the result without it is shown in Fig. 2.
As the figures show, gen_loss_l1 increased from zero in Fig. 1, while it decreased from a non-zero value in Fig. 2.
These results confuse me about ExponentialMovingAverage. Could you please explain the reason? I'd appreciate any hint. Thank you!
Fig. 1
Fig. 2
Hi @leeeeeeo
Yes, maybe discriminator_loss_L1 gets worse a bit sometimes. I think it comes from my data. I am working on another set.
It is possible to reduce every summary, display, trace, progress and save frequencies.
Good, I noticed it too! A moving average smooths out local disparities, but "local" does not apply at the beginning of the curve!
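One way to see that beginning-of-curve effect is a sketch in plain Python, assuming the EMA shadow value starts at 0 (whether that matches tf.train.ExponentialMovingAverage's initialisation in this setup is an assumption): with decay=0.99 the averaged curve climbs up from near zero before it tracks the raw loss, which would explain gen_loss_l1 "increasing from zero" in Fig. 1.

```python
decay = 0.99
shadow = 0.0              # assumed initial shadow value
raw_losses = [0.5] * 300  # a constant raw loss, purely for illustration

ema_curve = []
for x in raw_losses:
    # tf-style EMA update: shadow = decay * shadow + (1 - decay) * value
    shadow = decay * shadow + (1 - decay) * x
    ema_curve.append(shadow)

print(ema_curve[0])   # 0.005: the smoothed curve starts near zero
print(ema_curve[-1])  # ~0.476: it climbs toward the raw value 0.5
```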
Regards
Isn't it correct to calculate the L1 distance (gen_loss_L1) as follows?
gen_loss_L1 = tf.reduce_sum(tf.abs(targets - outputs))
tf.reduce_mean -> tf.reduce_sum
@ichae if you divide it by your matrix size, it will be the mean; the two differ only by a constant factor.
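A tiny check in plain Python (toy values):

```python
targets = [0.0, 1.0, 1.0, 0.0]
outputs = [0.5, 0.5, 0.5, 0.5]

abs_diffs = [abs(t - o) for t, o in zip(targets, outputs)]
l1_sum = sum(abs_diffs)                     # reduce_sum  -> 2.0
l1_mean = sum(abs_diffs) / len(abs_diffs)   # reduce_mean -> 0.5
print(l1_sum / len(abs_diffs) == l1_mean)   # True: sum / element count == mean
```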
which one of these losses is better? @julien2512
which one of these losses is better? @julien2512 I think none of them is good, as D is so strong and G is weak.
A lot of water has passed under the bridge since 2018! Have a look: https://www.tensorflow.org/tutorials/generative/pix2pix
Hi, could someone explain to me what the three different losses mean? And which one should I care about most in order to obtain good-quality pictures?
In particular:
Thank you