affinelayer / pix2pix-tensorflow

Tensorflow port of Image-to-Image Translation with Conditional Adversarial Nets https://phillipi.github.io/pix2pix/
MIT License
5.07k stars 1.3k forks

How to get a good model, or how to judge whether I trained a good model? #57

Open knaffe opened 7 years ago

knaffe commented 7 years ago

I trained several models on my own data (face images) with different epochs and batch sizes, and then looked at the discriminator_loss and generator_loss in TensorBoard. With batch_size = 60 and epochs = 50, the discriminator loss increases while the generator loss drops, which makes sense given my limited knowledge of GANs. However, the results on my validation image dataset are worse than the images generated during training. Why? To find out what was happening, I set batch_size = 4 or 30 and epochs = 15, 50, or 100; then the discriminator loss drops while the generator loss increases, and the generator loss stays greater than the discriminator loss. So, my questions are:

  1. How do I judge whether I have a good model? Should I judge it based only on the loss?
  2. How can I get a good model?
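GAN loss curves are hard to read on their own: the generator and discriminator losses track a moving adversarial game rather than output quality, so a falling generator loss does not directly mean better images. One common hedge is to track a separate reconstruction metric on held-out pairs (pix2pix already trains with an L1 term, so L1 on validation data is a natural choice). A minimal sketch, using plain NumPy rather than code from this repo, with `mean_l1_error` a hypothetical helper:

```python
import numpy as np

def mean_l1_error(generated, targets):
    """Mean absolute pixel error between generated and target images.

    Both arrays are assumed to be in the same range (e.g. [0, 1]),
    shaped (batch, height, width, channels).
    """
    generated = np.asarray(generated, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return float(np.mean(np.abs(generated - targets)))

# Toy example: one 2x2 single-channel "image".
gen = np.array([[[[0.5], [0.5]], [[0.5], [0.5]]]])
tgt = np.array([[[[1.0], [0.0]], [[1.0], [0.0]]]])
print(mean_l1_error(gen, tgt))  # 0.5
```

If this metric on validation pairs gets worse while the training outputs keep improving, that gap is a sign of overfitting to the training faces.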
DanqingZ commented 7 years ago

I think we need to add a validation loss to the model, which is the metric for checking whether the model is performing well.

dustyYMelody7 commented 6 years ago

@DanqingZ Have you added a validation loss to this project? Can you give me some advice on how to change the code to implement it?
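One way to implement the suggestion above is to run the generator on a held-out split every N steps and log its average L1 loss next to the training losses. This is a framework-agnostic sketch, not code from this repo; `validation_l1` and `generator_fn` are hypothetical names:

```python
import numpy as np

def validation_l1(generator_fn, val_pairs):
    """Average L1 loss of a generator over a validation set.

    generator_fn: maps an input image array to a generated image array.
    val_pairs: iterable of (input_image, target_image) NumPy arrays.
    """
    losses = [np.mean(np.abs(generator_fn(x) - y)) for x, y in val_pairs]
    return float(np.mean(losses))

# Toy generator that returns its input unchanged (identity).
identity = lambda x: x
pairs = [
    (np.zeros((2, 2)), np.ones((2, 2))),        # per-pair L1 = 1.0
    (np.full((2, 2), 0.25), np.zeros((2, 2))),  # per-pair L1 = 0.25
]
print(validation_l1(identity, pairs))  # 0.625
```

In the actual project you would call something like this inside the training loop (e.g. at the same frequency as the summary/checkpoint steps) and write the result to TensorBoard as an extra scalar.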

tonmoyborah commented 5 years ago

@knaffe Interesting, I didn't know batch size could affect things in such a way. In my experience, a good model should be judged by the test images, and quality mostly depends on the training data. Use quality data and train for many epochs (I got good results at 500 and 1000 epochs).

julien2512 commented 5 years ago

batch_size has an influence on learning. Within a batch, all per-example gradients are summed and divided by the batch size to get the final gradient.

This is faster than a purely stochastic (one example at a time) approach, but it needs more memory and cores at once. It can also sometimes lead to worse results, when the averaged gradient is worse than the individual ones.
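To make the averaging concrete, here is a toy sketch (plain NumPy, not from this repo) for a scalar linear model, showing that the mini-batch gradient is exactly the mean of the per-example gradients:

```python
import numpy as np

# Per-example loss L_i(w) = (w * x_i - y_i)^2, so dL_i/dw = 2 * x_i * (w * x_i - y_i).
def example_grad(w, x, y):
    return 2.0 * x * (w * x - y)

w = 1.0
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 4.0, 6.0, 8.0])  # true relation y = 2x

# Mini-batch gradient: sum the per-example gradients, divide by batch size.
per_example = np.array([example_grad(w, x, y) for x, y in zip(xs, ys)])
batch_grad = per_example.sum() / len(xs)

print(per_example)  # [ -2.  -8. -18. -32.]
print(batch_grad)   # -15.0
```

Note that the individual gradients range from -2 to -32 while the batch uses the single averaged value -15; a larger batch smooths out per-example variation, which usually stabilizes training but can also average away useful signal.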

That's why you need to try different batch sizes and random orderings.