carpedm20 / BEGAN-tensorflow

Tensorflow implementation of "BEGAN: Boundary Equilibrium Generative Adversarial Networks"
Apache License 2.0

The quality of generated images is lower than the paper's results? #1

Open zhangqianhui opened 7 years ago

zhangqianhui commented 7 years ago

Really?

carpedm20 commented 7 years ago

I just received a comment about the learning rate, which was different from the one in the code that produced the images in README.md. I'll update the results, and the authors will clarify implementation details in the near future.

zhangqianhui commented 7 years ago

ok.

artcg commented 7 years ago

You shouldn't necessarily expect to recreate the same quality as the paper anyway, since the paper uses a dataset that hasn't been released to the public (and which is larger and more diverse than CelebA).

carpedm20 commented 7 years ago

@artcg I thought they only used CelebA, which is public. Anyway, thanks to your comment, I just found that they use a totally different crop (more face-focused), which should definitely help reproduce the result more closely.

artcg commented 7 years ago

Nope, from the paper: 'We use a dataset of 360k celebrity face images for training in place of CelebA.'

I.e., instead of using the standard CelebA, they are using this larger dataset with different cropping/alignment, as you say.

But that is a good point; maybe different pre-processing on CelebA would improve results!
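For anyone who wants to experiment with that, here's a minimal sketch of a tighter, more face-focused CelebA crop before resizing; the crop box below is just a guess, not the paper's actual preprocessing:

```python
# Sketch only: a tighter, face-focused crop of a standard 178x218 aligned
# CelebA image, then resize to the training resolution. The crop box is an
# assumption, not the paper's exact preprocessing.
from PIL import Image

def tight_face_crop(path, out_size=64):
    img = Image.open(path)                        # aligned CelebA images are 178x218
    left, upper, right, lower = 25, 50, 153, 178  # 128x128 box roughly centered on the face
    face = img.crop((left, upper, right, lower))
    return face.resize((out_size, out_size), Image.BILINEAR)
```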


carpedm20 commented 7 years ago

There was a tf.slim related mistake, as discussed in #3, and the results are now much better. However, 128x128 images for CelebA don't train well with the default hyper-parameters (only 64x64 trained well).

surya501 commented 7 years ago

I tried this on the Zappos dataset, so I don't have a baseline to compare against. However, my feeling is that the output could be better, and it's definitely not at the same level as the faces in the paper. Some output for reference below; this is after 500k iterations over ~50 hours of training on a GTX 1080. (Attached samples: test8_g_z, test9_g_z)

fwiffo commented 7 years ago

That Zappos example looks like mode collapse: it's producing many examples of a small number of shoes. Different datasets require different tuning of hyper-parameters, so I would suggest lowering the learning rate or adjusting the size of z and/or h.

surya501 commented 7 years ago

Thanks @fwiffo. Any rules of thumb/heuristics on which to try first? Especially since the runtimes are long.

fwiffo commented 7 years ago

Start with the learning rate. Reduce it by half, and/or decay it more frequently during training. Also, you don't need to train for that many steps to see whether it's going to mode-collapse; it should do that within the first 100k steps. Next, try adjusting h. For faces the quality was significantly worse with 32 and somewhat worse with 128, but this probably depends on the diversity of the dataset. The size of z seems to have much less effect; anywhere from 32-1024 seemed similar in our tests. We started to see reduced diversity and quality below 32, but that may depend on the dataset.
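If it helps, here's a minimal TensorFlow 1.x sketch of that "lower it and decay it more often" idea; the starting value and decay interval are illustrative, not this repo's defaults:

```python
# Sketch of a halved, more frequently decayed learning rate (TF 1.x).
# The numbers below are examples only, not the defaults used in this repo.
import tensorflow as tf

global_step = tf.Variable(0, trainable=False, name='global_step')
learning_rate = tf.train.exponential_decay(
    learning_rate=4e-5,    # e.g. half of whatever you were using before
    global_step=global_step,
    decay_steps=50000,     # decay more often than you would for faces
    decay_rate=0.5,
    staircase=True)
d_optimizer = tf.train.AdamOptimizer(learning_rate)
g_optimizer = tf.train.AdamOptimizer(learning_rate)
```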

Smaller datasets may also cause trouble (Zappos is 50k, which isn't too bad), so you can try the usual tricks, e.g. randomly offsetting or rotating the input images slightly.
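A rough sketch of that kind of light augmentation (TF 1.x; the shift and rotation ranges here are arbitrary):

```python
# Sketch of small random offsets and rotations for a single HxWxC image tensor.
# Assumes TF 1.x with tf.contrib.image available; ranges are arbitrary.
import math
import tensorflow as tf

def augment(image, max_shift=4, max_deg=5.0):
    h, w, c = image.get_shape().as_list()
    # random offset: reflect-pad, then crop back to the original size
    padded = tf.pad(image, [[max_shift, max_shift], [max_shift, max_shift], [0, 0]],
                    mode='REFLECT')
    image = tf.random_crop(padded, [h, w, c])
    # small random rotation (angle in radians)
    angle = tf.random_uniform([], -max_deg, max_deg) * math.pi / 180.0
    return tf.contrib.image.rotate(image, angle)
```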