davidADSP / Generative_Deep_Learning_2nd_Edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.
https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/
Apache License 2.0

Not same results as book in WGAN notebook #18

Open CyberDuck79 opened 12 months ago

CyberDuck79 commented 12 months ago

There are examples in the book of what the WGAN should generate after 25 epochs of training.

Screenshot 2023-08-06 at 17:16:19

But when I train the model, these are the generated samples at the 25th epoch of training.

Screenshot 2023-08-06 at 17:12:28

I tried changing many hyperparameters (e.g., the learning rate), but I never managed to get a model that generates faces like those in the book example, even after 200 training epochs.

This looks similar to issue #13. I had assumed that my failure to get a model that generates LEGO bricks in the DCGAN chapter was due to the problems of GANs explained later in that chapter, but this seems to be a separate problem common to both chapters.

Has anyone succeeded in obtaining a good model?

tctr commented 8 months ago

Same here: the proposed WGAN code does not converge to realistic faces, at least not within the 200 epochs I ran with the notebook's parameters.

KirkDCO commented 7 months ago

I was having similar issues and increased the latent dimension to 192 to get this in 25 epochs using the CelebA dataset.

image

It is a bit out of context here, but this code snippet gives an idea of the critic network structure:

disc_layers = [
    WGAN.CriticLayer(num_filters=32,  kernel_size=4, strides=2, padding='same',  bias=True, neg_slope=0.1, dropout_rate=0.3),
    WGAN.CriticLayer(num_filters=64,  kernel_size=4, strides=2, padding='same',  bias=True, neg_slope=0.1, dropout_rate=0.3),
    WGAN.CriticLayer(num_filters=128, kernel_size=4, strides=2, padding='same',  bias=True, neg_slope=0.1, dropout_rate=0.3),
    WGAN.CriticLayer(num_filters=256, kernel_size=4, strides=2, padding='same',  bias=True, neg_slope=0.1, dropout_rate=0.3),
    WGAN.CriticLayer(num_filters=1,   kernel_size=4, strides=1, padding='valid', bias=True, neg_slope=0.1, dropout_rate=0.3),
]
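Since `WGAN.CriticLayer` is a custom wrapper from my own code, here is a plain-Keras sketch of what each such block might contain; the Conv2D + LeakyReLU + Dropout composition is an assumption rather than the actual helper, but it shows how the listed layers would take a 64×64×3 image down to a single unbounded critic score:

```python
import tensorflow as tf
from tensorflow.keras import layers

def critic_layer(x, num_filters, kernel_size=4, strides=2,
                 padding="same", bias=True, neg_slope=0.1, dropout_rate=0.3):
    """One critic block: strided convolution, LeakyReLU, then dropout."""
    x = layers.Conv2D(num_filters, kernel_size, strides=strides,
                      padding=padding, use_bias=bias)(x)
    x = layers.LeakyReLU(neg_slope)(x)
    x = layers.Dropout(dropout_rate)(x)
    return x

# Stack the blocks as in the disc_layers list above (64 -> 32 -> 16 -> 8 -> 4).
inputs = layers.Input(shape=(64, 64, 3))
x = critic_layer(inputs, 32)
x = critic_layer(x, 64)
x = critic_layer(x, 128)
x = critic_layer(x, 256)
# Final 'valid' convolution collapses the 4x4 map to a single score.
x = layers.Conv2D(1, 4, strides=1, padding="valid", use_bias=True)(x)
critic = tf.keras.Model(inputs, x)
```

Note there is no sigmoid on the output: a WGAN critic emits an unbounded score, not a probability.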

and this gives an idea of the generator structure:

gen_layers = [
    WGAN.GeneratorLayer(num_filters=256, kernel_size=4, strides=4, padding='same', bias=True, neg_slope=0.1),
    WGAN.GeneratorLayer(num_filters=128, kernel_size=4, strides=2, padding='same', bias=True, neg_slope=0.1),
    WGAN.GeneratorLayer(num_filters=64,  kernel_size=4, strides=2, padding='same', bias=True, neg_slope=0.1),
    WGAN.GeneratorLayer(num_filters=32,  kernel_size=4, strides=2, padding='same', bias=True, neg_slope=0.1),
    WGAN.GeneratorLayer(num_filters=IMG_CHANNELS, kernel_size=4, strides=2, padding='same', bias=True, neg_slope=0.1),
]
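As with the critic, `WGAN.GeneratorLayer` is my own helper, so the sketch below is an assumption about its contents (Conv2DTranspose + LeakyReLU), but it shows how the listed layers upsample a latent vector (192 here, matching the latent dimension mentioned above) to a 64×64×3 image; the 1×1 reshape of the latent vector and the final tanh are also assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def generator_layer(x, num_filters, kernel_size=4, strides=2,
                    padding="same", bias=True, neg_slope=0.1):
    """One generator block: transposed convolution, then LeakyReLU."""
    x = layers.Conv2DTranspose(num_filters, kernel_size, strides=strides,
                               padding=padding, use_bias=bias)(x)
    x = layers.LeakyReLU(neg_slope)(x)
    return x

Z_DIM = 192  # the enlarged latent dimension discussed above
z = layers.Input(shape=(Z_DIM,))
x = layers.Reshape((1, 1, Z_DIM))(z)
x = generator_layer(x, 256, strides=4)  # 1x1  -> 4x4
x = generator_layer(x, 128)             # 4x4  -> 8x8
x = generator_layer(x, 64)              # 8x8  -> 16x16
x = generator_layer(x, 32)              # 16x16 -> 32x32
x = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                           activation="tanh")(x)  # 32x32 -> 64x64x3
generator = tf.keras.Model(z, x)
```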

I'm continuing to experiment with the WGAN and will post what I find here.

tctr commented 7 months ago

> I was having similar issues and increased the latent dimension to 192 to get this in 25 epochs using the CelebA dataset. ... I'm continuing to experiment with the WGAN and will post what I find here.

Great! :)

KirkDCO commented 7 months ago

Following up on my previous comment....

I was having similar issues with not getting good images compared to those in the book. My code has a very similar architecture but a very different way of building the WGAN; I wanted to implement as much as possible on my own to reinforce what I've been learning. One difference in my code compared to the book:

I've tried different latent space sizes: 100, 128, 192, and 512. All produced images that were comparable to the book. Below are some examples from each of these sizes.

If you're interested in looking at my code, you can find it on Kaggle here. This is a notebook I'm working on as part of studying this book. I plan to do some more experiments on this architecture, for example making my architecture match the book's, and write it all up in a final notebook. The notebook also contains code for the GAN section and will have CGAN code once I get through that section. (It is a bit slow going as I depend on free GPU time on Kaggle.)

-Kirk

Images with 100 latent dimensions (after 25 epochs)

image

Images with 128 latent dimensions (after 25 epochs)

image

Images with 192 latent dimensions (after 25 epochs)

image

Images with 256 latent dimensions (after 25 epochs)

image

tctr commented 7 months ago

Thanks @KirkDCO, I will try it later on... On my side I changed the size of the latent vector to 192 and the batch size to 128. I could get similar results after 800 epochs with a T4 on Google Colab...
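For anyone debugging convergence here, it may help to double-check the gradient-penalty term, since it is what keeps WGAN-GP critic training stable. Below is a minimal standalone sketch of that term (not the book's exact code; the toy critic is only for illustration): it penalizes the critic's gradient norm for deviating from 1 on random interpolates between real and fake batches.

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: E[(||grad critic(interp)|| - 1)^2] over
    random interpolates between real and fake images."""
    batch = tf.shape(real)[0]
    eps = tf.random.uniform([batch, 1, 1, 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return tf.reduce_mean((norms - 1.0) ** 2)

# Toy critic, just to smoke-test the function on small random batches.
critic = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])
gp = gradient_penalty(critic,
                      tf.random.normal([4, 8, 8, 3]),
                      tf.random.normal([4, 8, 8, 3]))
```

This term is added to the critic loss with a weight (10 in the original WGAN-GP paper), so if the weight or the interpolation is off, the critic can saturate and the generator never improves.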