GANs-in-Action / gans-in-action

Companion repository to GANs in Action: Deep learning with Generative Adversarial Networks

Chapter 4 output #16

Closed donlaiq closed 3 years ago

donlaiq commented 3 years ago

I'm not sure whether something is misconfigured on my computer, but on every single run I'm getting this kind of output (very different from the output shown in the book) and the resulting images are still noise.

1000 [D loss: 0.000010, acc.: 100.00%] [G loss: 0.025414]
2000 [D loss: 0.000516, acc.: 100.00%] [G loss: 0.012415]
3000 [D loss: 0.000217, acc.: 100.00%] [G loss: 0.012707]
4000 [D loss: 0.000276, acc.: 100.00%] [G loss: 0.015317]
5000 [D loss: 0.000011, acc.: 100.00%] [G loss: 0.004400]
6000 [D loss: 0.000001, acc.: 100.00%] [G loss: 0.000662]
7000 [D loss: 0.000000, acc.: 100.00%] [G loss: 0.000308]
...

Is there something to fix in the source code?

firmamentone commented 3 years ago


I've also encountered this problem. I am still trying to figure out the root cause.

However, I explicitly set the learning rate (1e-5) when compiling the models:

discriminator.compile(loss='binary_crossentropy', optimizer=Adam(1e-5), metrics=['accuracy'])

gan.compile(loss='binary_crossentropy', optimizer=Adam(1e-5))

With that change, the results seem to have improved.
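For context, here is a minimal sketch of where those two compile calls sit in a chapter-4-style Keras training setup. The build_generator and build_discriminator helpers, img_shape, and z_dim are placeholders standing in for the book's own definitions; the 1e-5 learning rate is the value suggested above:

from keras.models import Sequential
from keras.optimizers import Adam

z_dim = 100                                       # size of the random noise vector (assumed)

# Compile the discriminator with its own, deliberately small learning rate
# so it does not overwhelm the generator early in training.
discriminator = build_discriminator(img_shape)    # placeholder helper
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(1e-5),
                      metrics=['accuracy'])

# Freeze the discriminator inside the combined model so that training the
# combined model only updates the generator's weights.
generator = build_generator(z_dim)                # placeholder helper
discriminator.trainable = False

gan = Sequential()
gan.add(generator)
gan.add(discriminator)
gan.compile(loss='binary_crossentropy', optimizer=Adam(1e-5))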

donlaiq commented 3 years ago


Thank you very much firmamentone! Yes, indeed, that did the trick! After your suggestion, the network is learning properly!

donlaiq commented 3 years ago

I'm reopening this thread just to post an update. The hack proposed by firmamentone is a good one, but I think I found a better way to solve it. When I created this thread, I was using TensorFlow 1.14.0. I've updated TensorFlow, as suggested in https://www.tensorflow.org/hub/installation, with the command pip install "tensorflow>=1.15,<2.0" (which also keeps it working with the examples of Chapter 6), and now I'm getting better results without even passing an explicit learning rate to the Adam optimizer. The output now looks very similar to the one shown in the book:

1000 [D loss: 0.030979, acc.: 99.61%] [G loss: 4.293407]
2000 [D loss: 0.056869, acc.: 99.61%] [G loss: 3.528089]
3000 [D loss: 0.049382, acc.: 100.00%] [G loss: 4.535636]
4000 [D loss: 0.132472, acc.: 95.70%] [G loss: 3.754888]
5000 [D loss: 0.063151, acc.: 98.83%] [G loss: 3.865706]
6000 [D loss: 0.069952, acc.: 99.22%] [G loss: 3.805980]
7000 [D loss: 0.064669, acc.: 100.00%] [G loss: 4.030197]
...
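(A quick sanity check, not from the original comment: since behaviour differs so much between releases, it can help to print the versions the notebook actually imports before training.)

import tensorflow as tf
import keras

# Confirm which stack the notebook is really using.
# With the pins discussed in this thread, the expected output is
# TensorFlow 1.15.x and Keras 2.2.4.
print('TensorFlow:', tf.__version__)
print('Keras:', keras.__version__)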

firmamentone commented 3 years ago


Hi, that looks great. However, after I ran pip install "tensorflow>=1.15,<2.0", the import "from keras.datasets import mnist" failed with the error message "ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow".

Is it necessary to install Keras at a specific version as well?

That's my Colab notebook: https://colab.research.google.com/drive/1oaePpX10y4bbXkkTPjZyhgr41PwlhrGs?usp=sharing

Thanks

donlaiq commented 3 years ago


That's a good question, because I'm coming from Java and I am not really sure what I did to configure my workspace in the first place. Anyway, I've created a new environment using some of the "best practices" I've found. I can't help you with the Colab notebook, because I never use it, but I'm pretty sure you will be able to figure it out by yourself. Here is what I did:

1) Install Anaconda. Once it is done, you should be able to use the 'conda' command from your CLI.
2) Create a fresh virtual environment with 'conda create -n tf-gpu python=3.6 ipykernel', where tf-gpu is the name of the environment (you can change it).
3) Once the installation succeeds, activate the environment with 'conda activate tf-gpu' (if you chose tf-gpu as the name). You will then see '(tf-gpu)' on the left side of your prompt.
4) I'm using Jupyter Notebook (I guess there should be a way to do the same with a Colab notebook), so with the environment activated, install JupyterLab with 'pip install jupyterlab'.
5) Install the following dependencies:
   pip install matplotlib (if you want to see the images generated by the GAN)
   pip install keras==2.2.4
   pip install "tensorflow>=1.15,<2.0"
6) Make sure Jupyter can reach your virtual environment: python -m ipykernel install --user --name tf-gpu (a sanity check for this step is sketched just below the list).
7) Run 'jupyter notebook'. It will open a tab in your browser; click New and select tf-gpu.
8) Copy/paste the code into the new notebook.
9) Click Run and... ta-da! I see some warnings that I don't know how to fix (and don't care about either), but it works very well.
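A hypothetical first-cell sanity check (my addition, not part of the steps above): it confirms the notebook is running inside the tf-gpu environment and that TensorFlow can see the GPU.

import sys
import tensorflow as tf

print('Python executable:', sys.executable)    # should point inside the tf-gpu environment
print('TensorFlow version:', tf.__version__)   # expected: 1.15.x
print('GPU available:', tf.test.is_gpu_available())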

The only weird thing is that on the first run I can't see the images from matplotlib, only something like '<Figure size 400x400 with 16 Axes>'. When I see those strings, I stop the run, rerun it, and then I can see the images generated by the GAN. I can't run it from PyCharm either (I would like to, because it has a nicer interface), because I run out of memory, but that seems to be a known problem in the Python community. I've also tried running the code after following the installation guide at https://docs.anaconda.com/anaconda/user-guide/tasks/tensorflow/, which installs TensorFlow 2.x, but I get noise again, just the same as when I started this thread.
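(Side note, not part of the original exchange: the '<Figure size 400x400 with 16 Axes>' text usually means figures are being returned as objects instead of rendered inline; explicitly enabling the inline backend at the top of the notebook is one possible fix.)

# Render matplotlib figures inline in Jupyter instead of printing
# their text representation such as '<Figure size 400x400 with 16 Axes>'.
%matplotlib inline
import matplotlib.pyplot as plt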

I hope it helps!

PS: I'm sorry if you saw my previous post (already deleted), where I messed things up by installing TensorFlow 2.x while still using my previous environment.

firmamentone commented 3 years ago

Thanks to donlaiq,

After I pinned tensorflow to ">=1.15,<2.0" and keras to "2.2.4", I got results similar to the example in chapter 4.

!pip install keras==2.2.4
!pip install "tensorflow-gpu>=1.15,<2.0"

Thanks a lot

Colab notebook: https://colab.research.google.com/drive/1oaePpX10y4bbXkkTPjZyhgr41PwlhrGs?usp=sharing

Nevermetyou65 commented 2 years ago

Hi, I am reading this book and I am on chapter 4 right now, and I have also encountered this problem. In my case I didn't use standalone Keras but tf.keras, which should not make a difference. I have been researching this for a while but still have no clues. Others are facing the same problem when they use tf.keras. But when I removed the BatchNormalization layers from the generator and discriminator, the network worked just fine. I am really confused. Do you have any idea? I know there is a paper suggesting that we should use batch norm layers for training GANs, but even the example code on the Keras site does not use them.

https://keras.io/examples/generative/dcgan_overriding_train_step/

I am really not sure about the purpose of using batch norm in a GAN right now, or whether the issue only appears because I use tf.keras. I use TensorFlow 2.6.0 on Google Colab.
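For reference, a minimal tf.keras sketch (an illustration, not the book's exact code) of the kind of DCGAN generator block being discussed; the BatchNormalization lines mark the layers that, as described above, some readers have to remove for training to converge:

import tensorflow as tf
from tensorflow.keras import layers

z_dim = 100  # size of the random noise vector (assumed)

# Illustrative DCGAN-style generator for 28x28x1 images; layer sizes are examples only.
generator = tf.keras.Sequential([
    layers.Dense(7 * 7 * 256, input_dim=z_dim),
    layers.Reshape((7, 7, 256)),
    layers.Conv2DTranspose(128, kernel_size=3, strides=2, padding='same'),
    layers.BatchNormalization(),   # layer reported as problematic with tf.keras
    layers.LeakyReLU(alpha=0.01),
    layers.Conv2DTranspose(64, kernel_size=3, strides=1, padding='same'),
    layers.BatchNormalization(),   # likewise removable if training collapses
    layers.LeakyReLU(alpha=0.01),
    layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding='same', activation='tanh'),
])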

donlaiq commented 2 years ago

Hello Nevermetyou65,

The problem I was talking about when I started this thread is called 'mode collapse', although I only learned that name later. It occurs when the generator finds a small number of samples that fool the discriminator and therefore isn't able to produce any examples outside that limited set.

BatchNormalization is useful when the values flowing through the network vary too widely in scale (it helps avoid the 'exploding gradient' problem). Maybe the example you mention doesn't need it because its inputs are already normalized. The size of your batches could also be an issue: small batches are quick to compute, but the statistics they estimate are less accurate.

In my experience, and in line with what we were discussing in this thread, new versions of the same tool don't guarantee backward compatibility, meaning that what works in the current version may not work in a future version without changing the source code. I hope this helps you find some kind of solution.

Nevermetyou65 commented 2 years ago


Yes, thanks a lot for your reply, so it's mode collapse. I have heard about it before, but FYI the example I mentioned in my comment is exactly the same code as in chapter 4 of this book, which uses batch norm layers in both the generator and the discriminator. I also used the same mini-batch size and double-checked whether I had made a mistake somewhere. The only difference is that I used tf.keras.

This guy is facing the same problem as me. https://github.com/GANs-in-Action/gans-in-action/issues/14

But reading your reply, I guess the difference in package versions is the problem. The solution I actually found is to remove the batch norm layers from the generator and/or the discriminator; I have not tried every possible option, though. I just do not understand why that works.