FluxML / model-zoo


Is the dcgan_mnist.jl correct? #285

Closed · davibarreira closed this issue 3 years ago

davibarreira commented 3 years ago

I've been trying to adapt the dcgan_mnist.jl example to create my own GAN, but as I was studying the code, it dawned on me that it is fundamentally different from other implementations of GANs using Keras. Implementations in Keras usually create three networks: the Discriminator, the Generator, and the GAN.

The GAN network is just the Generator chained into the Discriminator. The difference from dcgan_mnist.jl is that in Keras one first trains the Discriminator and then trains the GAN (rather than the Generator directly) with the Discriminator's weights frozen, so only the Generator is updated. In dcgan_mnist.jl, one trains the Discriminator and then the Generator. Now, is this really the same thing? I've been trying to replicate a GAN that I implemented in Keras using Flux.jl, without success, and I'm starting to think this difference might be the reason.
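For concreteness, here is a minimal sketch of how that Keras-style setup could map onto Flux (the networks, sizes, and names here are hypothetical toy choices, using the implicit-params API of Flux at the time): the "GAN" is just the generator chained into the discriminator, and taking gradients only with respect to the generator's parameters is what "freezing" the discriminator amounts to.

```julia
using Flux

# Hypothetical 1D toy networks for illustration.
generator     = Chain(Dense(8, 16, relu), Dense(16, 1))
discriminator = Chain(Dense(1, 16, relu), Dense(16, 1))  # raw logit output

# Keras-style "GAN": the generator chained into the discriminator.
gan = Chain(generator, discriminator)

# Training the GAN over the generator's parameters only: the
# discriminator is used in the forward pass but receives no updates.
ps_gen = Flux.params(generator)
noise  = randn(Float32, 8, 32)                 # a batch of 32 noise vectors
grads  = gradient(ps_gen) do
    Flux.Losses.logitbinarycrossentropy(gan(noise), ones(Float32, 1, 32))
end
Flux.update!(ADAM(2f-4), ps_gen, grads)
```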

darsnack commented 3 years ago

Unfortunately, I don't have a direct answer for you, since I haven't looked deeply into the implementation in the zoo. But I have trained GANs in PyTorch in the past using the same approach as the zoo implementation (i.e. train the discriminator, then train the generator). I don't know if this is the source of the difference, but I can say that the zoo implementation is not objectively wrong, and it is the one I have most commonly encountered. In the Keras code, when do you update the generator?
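For reference, a minimal sketch of that alternating scheme in Flux (names, sizes, and data are hypothetical; implicit-params API as above). Each step takes gradients only with respect to the network being updated, which is the Flux analogue of freezing the other one:

```julia
using Flux

gen  = Chain(Dense(8, 16, relu), Dense(16, 1))   # hypothetical toy networks
disc = Chain(Dense(1, 16, relu), Dense(16, 1))   # raw logit output
opt_d, opt_g = ADAM(2f-4), ADAM(2f-4)

real  = randn(Float32, 1, 32)                    # a batch of "real" samples
noise = randn(Float32, 8, 32)

# Step 1: update the discriminator on real and generated batches.
ps_d = Flux.params(disc)
gs_d = gradient(ps_d) do
    Flux.Losses.logitbinarycrossentropy(disc(real), ones(Float32, 1, 32)) +
    Flux.Losses.logitbinarycrossentropy(disc(gen(noise)), zeros(Float32, 1, 32))
end
Flux.update!(opt_d, ps_d, gs_d)

# Step 2: update the generator through the discriminator; only the
# generator's parameters are in `ps_g`, so the discriminator is untouched.
ps_g = Flux.params(gen)
gs_g = gradient(ps_g) do
    Flux.Losses.logitbinarycrossentropy(disc(gen(noise)), ones(Float32, 1, 32))
end
Flux.update!(opt_g, ps_g, gs_g)
```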

davibarreira commented 3 years ago

The generator is updated by passing the error back through the discriminator while freezing the discriminator's parameters. In PyTorch (I don't actually know that framework), I read some code, and there is usually a line like `Generator.zero_grad()` before training the generator. I don't see anything similar in this example of a GAN using Flux.jl.

davibarreira commented 3 years ago

My problem is that I've been trying to replicate very simple 1D cases, but the Generator does not seem to learn in Flux. My only guess as to what might be wrong is the step I'm pointing to; other than that, everything matches the examples I'm trying to replicate. Here is one of them: 1D GAN

DhairyaLGandhi commented 3 years ago

You shouldn't need zero_grad: we don't store gradients the way PyTorch does, so there is nothing to reset. Is it that the parameters aren't updating at all? Could you check that in your implementation? It would be helpful to have some code so we can say more.
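To illustrate that point with a minimal, hypothetical example: Zygote returns a fresh gradient object on every call, so there is no accumulated `.grad` state to reset between steps.

```julia
using Flux

m  = Dense(2, 1)                      # hypothetical toy model
ps = Flux.params(m)
x  = randn(Float32, 2, 4)

# Two successive gradient calls: each returns an independent Grads
# object; nothing accumulates into the parameters between calls.
g1 = gradient(() -> sum(m(x)), ps)
g2 = gradient(() -> sum(m(x)), ps)

p = first(ps)
@assert g1[p] == g2[p]                # identical gradients, no accumulation
```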

davibarreira commented 3 years ago

Sure, I'm redoing all my code, and once I'm done, I'll post it. Another thing I found strange in the dcgan_mnist.jl example is that the discriminator's output is not rescaled to a number between 0 and 1, which is usually done with a sigmoid activation. Is there a specific reason this example has no sigmoid?

darsnack commented 3 years ago

It's not needed with a logit loss: the example uses logitbinarycrossentropy, which applies the sigmoid for you inside the loss (and is more numerically stable than applying a sigmoid manually before binarycrossentropy).
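A quick sketch of why the fused version is preferred (function names are from Flux.Losses; the extreme logit values are just a hypothetical illustration):

```julia
using Flux
using Flux.Losses: logitbinarycrossentropy, binarycrossentropy

logits = Float32[-30, 0, 30]   # raw discriminator outputs (extreme on purpose)
labels = Float32[0, 1, 1]

# Fused version: the sigmoid lives inside the loss and is computed via a
# log-sum-exp style identity, so large |logits| stay numerically exact.
stable = logitbinarycrossentropy(logits, labels)

# Naive version: σ(±30f0) saturates to exactly 0 or 1 in Float32, so
# precision (and useful gradient signal) is lost before the log is taken.
naive = binarycrossentropy(σ.(logits), labels)
```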

davibarreira commented 3 years ago

Ok, after I remade all the code and took into account the information you gave me, the model finally worked! The main issue was that I was using a sigmoid activation on the discriminator together with the logit-based loss, so the sigmoid was effectively applied twice. Once I adjusted that, it worked! Thanks a lot for all the comments :D
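For anyone landing here with the same symptom, the fix amounts to picking one consistent pairing (a hypothetical sketch; mixing the two applies the sigmoid twice, which is what went wrong above):

```julia
using Flux
using Flux.Losses: logitbinarycrossentropy, binarycrossentropy

# Option A: no activation on the last layer; the logit loss does the sigmoid.
disc_logits = Chain(Dense(1, 16, relu), Dense(16, 1))
loss_a(x, y) = logitbinarycrossentropy(disc_logits(x), y)

# Option B: sigmoid on the last layer; plain binary cross-entropy.
disc_probs = Chain(Dense(1, 16, relu), Dense(16, 1, σ))
loss_b(x, y) = binarycrossentropy(disc_probs(x), y)

# Mixing them (a sigmoid output fed into logitbinarycrossentropy)
# squashes the values twice and silently breaks training.
```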