AAnoosheh / ToDayGAN

http://arxiv.org/abs/1809.09767
BSD 2-Clause "Simplified" License

Discriminator output/label weights, generator losses and discriminator losses. #19

Closed ibanferro closed 4 years ago

ibanferro commented 4 years ago

Hello, I'm just a noob at neural networks, but I have some questions about the code.

The paper says that to calculate the losses, the discriminator outputs the result of each downsampling layer and multiplies it by a factor that increases the deeper the layer is. So the first layer should have weight 1, the next one weight 2, the next one weight 3, and so on. But in the source code I see that the multipliers are not entirely linear: for the RGB and Gray discriminators they are (1, 2, 3, 5), and for the Gradient discriminator they are (1, 2, 4). Why are the penultimate layers skipped, and why are the weights of the last layers NOT 4 and 3 respectively?

And second, why are the loss_G and loss_cycle calculations of one generator used to calculate the gradients for the TWO generators? Shouldn't the loss_G and loss_cycle of each generator be used ONLY for that respective generator?

And finally, why are the losses of the RGB and Gray discriminators also applied to the Gradient discriminator and vice versa, instead of each discriminator's losses being applied to it and only to it?

Maybe I am misinterpreting the code and the paper.

Sorry for my English, and thank you very much!

AAnoosheh commented 4 years ago

Hi, thanks for the good questions.

Regarding the first one: I think I added that at the last minute, on the line where I make the final layer a tiny bit more important: https://github.com/AAnoosheh/ToDayGAN/blob/06a6ff33913621360f6123d6aa66f487936d3650/models/networks.py#L91 (Even I don't remember actually doing that, but nothing is skipped, fortunately.)
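
Roughly, the weighting amounts to something like the sketch below (this is an illustration, not the repo's actual code; `preds`, `weights`, and the LSGAN-style MSE target are assumptions for the example):

```python
import torch
import torch.nn.functional as F

def weighted_gan_loss(preds, target_is_real, weights=(1, 2, 3, 5)):
    # `preds` is the list of per-scale discriminator outputs.
    # Each scale's loss is scaled by its weight, so deeper outputs count more.
    loss = 0.0
    for pred, w in zip(preds, weights):
        target = torch.full_like(pred, 1.0 if target_is_real else 0.0)
        loss = loss + w * F.mse_loss(pred, target)
    return loss
```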

Regarding the loss: even when multiple losses are added together, backward() will still propagate gradients back to the models each loss came from. It's common to add up all the losses at the end and call backward() on the entire thing; autograd follows the graph to the correct places.

I don't know what you mean by the final question, but I assume it's similar to the previous concern. Adding losses together does not mean each one is applied to the other model. If they come from independent models, adding them and calling backward() is equivalent to just calling backward() on each separately.
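
A quick toy check of that equivalence (self-contained sketch, unrelated to the repo's code):

```python
import copy
import torch
import torch.nn as nn

net_a, net_b = nn.Linear(4, 1), nn.Linear(4, 1)
net_a2, net_b2 = copy.deepcopy(net_a), copy.deepcopy(net_b)  # identical copies
x = torch.randn(8, 4)

# One backward() on the summed loss...
(net_a(x).mean() + net_b(x).mean()).backward()

# ...versus separate backward() calls, one per model.
net_a2(x).mean().backward()
net_b2(x).mean().backward()

print(torch.allclose(net_a.weight.grad, net_a2.weight.grad))  # True
print(torch.allclose(net_b.weight.grad, net_b2.weight.grad))  # True
```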

Best, Asha

ibanferro commented 4 years ago

Thanks for the answers!

And I forgot one last question. The paper says that the generators are initialized with learning rate 2e-4 and the discriminators with 1e-4, but in the code it seems like the same learning rate is used for the generators and the discriminators. Am I right?

AAnoosheh commented 4 years ago

That is a tricky leftover detail from CycleGAN.

The LR of the discriminator is technically always 0.5 * generator's LR. This is done dynamically (and stealthily to a reader) here: https://github.com/AAnoosheh/ToDayGAN/blob/06a6ff33913621360f6123d6aa66f487936d3650/models/combogan_model.py#L97

Dividing a loss by 2 is equivalent to just halving the LR in the optimizer. I only left it that way because that's how CycleGAN did it, but it's not very obvious.
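
For plain SGD the equivalence is exact and easy to verify; here is a toy sketch (not the repo's optimizer setup, which uses Adam):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
net_a = nn.Linear(4, 1)
net_b = copy.deepcopy(net_a)              # identical starting weights
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Full loss with half the learning rate...
opt_a = torch.optim.SGD(net_a.parameters(), lr=0.05)
F.mse_loss(net_a(x), y).backward()
opt_a.step()

# ...matches half the loss with the full learning rate.
opt_b = torch.optim.SGD(net_b.parameters(), lr=0.10)
(0.5 * F.mse_loss(net_b(x), y)).backward()
opt_b.step()

print(torch.allclose(net_a.weight, net_b.weight))  # True
```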

ibanferro commented 4 years ago

Thank you for the explanation Asha!

ibanferro commented 4 years ago

Sorry, I don't understand why the Loss_GAN + lambda * Loss_cyclic value is the same for the two generators.

AAnoosheh commented 4 years ago

I mean, would you prefer they have different losses and be unbalanced for no reason? Both generators are equally important.