I have been trying to wrap my head around the explanation of the vanishing gradients problem in GANs for quite some time:
The current solution pdf document plots the loss function values over the input of the sigmoid function to explain the (non-)saturating behavior. However, I am asking myself whether that plot just captures the saturation of the sigmoid function and not the saturation behavior of the G loss function itself (see the sketch below for what I mean).
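Here is a minimal sketch of the distinction I am trying to draw (my own code, not from the pdf; the logit value -6.0 is just an arbitrary example of D confidently rejecting a fake). It compares the gradient of the saturating G loss with respect to the sigmoid *input* (the logit) against the gradient with respect to the sigmoid *output* x = D(G(z)):

```python
# Sketch: gradient of the saturating G loss log(1 - D(G(z))) taken w.r.t. the
# discriminator logit a vs. w.r.t. the sigmoid output x = sigmoid(a).
import torch

a = torch.tensor(-6.0, requires_grad=True)    # D confidently rejects the fake
x = torch.sigmoid(a)                          # x = D(G(z)) close to 0
loss_sat = torch.log(1 - x)                   # saturating G loss
loss_sat.backward()
print(a.grad)    # ~ -0.0025: gradient w.r.t. the logit vanishes (sigmoid saturation)

x2 = torch.tensor(0.0025, requires_grad=True) # same D output, now the free variable
loss_sat2 = torch.log(1 - x2)
loss_sat2.backward()
print(x2.grad)   # ~ -1.0025: gradient w.r.t. x stays near -1
```

So, as far as I can tell, the vanishing gradient shows up when the loss is viewed through the sigmoid, while the loss as a function of x alone does not vanish at x = 0, which is exactly what confuses me about the plot in the pdf.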
The NIPS 2016 GAN tutorial shows in figure 16 (p. 26) an explanation of the saturating loss without taking the sigmoid function into account. As I understand this explanation, the saturation behavior is explained through the gradients for G when G is not (yet) able to generate good fakes and D can easily identify them as fake (x = D(G(z)) = 0 or close to 0).
See a plot of the saturating loss ln(1 - x) and the non-saturating loss -ln(x) together with their derivatives for x from 0 to 1. There, the saturating loss has a small gradient of around -1 at x = 0, while the non-saturating loss has a gradient of -infinity at x = 0.
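A quick numeric check of those two derivatives, viewed purely as functions of x = D(G(z)) as in the figure-16 style explanation (the sample x values are made up):

```python
# d/dx ln(1 - x) = -1/(1 - x)   (saturating)
# d/dx -ln(x)    = -1/x         (non-saturating)
import numpy as np

x = np.array([1e-4, 1e-2, 0.5, 0.99])
grad_saturating     = -1.0 / (1.0 - x)
grad_non_saturating = -1.0 / x
print(grad_saturating)      # ~ [-1.0001, -1.0101, -2.0, -100.0]
print(grad_non_saturating)  # ~ [-10000.0, -100.0, -2.0, -1.0101]
```

So near x = 0 the saturating loss gives a gradient of roughly -1 while the non-saturating loss blows up towards -infinity, which matches the plot described above.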
When I plot the gradients over the course of training for both loss functions, I also get higher gradient means and higher standard deviations for the non-saturating loss compared to the saturating loss (see notebook).
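For reference, this is roughly how I collect those statistics per training step (a simplified sketch of the notebook; `G`, `D`, and `z_batch` are placeholders for my generator, discriminator, and noise batch):

```python
import torch

def generator_grad_stats(G, D, z_batch, non_saturating=True):
    # Compute the chosen G loss, backprop, and summarize the gradients on G's parameters.
    G.zero_grad()
    d_out = D(G(z_batch))
    loss = -torch.log(d_out).mean() if non_saturating else torch.log(1 - d_out).mean()
    loss.backward()
    grads = torch.cat([p.grad.flatten() for p in G.parameters() if p.grad is not None])
    return grads.abs().mean().item(), grads.std().item()
```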
Maybe I am missing something?
I would be happy if somebody could point me in the right direction.