Closed ahmed-fau closed 6 years ago
As I recall (the implementation was a while ago, so I don't clearly remember my intentions back then), the sigmoid layer was used to bound the output values between 0 and 1.
You can see that we're using an L2 loss for D. Bounding the output values also bounds the loss, which helps keep training stable by suppressing gradient explosion.
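To make the argument concrete, here is a minimal sketch of that setup, assuming an L2 (least-squares) loss for D with sigmoid-bounded scores. The tensor names and shapes are illustrative placeholders, not the actual SEGAN code:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator scores; in the real model these would come
# from D's conv stack. The sigmoid bounds every score to (0, 1).
d_real = torch.sigmoid(torch.randn(8, 1))
d_fake = torch.sigmoid(torch.randn(8, 1))

# L2 loss for D: push real scores toward 1 and fake scores toward 0.
# Because the scores lie in (0, 1), each MSE term is bounded in [0, 1],
# so the total D loss can never exceed 2.
mse = nn.MSELoss()
d_loss = mse(d_real, torch.ones_like(d_real)) + mse(d_fake, torch.zeros_like(d_fake))
```

The bound on the loss is exactly what the sigmoid buys you here: no single batch can produce an arbitrarily large gradient through the loss.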
As for the implementation, I did consult the original TF code a bit, but my SEGAN implementation here is not a strict port of the original TensorFlow code to PyTorch; it's an implementation of the paper, with my own touches to fill in the missing details.
Hope this helped :)
Thanks for the nice implementation. However, I have the same question. Regardless of the original TF code, the paper says LSGAN is adopted, which means the last sigmoid layer should no longer exist, according to the LSGAN paper. I think removing the sigmoid is the main idea of LSGAN: it effectively circumvents the saturation problem. Here is a nice post talking about this. Thanks.
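The saturation point can be shown in a few lines. This is a toy illustration (the value 8.0 is just a stand-in for a confidently scored sample, not anything from the model): a sigmoid output saturates and its gradient vanishes, while a least-squares loss on the raw score keeps a large gradient:

```python
import torch

# A large raw score for a fake sample that D is confidently wrong about.
logit = torch.tensor([8.0], requires_grad=True)

# With sigmoid: sigmoid(8) is nearly 1, so the derivative s * (1 - s)
# is tiny and almost no gradient reaches the logit.
sig_loss = (torch.sigmoid(logit) - 0.0) ** 2
sig_loss.backward()
sig_grad = logit.grad.abs().item()

# LSGAN-style: least squares directly on the raw score. The gradient
# grows with how wrong the score is (here 2 * 8 = 16), so learning
# does not stall for badly misclassified samples.
logit2 = torch.tensor([8.0], requires_grad=True)
ls_loss = (logit2 - 0.0) ** 2
ls_loss.backward()
ls_grad = logit2.grad.abs().item()
```

Here `sig_grad` comes out orders of magnitude smaller than `ls_grad`, which is the saturation problem LSGAN is designed to avoid.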
Agreed. Here is also a good example that points out the same intuition of excluding the sigmoid from the discriminator output layer in LSGAN.
@cyrilhsu @ahmed-fau thanks for pointing this out :) I would like to try a few more training runs without the sigmoid layer. I'll fix the code depending on how it turns out.
Removed the sigmoid layer at the end of the Discriminator. The results were good enough. Closing this issue.
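For anyone landing here later, the shape of the change is roughly this. The layer sizes below are placeholders, not the actual SEGAN architecture; the point is only that D's final layer now emits a raw, unbounded score, as the LSGAN formulation expects:

```python
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):
    """Toy stand-in for D: conv feature stack plus a 1x1 conv head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15),
            nn.LeakyReLU(0.3),
        )
        # Final layer: no Sigmoid. The raw conv output is the score,
        # and the least-squares loss is applied to it directly.
        self.out = nn.Conv1d(16, 1, kernel_size=1)

    def forward(self, x):
        return self.out(self.features(x))  # unbounded real-valued score

d = TinyDiscriminator()
score = d(torch.randn(2, 1, 64))  # (batch, channels, time)
```

With the sigmoid gone, the L2 loss is no longer bounded the way it was before, but in exchange the gradient no longer saturates, which matches the experience reported above that training still worked well.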
Hi,
Thanks for this clear implementation! I would like to know why you used a sigmoid function as the output layer of the discriminator network.
I ask because I cannot find it used in the TF implementation.
Best Regards