Closed ahmed-fau closed 6 years ago
As I recall (the implementation was a while ago, so I don't clearly remember my intentions back then), the sigmoid layer was used to bound the output values between 0 and 1.
You can see that we're using an L2 loss for D. Bounding the output values also bounds the loss, which helps keep training stable by suppressing gradient explosion.
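To make the argument concrete, here is a minimal sketch of that setup, assuming an L2 (least-squares) loss for D with sigmoid-bounded scores. The tensor names and shapes are illustrative placeholders, not the actual SEGAN code:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator scores; in the real model these would come
# from D's conv stack. The sigmoid bounds every score to (0, 1).
d_real = torch.sigmoid(torch.randn(8, 1))
d_fake = torch.sigmoid(torch.randn(8, 1))

# L2 loss for D: push real scores toward 1 and fake scores toward 0.
# Because the scores lie in (0, 1), each MSE term is bounded in [0, 1],
# so the total D loss can never exceed 2.
mse = nn.MSELoss()
d_loss = mse(d_real, torch.ones_like(d_real)) + mse(d_fake, torch.zeros_like(d_fake))
```

The bound on the loss is exactly what the sigmoid buys you here: no single batch can produce an arbitrarily large gradient through the loss.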
As for the implementation, I did consult the original TF code a bit, but my SEGAN implementation here is not a strict port of the original TensorFlow code to PyTorch; it's an implementation of the paper, with my own touches to fill in the missing details.
Hope this helped :)
Thanks for the nice implementation. However, I have the same question. Regardless of the original TF code, the paper says LSGAN is adopted, which means the last sigmoid layer should no longer exist, according to the LSGAN paper. I think removing the sigmoid is the main idea of LSGAN: it effectively circumvents the saturation problem. Here is a nice post talking about this. Thanks.
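The saturation point can be shown in a few lines. This is a toy illustration (the value 8.0 is just a stand-in for a confidently scored sample, not anything from the model): a sigmoid output saturates and its gradient vanishes, while a least-squares loss on the raw score keeps a large gradient:

```python
import torch

# A large raw score for a fake sample that D is confidently wrong about.
logit = torch.tensor([8.0], requires_grad=True)

# With sigmoid: sigmoid(8) is nearly 1, so the derivative s * (1 - s)
# is tiny and almost no gradient reaches the logit.
sig_loss = (torch.sigmoid(logit) - 0.0) ** 2
sig_loss.backward()
sig_grad = logit.grad.abs().item()

# LSGAN-style: least squares directly on the raw score. The gradient
# grows with how wrong the score is (here 2 * 8 = 16), so learning
# does not stall for badly misclassified samples.
logit2 = torch.tensor([8.0], requires_grad=True)
ls_loss = (logit2 - 0.0) ** 2
ls_loss.backward()
ls_grad = logit2.grad.abs().item()
```

Here `sig_grad` comes out orders of magnitude smaller than `ls_grad`, which is the saturation problem LSGAN is designed to avoid.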
Agreed. Here is also a good example that points out the same intuition of excluding the sigmoid from the discriminator output layer in LSGAN.
@cyrilhsu @ahmed-fau thanks for pointing this out :) I would like to try a few more training runs without the sigmoid layer. I'll fix the code depending on how it turns out.
Removed the sigmoid layer at the end of the Discriminator. The results were good enough. Closing this issue.
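For anyone landing here later, the shape of the change is roughly this. The layer sizes below are placeholders, not the actual SEGAN architecture; the point is only that D's final layer now emits a raw, unbounded score, as the LSGAN formulation expects:

```python
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):
    """Toy stand-in for D: conv feature stack plus a 1x1 conv head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15),
            nn.LeakyReLU(0.3),
        )
        # Final layer: no Sigmoid. The raw conv output is the score,
        # and the least-squares loss is applied to it directly.
        self.out = nn.Conv1d(16, 1, kernel_size=1)

    def forward(self, x):
        return self.out(self.features(x))  # unbounded real-valued score

d = TinyDiscriminator()
score = d(torch.randn(2, 1, 64))  # (batch, channels, time)
```

With the sigmoid gone, the L2 loss is no longer bounded the way it was before, but in exchange the gradient no longer saturates, which matches the experience reported above that training still worked well.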
Hi,
Thanks for this clear implementation! I would like to know why you used a sigmoid function as the output layer of the discriminator network.
I ask because I cannot find it used in the TF implementation.
Best Regards