Thanks for your comment :) I used a sigmoid activation function because the output of the discriminator is compared against labels of 0 and 1. Without the sigmoid, the discriminator's output could be any real number; since LSGAN uses a squared loss as its metric, I reasoned that the output should lie between 0 and 1.
Ex)
Discriminator output without sigmoid on real data: 3.9 -> loss = (3.9 - 1)^2 = 8.41
Discriminator output with sigmoid on real data: sigmoid(3.9) = 0.9802 -> loss = (0.9802 - 1)^2 = 0.00039204
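For concreteness, here is a minimal Python sketch of that comparison (the value 3.9 and the variable names are just illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

real_label = 1.0
raw_output = 3.9  # illustrative raw discriminator output on a real sample

# Without sigmoid: the raw output is compared to the label directly.
loss_without = (raw_output - real_label) ** 2        # (3.9 - 1)^2 = 8.41

# With sigmoid: the output is squashed into (0, 1) first.
loss_with = (sigmoid(raw_output) - real_label) ** 2  # ~(0.9802 - 1)^2 ~= 0.00039

print(loss_without, loss_with)
```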
Thanks for the reply. I am aware of how the sigmoid function works, since I work on generative models as well. :) However, according to the original paper, Least Squares Generative Adversarial Networks, I believe they remove the sigmoid activation at the last layer for better convergence properties; please refer to Section 3.2.1 of the paper. A similar opinion can also be found in this blog post. I agree that the model should work just fine in most cases with or without the sigmoid activation at the last layer, but making the last layer linear should yield better performance.
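For reference, here is a minimal sketch of the paper's least-squares objectives with a linear (no-sigmoid) discriminator output. The 0-1 label coding a = 0, b = 1, c = 1 is one of the schemes discussed in the paper; the function names are my own, for illustration only:

```python
import numpy as np

# One label scheme from the paper: a = 0 (fake), b = 1 (real),
# c = 1 (the value G wants D to assign to fakes).
a, b, c = 0.0, 1.0, 1.0

def d_loss(d_real, d_fake):
    # d_real, d_fake: raw (linear, unbounded) discriminator outputs.
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)

def g_loss(d_fake):
    return 0.5 * np.mean((d_fake - c) ** 2)
```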
Oh, I see what you mean! Using only a linear last layer would also push the outputs toward the 0~1 range in order to minimize the loss. Maybe we should test which variant produces better results or converges faster, with or without the sigmoid in the last layer (see the sketch below). Thanks for enlightening me :)
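If you do run that comparison, the toggle could be as small as this TF 1.x-style sketch (the helper name `discriminator_head` and the flag are hypothetical, not from this repo):

```python
import tensorflow as tf

def discriminator_head(features, use_sigmoid):
    """Last layer of D: linear output, with an optional sigmoid on top."""
    logits = tf.layers.dense(features, 1)  # linear (unbounded) output
    return tf.sigmoid(logits) if use_sigmoid else logits
```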
I was also confused by this issue. Thanks for the discussion and the nice implementation!
Nice implementation! I am just wondering about your choice of the last-layer activation function for the discriminator, since it looks a bit odd to me to minimize a regression loss on a sigmoid output. Setting the last-layer activation to None makes more sense to me.