Burton2000 / CS231n-2017

Completed the CS231n 2017 spring assignments from Stanford University

Assignment 3 GAN: Logits/Scores #4

Closed: yoniker closed this issue 6 years ago

yoniker commented 6 years ago

Hello!

First of all, nice work.

Secondly, this might be a bug in the original assignment, but: AFAIK logits are the scores after applying softmax. For the vanilla GAN (the first one you implemented), the lines logits_fake = D(fake_images) and logits_real = D(2 * (real_data - 0.5)).type(dtype) in the function run_a_gan (the one which trains the GAN) bother me, since those are scores and not logits... It does work though (the generated images are not that bad). So what do you think? Is it a bug in the original assignment, or am I missing something here? :)

Burton2000 commented 6 years ago

Hi, thanks, that's great to hear. To answer your question:

Logits are actually the raw, unscaled outputs; they are not the result of applying softmax. The softmax function takes the logits as input and produces a normalized output so that sum(softmax_output) = 1. We can also interpret these softmax outputs as probabilities.
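To make that concrete, here's a minimal sketch (the values are just made up for illustration) showing that softmax only rescales the logits into something that sums to 1:

```python
import torch
import torch.nn.functional as F

# Raw, unscaled outputs of a model: these are the logits.
logits = torch.tensor([2.0, -1.0, 0.5])

# Softmax maps the logits to a normalized distribution.
probs = F.softmax(logits, dim=0)
print(probs)        # approx tensor([0.7856, 0.0391, 0.1753])
print(probs.sum())  # tensor(1.) -- the softmax outputs sum to 1
```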

Our discriminator D ends with just a linear layer (no softmax) so the output of D is the logits.
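Roughly what I mean (this is just an illustrative sketch, not the exact assignment code): the last layer of D is linear, so calling D gives you logits directly, and you would apply a sigmoid (or let the loss do it) to turn them into probabilities:

```python
import torch
import torch.nn as nn

# Illustrative discriminator: the final layer is Linear, so its output is raw logits.
D = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.01),
    nn.Linear(256, 1),   # no sigmoid/softmax here
)

fake_images = torch.randn(16, 784)
logits_fake = D(fake_images)             # unbounded scores/logits
probs_fake = torch.sigmoid(logits_fake)  # probabilities in (0, 1)

# BCEWithLogitsLoss applies the sigmoid internally, so it expects logits as input.
loss = nn.BCEWithLogitsLoss()(logits_fake, torch.zeros(16, 1))
```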

Hopefully this link explains it better than I can: https://stackoverflow.com/a/34243720/7431458

Let me know if this answers your question.

yoniker commented 6 years ago

Oh I see, thanks! So in that case, what's the difference between scores and logits, if there is any? Are logits simply the term for scores when we do use something like softmax and cross-entropy?

Burton2000 commented 6 years ago

Yeah, they are generally the same thing and people might use the two interchangeably, but perhaps using 'logits' is more explicit about what you are referring to. Someone could refer to the output of a softmax layer as probability 'scores', but it would be very wrong to call the output of a softmax layer logits.

So yes, you are right that we are more likely to use the term logits instead of scores when there is a softmax or sigmoid layer at the end, to avoid confusion.

yoniker commented 6 years ago

Cool, here's a question which might be more interesting, if you don't mind: you used padding=1 for the transpose convolutions in the DCGAN generator. The assignment states 'same' padding. Now, to my understanding, 'same' padding means that the HxW of the output is the same as the HxW of the input (for either a conv or a transpose conv). So I tried padding=5 initially, only to realize that since the kernel size is 4, no padding will be "same".

You clearly understand something that I don't, so if you can clarify it, that would be awesome!

Oh also, how much work (in terms of time) was it to do the same in TensorFlow as opposed to PyTorch?

Burton2000 commented 6 years ago

For convolution, 'same' padding will only return the same HxW if stride=1. 'Same' can have slightly different meanings I think, but in TensorFlow 'same' just means that you pad the image (if required) so that your output size is ceil(input/stride). I presume this is the 'same' they refer to in these assignments.

https://www.tensorflow.org/api_guides/python/nn#Convolution
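As a quick sketch of that rule (just plain Python, not TF code), you can see that 'same' only preserves HxW when stride=1:

```python
import math

def same_output_size(input_size, stride):
    # TensorFlow 'SAME' padding: output = ceil(input / stride),
    # with whatever padding is needed to make that true.
    return math.ceil(input_size / stride)

print(same_output_size(28, 1))  # 28 -> only matches the input when stride=1
print(same_output_size(28, 2))  # 14
```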

To calculate the padding value, I worked backwards from the desired output shape [28, 28], knowing that I need to get to [7, 7] by two convolutions with stride=2, kernel_size=4. The only way to do this is with pad=1.
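You can check that with the standard conv output formula, out = floor((in + 2*pad - kernel) / stride) + 1:

```python
def conv_out(in_size, kernel=4, stride=2, pad=1):
    # Standard convolution output size formula.
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_out(28))  # 14
print(conv_out(14))  # 7  -> two stride-2, pad=1, kernel-4 convs take 28x28 down to 7x7
```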

Therefore, going the other way with the transposed convolution, the pad must also be 1 (remember a transpose conv gives back the input shape of the corresponding conv operation if the stride, pad and kernel size are the same, and vice versa: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d).

You can also use the formula here to check this is right for transpose conv: http://pytorch.org/docs/master/nn.html?highlight=transpose#torch.nn.ConvTranspose2d
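If it helps, here is a quick shape check (modern PyTorch syntax, purely illustrative channel sizes) confirming that kernel_size=4, stride=2, padding=1 maps 7x7 back up to 28x28:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 128, 7, 7)  # e.g. a generator feature map
up = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 7 -> 14
    nn.ReLU(),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),    # 14 -> 28
)
print(up(x).shape)  # torch.Size([1, 1, 28, 28])

# Formula from the ConvTranspose2d docs (dilation=1, no output_padding):
# out = (in - 1)*stride - 2*pad + kernel = (7 - 1)*2 - 2 + 4 = 14, then 28.
```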

As for the work, not much, as I already kind of knew how to use TF, so in most cases it was just a matter of swapping the PyTorch operations for the TF equivalents.