SoumyaK1 opened 4 years ago

Hi, I am raising a query, not an issue. Why is the sum divided by Nc (the number of classes per episode) instead of Ns (the number of samples per class) while calculating the centroid vector?

Hi, is there anything wrong with my understanding?
Hey @SoumyaK1,
Sorry I missed your message. You're right that it should be Ns; that was a mistake on my part. I'll update it soon. Thanks for bringing it to my attention.
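For reference, the prototype (centroid) of each class is the mean of its Ns support embeddings, so the sum is divided by Ns. A minimal sketch of that computation (embeddings, labels, and classes are illustrative names, not the repo's):

import torch

def class_centroids(embeddings, labels, classes):
    # Prototype of class c: mean over its Ns support embeddings,
    # i.e. sum / Ns, not sum / Nc
    return torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])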
Thanks for your response.
Hi, I am referring to your code for one of my projects on prototypical networks, and I have the following query:

We should choose the class for which the distance between the centroid vector and the query vector is minimum. So why is the class with the maximum distance chosen, as done in the code below?

Qx, Qy = protonet(datax, datay, Ns, Nc, Nq, np.unique(datay))
pred = torch.log_softmax(Qx, dim=-1)
...
acc = torch.mean((torch.argmax(pred, 1) == Qy).float())

Could you please clarify this part? Thank you for your time.

Regards, Soumya
Hey,
As you can see in this step:
Qx, Qy = protonet(datax, datay, Ns, Nc, Nq, np.unique(datay))
pred = torch.log_softmax(Qx, dim=-1)
loss = F.nll_loss(pred, Qy)
Here we are using the negative log loss of the softmax to calculate the loss. Hence the closest centroid will have the maximum value. You can check out https://blog.floydhub.com/n-shot-learning/, where I have explained NLL with softmax. Hope this helps.
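As a standalone illustration (not code from the repo), NLL applied to a log-softmax is exactly the usual cross-entropy over the logits:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)            # e.g. 4 query points, 3 classes
targets = torch.tensor([0, 2, 1, 0])  # illustrative class labels

pred = torch.log_softmax(logits, dim=-1)
loss = F.nll_loss(pred, targets)

# Same value as cross-entropy computed directly on the logits
assert torch.allclose(loss, F.cross_entropy(logits, targets))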
Thank you very much for the explanation!
If I understand correctly, the Qx matrix is the distance matrix, so it should show the lowest value for the closest centroid, and the pred matrix (the values after applying softmax over Qx) should show the maximum value for the closest centroid. Am I correct?

I printed both the Qx matrix and the pred matrix. Both show the maximum value for the closest centroid. I am confused; it would be really helpful if you could explain this some more.
Regards, Soumya
Hey,
You are right that the Qx matrix would be a distance matrix. When we take its softmax, the values are rescaled so that they sum to one, so the one with the lowest distance would also be the lowest in the softmax. Hence our answer would be the lowest value after the softmax. If we take the negative log of all the softmax outputs, the one with the lowest value gives the highest value. Hence I used argmax.

For example, suppose the output of the softmax is [0.2, 0.3, 0.5]. Here 0.2 will be our answer. Now if I take the negative log of each element, i.e. -log(xi), I get [1.60943791, 1.2039728, 0.69314718]. Hence I need argmax to get the correct answer, i.e. the 0th element of the array.
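The same example in plain PyTorch (standalone, values from above):

import torch

p = torch.tensor([0.2, 0.3, 0.5])  # softmax outputs from the example
neg_log = -torch.log(p)            # tensor([1.6094, 1.2040, 0.6931])

print(torch.argmin(p))        # tensor(0): lowest softmax value
print(torch.argmax(neg_log))  # tensor(0): same element after the negative log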
Hope this helps and please contact me if you have any doubts.
I understand and appreciate the detailed explanation. Thank you very much for your time! I still have some doubts, though. Could you please reply to the email I sent to your Gmail?
Hey,
I have to go out of town for a few days. I'll reply to you either on Monday or Tuesday.
Regards, Heet
okay, thank you
@Hsankesara, I have the same concern as @SoumyaK1. The code does argmax over pred, not argmax over the negative log. So you would get 0.5 as the answer instead of 0.2.
Hey @abgoswam,
Here's the reply I gave to @SoumyaK1
I understand the issue, and you're correct that it isn't implemented in the traditional way. I had also forgotten what I did there, and it took me an hour or so to figure it out. The problem I faced while writing the code is that the class with the lowest softmax score should be the correct answer, since it corresponds to the nearest centroid. But negative log loss on top of softmax tries to maximize the probability of the correct class, so that the input to the log can approach 1 and the loss be minimized. As you can see, the softmax here does not actually represent probabilities; it is just normalized distances. So the conflict is that negative log loss would try to maximize the softmax score, making the class with the maximum distance from the point the correct one.

I tried multiple ideas to tackle this. The cleanest way would be a customized log loss in torch, but I avoided that because I had tried another way that is theoretically equivalent and gives exactly the same results. To make it clear, think of Qx as the inverse of the distance from the centroid. We can do this because we are using a deep learning model, which only cares about the output, not about its theoretical interpretation. We draw the analogy so that we can understand it more clearly, but the model's output will be shaped however we want it to be (by backpropagation). Viewed that way, negative log loss trying to maximize the weight of the correct class means it is trying to maximize the inverse distance from the point (which also means minimizing the distance). The only catch is that you now need to use argmax rather than argmin. I know it's not exactly what the paper says, but it is the simplest way I could think of to implement it.
Hope this helps
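For comparison, the formulation in the paper negates the distances before the softmax, so the nearest centroid gets the highest score and argmax needs no reinterpretation. A minimal sketch of that variant (queries, prototypes, and targets are illustrative names, not from this repo):

import torch
import torch.nn.functional as F

def proto_loss(queries, prototypes, targets):
    # Squared Euclidean distance from each query to each class prototype
    dists = torch.cdist(queries, prototypes) ** 2
    # Negate so the closest prototype has the largest logit
    log_p = torch.log_softmax(-dists, dim=-1)
    loss = F.nll_loss(log_p, targets)
    acc = (log_p.argmax(dim=1) == targets).float().mean()
    return loss, acc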
@Hsankesara, thanks for the response.
Regarding this line in your explanation :
To make it clear, think of Qx as an inverse of a distance from the centroid.
However, the code simply computes the pairwise distance. I do not see any place in the code where you take the inverse of the distance.
Am I missing something?
Proposed fix (based on your explanation): before computing the loss/accuracy, do

Qx = Qx.max() - Qx
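A quick toy check of this transform (standalone, illustrative values):

import torch

Qx = torch.tensor([[2.3, 0.7, 1.5]])  # distances to three centroids
flipped = Qx.max() - Qx               # closest centroid now has the largest value

assert torch.equal(torch.argmin(Qx, dim=1), torch.argmax(flipped, dim=1))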
@Hsankesara I played around with your IPython notebook example. Here are two versions: one as-is, and one with

Qx = Qx.max() - Qx

applied in the train_step, before computing the loss/accuracy. Let me know your thoughts.
Hey @abgoswam ,
You can try this as well and let me know the results. As I mentioned, I calculated the pairwise distance but directed the deep learning model to learn the inverse rather than the direct distance. Hence the max of the inverse must be the min of the direct distance, so argmax works. I tried a few different approaches and this one worked best. I hope this helps; let me know the outcome of your experiment.

You can edit this notebook: https://www.kaggle.com/hsankesara/prototypical-net so that the other variables remain the same.