Question about Kullback-Leibler (KL) divergence

WenbinLee / ADM

The PyTorch code for "Asymmetric Distribution Measure for Few-shot Learning", IJCAI 2020.

Question about Kullback-Leibler (KL) divergence #1

Closed: LoveMiki closed this issue 3 years ago

LoveMiki commented 3 years ago

Thanks for sharing; this is excellent work. However, while reading your paper, I noticed that you say DKL(Q||S) mainly matches the distribution of Q to that of S. To the best of my knowledge, the first argument of the KL divergence (here Q) should be the true probability distribution. But in few-shot learning, the support set is actually the real (known) probability distribution. Shouldn't it be DKL(S||Q) rather than DKL(Q||S), since we want the query set to be close to the support set?

Looking forward to your response. Thank you.

WenbinLee commented 3 years ago

Yes, your understanding is correct. Our main purpose is to make the distribution of Q match the distribution of S. We are sorry for the confusion caused by the notation "DKL(Q||S)"; its intended mathematical meaning is the one stated in our paper, i.e., Eq. (2). I hope this answers your question.

Thanks.

LoveMiki commented 3 years ago

Thanks for your prompt reply. I understand the claim made in your paper, which is totally correct. But DKL(Q||S) indicates that we want S to approximate Q as closely as possible, so that the cross-entropy loss is minimized. The same definition can also be found on Wikipedia. So, if we want the query set to approximate the distribution of the support set, we should write DKL(S||Q). Is that right? Thank you.
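For reference, the convention mentioned in this comment can be written out as follows. This is only a sketch for the discrete case, assuming S and Q are probability mass functions over the same outcomes x; the paper's Eq. (2) may be stated for a different (e.g. continuous) setting.

```latex
D_{\mathrm{KL}}(S \,\|\, Q)
  = \sum_{x} S(x)\,\log\frac{S(x)}{Q(x)}
  = \underbrace{-\sum_{x} S(x)\,\log Q(x)}_{H(S,\,Q)}
    \;-\; \underbrace{\Bigl(-\sum_{x} S(x)\,\log S(x)\Bigr)}_{H(S)}
\qquad\text{whereas}\qquad
D_{\mathrm{KL}}(Q \,\|\, S)
  = \sum_{x} Q(x)\,\log\frac{Q(x)}{S(x)} .
```

Because H(S) does not depend on Q, minimizing DKL(S||Q) over Q is the same as minimizing the cross-entropy H(S, Q), which is the sense in which the second argument approximates the first.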

WenbinLee commented 3 years ago

Yes, this is a notation mistake in our paper. Indeed, we should write it as DKL(S||Q), following Wikipedia.
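The reason the argument order matters is that the KL divergence is not symmetric. A minimal Python sketch of this is given below, assuming small discrete distributions; the values of `s` and `q` and the helper names are hypothetical and are not taken from the paper or the ADM repository.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """H(p) = -sum_x p(x) log p(x)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Hypothetical toy distributions standing in for a support-set
# distribution S and a query distribution Q.
s = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

kl_sq = kl_divergence(s, q)  # D_KL(S || Q): S is the reference distribution
kl_qs = kl_divergence(q, s)  # D_KL(Q || S): Q is the reference distribution

print(f"D_KL(S||Q) = {kl_sq:.4f}")
print(f"D_KL(Q||S) = {kl_qs:.4f}")
print(f"H(S,Q) - H(S) = {cross_entropy(s, q) - entropy(s):.4f}")
```

The two directions give different values for these inputs, and DKL(S||Q) coincides with H(S, Q) - H(S), so the notation has to state which distribution is treated as the reference.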