FutabaSakuraXD / Farewell-to-Mutual-Information-Variational-Distiilation-for-Cross-Modal-Person-Re-identification


A question about the KL divergence computation in VSD #5

Closed caixincx closed 1 year ago

caixincx commented 2 years ago

Hello, in VSD you compute the divergence between p(y|v) and p(y|z) with the following code:

    vsd_loss = kl_div(input=self.softmax(i_observation[0].detach() / self.args.temperature),
                      target=self.softmax(i_representation[0] / self.args.temperature))

However, as I understand it, the KL divergence between p(y|v) and p(y|z) should be computed like this:

    vsd_loss = kl_div(input=torch.nn.LogSoftmax(dim=1)(i_representation[0] / self.args.temperature),
                      target=self.softmax(i_observation[0].detach() / self.args.temperature))

Why did you use the first implementation in your code? Are the two implementations equivalent?
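For reference: the two calls are not equivalent. PyTorch's kl_div expects the input argument to already be log-probabilities and the target to be plain probabilities, so a minimal self-contained check (using random stand-in logits in place of i_observation[0] and i_representation[0], and an illustrative temperature) makes the difference visible:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    temperature = 4.0                        # illustrative value only

    logits_v = torch.randn(8, 10)            # stand-in for i_observation[0]
    logits_z = torch.randn(8, 10)            # stand-in for i_representation[0]

    # Form 1 (as in the repository): plain probabilities passed as `input`
    loss_1 = F.kl_div(input=F.softmax(logits_v.detach() / temperature, dim=1),
                      target=F.softmax(logits_z / temperature, dim=1))

    # Form 2 (PyTorch convention): `input` is log-probabilities,
    # `target` is plain probabilities
    loss_2 = F.kl_div(input=F.log_softmax(logits_z / temperature, dim=1),
                      target=F.softmax(logits_v.detach() / temperature, dim=1))

    print(loss_1.item(), loss_2.item())      # the two values differ

Note that the first call is not a KL divergence under this convention: with probabilities rather than log-probabilities passed as input, every pointwise term target * (log(target) - input) is non-positive.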

Qinying-Liu commented 1 year ago

I find it strange too. The first form is not only wrong in form, it also seems to carry no gradient at all.

caixincx commented 1 year ago

> I find it strange too. The first form is not only wrong in form, it also seems to carry no gradient at all.

Yes, in my experiments I found that the loss from this term is very small and has almost no effect.

FutabaSakuraXD commented 1 year ago

Check the following before using the KL divergence in PyTorch:

https://zhuanlan.zhihu.com/p/575809052?utm_id=0

https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html#torch.nn.KLDivLoss

Or try your own formulation if you want :)


Qinying-Liu commented 1 year ago

Hi FutabaSakuraXD, thank you for your reply. For the standard form of torch.nn.KLDivLoss or F.kl_div, I think we have to apply a log operation to the input first.

[image attached in the original comment]

Moreover, the gradient of the input should not be detached.
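To spell out the convention from the docs: F.kl_div(input, target) computes target * (log(target) - input) pointwise, i.e. KL(target || q) with q = exp(input), so the input must already be log-probabilities. A short sketch (with random stand-in distributions) checking this against the definition:

    import torch
    import torch.nn.functional as F

    p = F.softmax(torch.randn(4, 5), dim=1)          # target: probabilities
    log_q = F.log_softmax(torch.randn(4, 5), dim=1)  # input: log-probabilities

    loss = F.kl_div(log_q, p, reduction='batchmean') # KL(p || q)

    # manual check against the definition KL(p || q) = sum p * (log p - log q)
    manual = (p * (p.log() - log_q)).sum(dim=1).mean()
    assert torch.allclose(loss, manual)

reduction='batchmean' matches the mathematical definition; the default 'mean' divides by the total number of elements instead.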

FutabaSakuraXD commented 1 year ago

Gradients of the observation are detached to prevent degeneration.
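If that is the intent, the observation branch serves as a fixed target while gradients flow only through the representation. A hedged sketch of that arrangement under the standard PyTorch convention (the function name and arguments here are illustrative, not the repository's actual code):

    import torch.nn.functional as F

    def vsd_kl(obs_logits, rep_logits, temperature=1.0):
        # teacher side p(y|v): detached, so no gradient reaches the observation branch
        target = F.softmax(obs_logits.detach() / temperature, dim=1)
        # student side p(y|z): log-probabilities, gradients flow here
        log_pred = F.log_softmax(rep_logits / temperature, dim=1)
        return F.kl_div(log_pred, target, reduction='batchmean')

Detaching only the target reflects the "prevent degeneration" intent stated above: the observation branch is held fixed, while the representation still receives gradient from the KL term.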