caixincx closed this issue 1 year ago
I also find this strange. The first form is not only wrong in form, but it also seems to produce no gradient at all.
Yes, in my experiments I found that the loss from this term is very small and has almost no effect.
Check the following references before using the KL divergence in PyTorch:
https://zhuanlan.zhihu.com/p/575809052?utm_id=0
https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html#torch.nn.KLDivLoss
Or try your own formulation if you want :)
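For reference, here is a minimal sketch of how the documented `F.kl_div` interface expects its arguments: the input must already be log-probabilities and the target plain probabilities. The tensor names and the direction of the KL term are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn.functional as F

# Illustrative logits; in the repository these would come from the two branches.
logits_v = torch.randn(8, 10, requires_grad=True)
logits_z = torch.randn(8, 10, requires_grad=True)

# F.kl_div expects the *input* in log-space and the *target* as probabilities,
# and computes KL(target || input). reduction='batchmean' matches the
# mathematical definition of the KL divergence averaged over the batch.
kl = F.kl_div(
    F.log_softmax(logits_z, dim=-1),  # input: log q(y|z)
    F.softmax(logits_v, dim=-1),      # target: p(y|v)
    reduction='batchmean',
)
kl.backward()
```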
Hi FutabaSakuraXD. Thank you for your reply. For the standard form of torch.nn.KLDivLoss or F.kl_div, I think we first have to apply a log operation to the inputs.
Moreover, the gradients of the inputs should not be detached.
Gradients of the observation are detached to prevent degeneration.
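A hedged sketch of what detaching one side does to the gradient flow. The variable names and the direction of the KL term are illustrative assumptions, not the actual VSD code.

```python
import torch
import torch.nn.functional as F

logits_v = torch.randn(4, 10, requires_grad=True)  # observation branch
logits_z = torch.randn(4, 10, requires_grad=True)  # bottleneck branch

# Detach the observation side so it acts as a fixed target:
# gradients from this loss term then flow only through the other branch.
target = F.softmax(logits_v, dim=-1).detach()
loss = F.kl_div(F.log_softmax(logits_z, dim=-1), target, reduction='batchmean')
loss.backward()

print(logits_v.grad)  # None: the detached branch receives no gradient from this term
print(logits_z.grad)  # populated: only this branch is updated by the KL term
```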
Hello, author. In VSD you compute the divergence between p(y|v) and p(y|z); the code is as follows:
However, according to my understanding, the KL divergence between p(y|v) and p(y|z) should be computed as follows:
May I ask why you used the first implementation in the code? Are these two implementations equivalent?
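The two code snippets referred to above were posted as screenshots and are not reproduced in this thread. As a hedged illustration of how such an equivalence question can be checked, the sketch below compares a hand-written KL(p || q) against PyTorch's F.kl_div and verifies that they agree numerically; the distributions are random placeholders, not the actual p(y|v) and p(y|z).

```python
import torch
import torch.nn.functional as F

logits_p = torch.randn(8, 10)
logits_q = torch.randn(8, 10)

p = F.softmax(logits_p, dim=-1)
q = F.softmax(logits_q, dim=-1)

# Hand-written KL(p || q) = sum_y p(y) * (log p(y) - log q(y)), averaged over the batch.
kl_manual = (p * (p.log() - q.log())).sum(dim=-1).mean()

# PyTorch's implementation: the *input* must already be in log-space,
# and 'batchmean' divides the total by the batch size.
kl_torch = F.kl_div(q.log(), p, reduction='batchmean')

print(torch.allclose(kl_manual, kl_torch))  # True: the two forms agree
```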