[BUG] Get nan when calculating cross entropy loss.

OpenBMB / ModelCenter

Efficient, Low-Resource, Distributed transformer implementation based on BMTrain

https://modelcenter.readthedocs.io

Apache License 2.0

Closed alphaGem closed 2 years ago

alphaGem commented 2 years ago

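Problem description

The -inf values on lines 180 and 183 cause the cross entropy loss computed after softmax to come out as nan.

Steps to reproduce

Expected behavior

Under normal behavior, the loss should be a non-negative real number.

Additional information

Cross entropy computes terms of the form $x \log(y)$. Because -inf becomes 0 after softmax, the case $x = y = 0$ evaluates to $0 \times (-\infty) = \mathrm{nan}$.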

THUCSTHanxu13 commented 2 years ago

Could you share the specific code snippet where the bug occurs?

alphaGem commented 2 years ago

Code snippet

prob_t = F.softmax(logits_t, dim=-1)
log_prob_s = F.log_softmax(logits_s, dim=-1)
d_loss = -(prob_t * log_prob_s).sum(dim=1).mean()

Here logits_t and logits_s are the model outputs. The computed d_loss is nan.

Changing the code to

prob_t = F.softmax(logits_t, dim=-1)
log_prob_s = F.log_softmax(logits_s, dim=-1)
d_loss += -(prob_t * log_prob_s).sum(dim=1)[:,:-1].mean()

makes it behave normally.
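The mechanism described above can be reproduced without the model. The following is a minimal sketch in plain Python rather than PyTorch, assuming a single position whose last vocabulary slot holds -inf in both the teacher and student logits. The logit values and the helper functions `softmax` and `log_softmax` are illustrative stand-ins, not code from ModelCenter.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # Log-softmax via log-sum-exp; a -inf logit stays -inf.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

# Hypothetical logits: the last slot stands for a padded vocabulary entry set to -inf.
logits_t = [2.0, 1.0, float("-inf")]
logits_s = [1.5, 0.5, float("-inf")]

prob_t = softmax(logits_t)          # last entry is exactly 0.0
log_prob_s = log_softmax(logits_s)  # last entry is -inf

# Summing over every slot includes the term 0.0 * -inf, which is nan.
naive = -sum(p * lq for p, lq in zip(prob_t, log_prob_s))

# Dropping the last slot before summing avoids that term.
trimmed = -sum(p * lq for p, lq in zip(prob_t[:-1], log_prob_s[:-1]))

print(math.isnan(naive), math.isfinite(trimmed))
```

Under these assumptions the full sum is nan while the trimmed sum is a finite positive number, which matches the behavior reported for the two snippets.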

Achazwl commented 2 years ago

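Fixed.

"The -inf values on lines 180 and 183 cause the cross entropy loss computed after softmax to come out as nan."

Previously, for historical reasons, the code could not handle a vocabulary of odd size, so the vocabulary size was increased by 1. The usage you wrote, [:,:-1], was therefore indeed the correct one. The vocabulary size has now been restored to its normal value.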