Closed sunpeng981712364 closed 3 years ago
Because this is focal loss, which adds a modulating coefficient on top of the standard BCE loss. That coefficient is based on the sigmoid probability of the input, so we still need to compute it explicitly. Please refer to the focal-loss paper (Lin et al., "Focal Loss for Dense Object Detection") for details.
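To make the point concrete, here is a minimal sketch (the logits, targets, and `gamma` value are illustrative assumptions, not from the repo): the BCE term itself is computed stably from logits, but the modulating coefficient `(1 - p_t)^gamma` needs the probability, hence the extra `torch.sigmoid`:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits and binary targets for illustration.
logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])
gamma = 2.0  # focusing parameter from the focal-loss paper

# Standard BCE, computed stably directly from logits.
bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')

# The focal coefficient (1 - p_t)^gamma depends on the probability itself,
# so sigmoid is still required even though BCEWithLogitsLoss is not.
p = torch.sigmoid(logits)
p_t = torch.where(targets == 1, p, 1 - p)
focal_loss = ((1 - p_t) ** gamma * bce).mean()
```

Since the coefficient lies in (0, 1), the focal loss is always smaller than the plain BCE mean on the same inputs, down-weighting easy examples.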
version 1: use torch.autograd
class FocalLossV1(nn.Module):
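The snippet above is truncated; a self-contained sketch of such a binary focal-loss module might look like the following. The `alpha`/`gamma` defaults follow the focal-loss paper; the exact body is an assumption, not the repo's verbatim code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLossV1(nn.Module):
    """Sketch of a binary focal loss built on binary_cross_entropy_with_logits.

    alpha/gamma defaults follow Lin et al. (2017); implementation details
    here are illustrative assumptions."""

    def __init__(self, alpha=0.25, gamma=2.0, reduction='mean'):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, logits, label):
        # The probability is needed to form the modulating coefficient,
        # which is why sigmoid is computed even though the BCE term
        # itself works directly on logits.
        probs = torch.sigmoid(logits)
        # (1 - p)^gamma for positives, p^gamma for negatives.
        coeff = (label * (1.0 - probs) + (1.0 - label) * probs) ** self.gamma
        # Class-balancing weight: alpha for positives, 1 - alpha for negatives.
        alpha_t = label * self.alpha + (1.0 - label) * (1.0 - self.alpha)
        bce = F.binary_cross_entropy_with_logits(logits, label, reduction='none')
        loss = alpha_t * coeff * bce
        if self.reduction == 'mean':
            return loss.mean()
        if self.reduction == 'sum':
            return loss.sum()
        return loss
```

Usage mirrors any `nn.Module` loss: `FocalLossV1()(logits, labels)` with float labels in {0, 1}.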
> `torch.nn.BCEWithLogitsLoss(weight: Optional[torch.Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean', pos_weight: Optional[torch.Tensor] = None)` — This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
BCEWithLogitsLoss already combines a Sigmoid layer and the BCELoss in one single class, so why call torch.sigmoid again? Is anything wrong? Thanks.