clcarwin / focal_loss_pytorch

A PyTorch Implementation of Focal Loss.
MIT License

FocalLoss vs CrossEntropyLoss #2

Open nationalflag opened 6 years ago

nationalflag commented 6 years ago

In my experiments, the loss from FocalLoss with gamma=0 is much lower than the loss from CrossEntropyLoss. What causes this?

clcarwin commented 6 years ago

Can you try to run python focalloss_test.py?

nationalflag commented 6 years ago

But when I replace CrossEntropyLoss with FocalLoss to train my network, the corresponding training loss is lower.

nationalflag commented 6 years ago

Using the same value for FocalLoss's alpha and CrossEntropyLoss's weight ([0.366, 1]), the max_error is 0.12252533435821533.
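
For reference, here is a minimal way to reproduce a comparison like this (a sketch only: the import path, seed, and random data are assumptions, not this repo's focalloss_test.py). A likely source of the gap is normalization: nn.CrossEntropyLoss with a weight and the default mean reduction divides by the sum of the selected class weights, whereas the focal-loss code takes a plain .mean(), i.e. divides by the number of samples.

    import torch
    import torch.nn as nn
    from focalloss import FocalLoss  # this repo's implementation (assumed import path)

    torch.manual_seed(0)
    logits = torch.randn(1000, 2)            # N samples, 2 classes
    targets = torch.randint(0, 2, (1000,))   # random class labels

    alpha = [0.366, 1.0]
    fl = FocalLoss(gamma=0, alpha=alpha)     # with gamma=0 this should reduce to weighted CE
    ce = nn.CrossEntropyLoss(weight=torch.tensor(alpha))

    gap = (fl(logits, targets) - ce(logits, targets)).abs().item()
    print(gap)  # nonzero because the two losses are normalized differently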

huaifeng1993 commented 5 years ago

I rewrote the FocalLoss code for my experiments and ran into the same problem as you.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class BCFocalLoss(nn.Module):
    def __init__(self, gamma=0, alpha=None, size_average=True):
        super(BCFocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        if isinstance(alpha, (float, int)): self.alpha = torch.Tensor([alpha, 1 - alpha])
        if isinstance(alpha, list): self.alpha = torch.Tensor(alpha)
        self.size_average = size_average

    def forward(self, input, target):
        if input.dim() > 2:
            input = input.view(input.size(0), input.size(1), -1)  # N,C,H,W => N,C,H*W
            input = input.contiguous().view(-1)                   # => one logit per element
        target = target.view(-1)

        # log-probability of the positive class for every logit
        logpt = F.logsigmoid(input)
        # detach pt so the focal weight (1-pt)**gamma is treated as a constant in backward
        pt = Variable(logpt.data.exp())

        if self.alpha is not None:
            if self.alpha.type() != input.data.type():
                self.alpha = self.alpha.type_as(input.data)
            # gather needs integer indices, so cast the (possibly float) binary target
            at = self.alpha.gather(0, target.data.view(-1).long())
            logpt = logpt * Variable(at)

        loss = -1 * (1 - pt) ** self.gamma * logpt
        if self.size_average: return loss.mean()
        else: return loss.sum()

When gamma=0 it behaves like BCEWithLogitsLoss in my test code. But when I replace BCEWithLogitsLoss with BCFocalLoss to train my network, the corresponding training loss is much lower. Here is the test code:

    torch.random.manual_seed(32)
    p = torch.randn(1, 1, 56, 56, requires_grad=True)  # logits
    t = torch.ones(1, 1, 56, 56)                       # all-positive targets
    criterion = BCFocalLoss()                          # gamma=0, no alpha
    result = criterion(p, t)
    result2 = nn.BCEWithLogitsLoss()(p, t)

    result.backward()
    #result2.backward()   # swap with the line above to compare gradients
    print(p.grad.sum())
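
One note on the class above (an observation about the snippet, not something stated by the original posters): logpt = F.logsigmoid(input) is the log-probability of the positive class regardless of the target, so the loss coincides with BCEWithLogitsLoss only when every target is 1, as in this test where t is all ones. With mixed positive/negative targets the two losses diverge, which may be why the training loss looks much lower. A target-aware sketch of a binary focal loss (names and defaults here are illustrative, not from this repo) would be:

    import torch
    import torch.nn.functional as F

    def binary_focal_loss(logits, targets, gamma=2.0, alpha=None):
        # log p_t: log-probability of the true class for each element
        logpt = F.logsigmoid(logits) * targets + F.logsigmoid(-logits) * (1 - targets)
        pt = logpt.detach().exp()
        loss = -(1 - pt) ** gamma * logpt
        if alpha is not None:
            # alpha weights the positive class, (1 - alpha) the negative class
            loss = (alpha * targets + (1 - alpha) * (1 - targets)) * loss
        return loss.mean()

With gamma=0 and alpha=None this reduces to F.binary_cross_entropy_with_logits for any target, not only all-ones targets.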
qimw commented 5 years ago

Same problem. Have you solved it? @nationalflag @clcarwin @huaifeng1993

KaiLv69 commented 2 years ago

same problem

WZLHQ commented 1 year ago

If you find that the loss with CE-loss is much lower than that with Focal-loss, you can try the following:

    logpt = -F.cross_entropy(input, target.view(-1), reduction="none")

The reduction="none" argument is the important part.
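
To illustrate that suggestion (a sketch, not this repo's code; the class name and defaults are made up for the example), a multi-class focal loss can be built directly on F.cross_entropy with reduction="none", so the per-sample losses stay separate until the focal term is applied:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CEFocalLoss(nn.Module):
        def __init__(self, gamma=0):
            super().__init__()
            self.gamma = gamma

        def forward(self, input, target):
            # per-sample cross entropy; reduction="none" keeps one value per sample
            ce = F.cross_entropy(input, target.view(-1), reduction="none")
            pt = torch.exp(-ce)                  # probability of the true class
            loss = (1 - pt) ** self.gamma * ce   # focal modulation
            return loss.mean()

With gamma=0 and no class weights this returns the same value as nn.CrossEntropyLoss() with the default mean reduction; once an alpha/weight is added, how the final mean is normalized is what produces the gap discussed earlier in this thread.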