dmizr / phuber

[Re] Can gradient clipping mitigate label noise? (ML Reproducibility Challenge 2020)
https://openreview.net/forum?id=TM_SgwWJA23

modifying loss functions for binary segmentation #62

Closed K-D-Gallagher closed 1 year ago

K-D-Gallagher commented 1 year ago

Hello, I realize this is a bit beyond the scope of your project, but I was wondering whether I could run some code modifications past you. I'm trying to use these loss functions for binary semantic segmentation.

TL;DR: I wanted to ask what exactly is happening in the line p = p[torch.arange(p.shape[0]), target], and also to check why we need the "1 - p" in the final loss term, loss = (1 - p ** self.q) / self.q. I think it's making my loss negative.

--------------

modification1

--------------

Because I am not doing multi-class classification, but binary pixel-level classification, I replaced:

p = self.softmax(input)
p = p[torch.arange(p.shape[0]), target]

with:

p = self.bce(input, target)

where self.bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([weight], device="cuda:0"), reduction='none')

I'm not sure if I'm correct about what is happening with p[torch.arange(p.shape[0]), target] and whether it can be replaced with the binary cross entropy loss.
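For context, my understanding is that nn.BCEWithLogitsLoss with reduction='none' returns the per-element negative log-likelihood rather than a probability, so it may not be a like-for-like replacement for the selected p. A minimal sketch comparing the two quantities (the logits and targets here are made-up values, just for illustration):

```python
import torch
import torch.nn as nn

# Illustrative logits and binary targets (values chosen arbitrarily)
logits = torch.tensor([2.0, -1.0])
target = torch.tensor([1.0, 0.0])

# Per-element BCE: the negative log of the probability assigned to the true class
bce = nn.BCEWithLogitsLoss(reduction='none')
print(bce(logits, target))  # tensor([0.1269, 0.3133])

# The probability of the true class itself, as selected in the original code
p = torch.sigmoid(logits)
print(torch.where(target == 1, p, 1 - p))  # tensor([0.8808, 0.7311])
```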

--------------

modification2

--------------

The modifications above got things running, but resulted in negative loss, so I also replaced:

loss = (1 - p ** self.q) / self.q

with:

loss = (p ** self.q) / self.q

Finally, here is the GeneralizedCrossEntropy class with my modifications:

#---------------------------------------------------------
class GeneralizedCrossEntropy(nn.Module):
#---------------------------------------------------------
    """Computes the generalized cross-entropy loss, from `
    "Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels"
    <https://arxiv.org/abs/1805.07836>`_
    Args:
        q: Box-Cox transformation parameter, :math:`\in (0,1]`
    Shape:
        - Input: the raw, unnormalized score for each class.
                tensor of size :math:`(minibatch, C)`, with C the number of classes
        - Target: the labels, tensor of size :math:`(minibatch)`, where each value
                is :math:`0 \leq targets[i] \leq C-1`
        - Output: scalar
    """

    def __init__(self, q: float = 0.7, weight=None) -> None:
        super().__init__()
        self.q = q
        self.epsilon = 1e-9
        self.bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([weight],device = "cuda:0"), reduction='none')

    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:

        # p = self.softmax(input)
        # p = p[torch.arange(p.shape[0]), target]
        p = self.bce(input,target)
        p += self.epsilon
        # loss = (1 - p ** self.q) / self.q
        loss = (p ** self.q) / self.q

        return torch.mean(loss)
dmizr commented 1 year ago

Hi @K-D-Gallagher ,

The line p = p[torch.arange(p.shape[0]), target] selects, for each image in the batch, the predicted probability $p$ associated to that target. For example, if our input after softmax was p=[[0.3, 0.1, 0.6], [0.5, 0.4, 0.2]] and our targets were [2, 1], that line would return the array [0.6, 0.4].
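As a quick check, here is a minimal snippet reproducing that example (the tensor values are simply the ones quoted above):

```python
import torch

# Softmax outputs and targets from the example above
p = torch.tensor([[0.3, 0.1, 0.6],
                  [0.5, 0.4, 0.2]])
target = torch.tensor([2, 1])

# For each row i, select p[i, target[i]]
print(p[torch.arange(p.shape[0]), target])  # tensor([0.6000, 0.4000])
```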

In your case, for binary tasks, you could achieve the same result by keeping $p$ where the associated target is 1 and taking $1 - p$ otherwise, while leaving the rest of the code unchanged. E.g., assuming input holds the unnormalized scores and target is a tensor of the same shape as input with values $\in$ {0, 1}, you could replace the first two lines of the forward method with:

p = torch.sigmoid(input)
p = torch.where(target == 1, p, 1 - p)
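Putting it together, a rough sketch of how the full module could look for the binary case while keeping the original (1 - p ** q) / q term, so the loss stays non-negative (the class name and defaults below are illustrative, not code from this repo):

```python
import torch
import torch.nn as nn


class BinaryGeneralizedCrossEntropy(nn.Module):
    """Illustrative binary-segmentation variant of the generalized cross-entropy loss."""

    def __init__(self, q: float = 0.7) -> None:
        super().__init__()
        self.q = q
        self.epsilon = 1e-9

    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # input: raw logits; target: tensor of the same shape with values in {0, 1}
        p = torch.sigmoid(input)
        # Probability the model assigns to the true class of each pixel
        p = torch.where(target == 1, p, 1 - p)
        p = p + self.epsilon
        # Original GCE term: stays non-negative because p is a probability
        loss = (1 - p ** self.q) / self.q
        return torch.mean(loss)
```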