Lightning-AI / torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

`GeneralizedDiceScore` yields 0 scores when using `per_class=True` for samples where class is not present #2846

Open nkaenzig opened 2 days ago

nkaenzig commented 2 days ago

🐛 Bug

The current implementation of `GeneralizedDiceScore` yields a score of 0.0 for every sample that does not contain a given class when class-wise metrics are calculated via `per_class=True`.

This leads to very low Dice scores, particularly for rare classes, and therefore makes the Dice scores incomparable between classes.

To Reproduce

The following code sample yields class-wise scores of `tensor([0.2500, 0.2500, 0.0000])`, even though all predictions match the targets:

Code sample

```python
import torch
from torchmetrics.segmentation import GeneralizedDiceScore
from torchmetrics.segmentation import DiceScore

N_SAMPLES = 4
N_CLASSES = 3

target = torch.full((N_SAMPLES, N_CLASSES, 128, 128), 0, dtype=torch.int8)
preds = torch.full((N_SAMPLES, N_CLASSES, 128, 128), 0, dtype=torch.int8)

target[0, 0], preds[0, 0] = 1, 1
target[2, 1], preds[2, 1] = 1, 1

generalized_dice = GeneralizedDiceScore(num_classes=3, per_class=True, include_background=True)
print(generalized_dice(preds, target))
```

Expected behavior

I'd expect the code sample above to return `[1.0, 1.0, nan]` for the class-wise scores. The third class should be `nan` because it is not present in any of the samples, so returning a score of 1.0 for it might also be misleading. More generally, samples in which a class doesn't occur should not contribute to the Dice score of that class.
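To make the expected behavior concrete, here is a minimal sketch in plain PyTorch. It is not the torchmetrics implementation: the helper names `per_sample_dice` and `nanmean_over_samples` are hypothetical, and it uses plain (unweighted) Dice per class rather than the generalized, weighted variant. It assumes one-hot inputs of shape `(N, C, H, W)`, marks a class as `nan` in every sample where it appears in neither `preds` nor `target`, and then averages over samples while skipping those `nan` entries.

```python
import torch


def per_sample_dice(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: plain Dice per sample and class, shape (N, C).

    Entries are nan where the class is absent from both preds and target.
    """
    preds = preds.float()
    target = target.float()
    intersection = (preds * target).sum(dim=(2, 3))
    cardinality = preds.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    nan = torch.full_like(cardinality, float("nan"))
    # clamp only guards the division; those entries are replaced by nan anyway
    return torch.where(cardinality > 0, 2 * intersection / cardinality.clamp(min=1), nan)


def nanmean_over_samples(scores: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean over the sample dimension, ignoring nan entries."""
    valid = ~torch.isnan(scores)
    count = valid.sum(dim=0)
    total = torch.where(valid, scores, torch.zeros_like(scores)).sum(dim=0)
    nan = torch.full_like(total, float("nan"))
    # a class that is absent from every sample stays nan
    return torch.where(count > 0, total / count.clamp(min=1), nan)


# Same tensors as in the reproduction above
target = torch.full((4, 3, 128, 128), 0, dtype=torch.int8)
preds = torch.full((4, 3, 128, 128), 0, dtype=torch.int8)
target[0, 0], preds[0, 0] = 1, 1
target[2, 1], preds[2, 1] = 1, 1

scores = per_sample_dice(preds, target)

# Skipping absent samples: classes 0 and 1 score 1.0, class 2 is nan
print(nanmean_over_samples(scores))

# Counting absent samples as 0.0 instead gives 0.25, 0.25, 0.0
print(torch.nan_to_num(scores, nan=0.0).mean(dim=0))
```

The second print reproduces the reported values, which is consistent with (though it does not prove) absent samples being counted as 0.0 in the sample-wise average.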

Environment

Additional context

Very similar to issue #2850.

github-actions[bot] commented 2 days ago

Hi! Thanks for your contribution! Great first issue!