🐛 Bug

The current implementation of `GeneralizedDiceScore` yields scores of `0.0` for samples that don't contain a particular class when calculating class-wise metrics via `per_class=True`. This leads to very low dice scores, particularly for rare classes, and therefore makes the dice scores between classes incomparable.

To Reproduce

The following code sample calculates class-wise scores of `tensor([0.2500, 0.2500, 0.0000])`, even though all the predictions match the targets:

```python
import torch
from torchmetrics.segmentation import GeneralizedDiceScore
from torchmetrics.segmentation import DiceScore

N_SAMPLES = 4
N_CLASSES = 3

# One-hot masks of shape (N, C, H, W); every class starts out absent everywhere.
target = torch.full((N_SAMPLES, N_CLASSES, 128, 128), 0, dtype=torch.int8)
preds = torch.full((N_SAMPLES, N_CLASSES, 128, 128), 0, dtype=torch.int8)

# Class 0 appears only in sample 0, class 1 only in sample 2, class 2 never.
# Predictions are identical to the targets.
target[0, 0], preds[0, 0] = 1, 1
target[2, 1], preds[2, 1] = 1, 1

generalized_dice = GeneralizedDiceScore(num_classes=3, per_class=True, include_background=True)
print(generalized_dice(preds, target))
```

Expected behavior

I'd expect the above code sample to return `[1.0, 1.0, nan]` for the class-wise scores (`nan` for the third class, given that this class is not present in any of the samples, so returning a 1.0 score might also be misleading). Also, samples where the class doesn't occur should not contribute to the dice score of that class.

Environment

TorchMetrics version (if build from source, add commit SHA): 1.6.0
Python & PyTorch Version (e.g., 1.0): 3.11.10
Any other relevant information such as OS (e.g., Linux): macOS 15.1.1 (24B91)

Additional context
Very similar to issue #2850.
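
To make the expected aggregation concrete, here is a minimal, dependency-free sketch of the rule described under "Expected behavior": a sample contributes to a class's score only if the class occurs in that sample's target, and a class that never occurs yields `nan`. This is not TorchMetrics code. The function name `expected_per_class_dice` and the nested-list mask layout are made up for illustration, and it uses plain per-class Dice as a stand-in for the generalized variant. It also assumes that false positives in samples where the class is absent are ignored, which is one possible reading of the request.

```python
import math


def expected_per_class_dice(preds, target):
    """Hypothetical reference: preds/target are nested lists indexed as
    [sample][class], each entry a flat list of 0/1 pixel values."""
    num_classes = len(target[0])
    scores = []
    for c in range(num_classes):
        per_sample = []
        for p_sample, t_sample in zip(preds, target):
            p, t = p_sample[c], t_sample[c]
            if sum(t) == 0:
                # Class absent from this sample's target: skip it entirely.
                continue
            intersection = sum(pi * ti for pi, ti in zip(p, t))
            per_sample.append(2 * intersection / (sum(p) + sum(t)))
        # Average over contributing samples; nan if the class never occurs.
        scores.append(sum(per_sample) / len(per_sample) if per_sample else math.nan)
    return scores


# Mirror the setup from the report on tiny 4-pixel masks.
n_samples, n_classes, n_pixels = 4, 3, 4
target = [[[0] * n_pixels for _ in range(n_classes)] for _ in range(n_samples)]
preds = [[[0] * n_pixels for _ in range(n_classes)] for _ in range(n_samples)]
target[0][0], preds[0][0] = [1] * n_pixels, [1] * n_pixels
target[2][1], preds[2][1] = [1] * n_pixels, [1] * n_pixels

scores = expected_per_class_dice(preds, target)
print(scores)  # [1.0, 1.0, nan]
```

Averaging only over samples that contain the class keeps rare classes from being pulled toward zero by the many samples in which they simply do not appear.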