Lightning-AI / torchmetrics

Torchmetrics - Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

The compute_groups may lead to unexpected behavior #2596

Closed YicunDuanUMich closed 2 months ago

YicunDuanUMich commented 3 months ago

🐛 Bug

When I use MetricCollection to store similar metrics, the compute_groups feature automatically merges their states without any warning. I have two similar metrics, "star classification accuracy" and "galaxy classification accuracy", whose only difference is a string attribute source_type_filter ("star" vs. "galaxy") that instructs them to apply different filters. During validation, I find that "star classification accuracy" is always equal to "galaxy classification accuracy" because MetricCollection merges their states.
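
For illustration, here is a minimal stand-in for this setup (the FilteredAccuracy class and the integer source-type codes are hypothetical, not the actual bliss.encoder.metrics.SourceTypeAccuracy): two instances differ only in a constructor attribute that changes how their states are updated.

import torch
from torchmetrics import Metric, MetricCollection

# Hypothetical stand-in for SourceTypeAccuracy: the only difference between
# instances is the `source_type_filter` attribute, which masks the inputs
# before they are accumulated into the metric states.
class FilteredAccuracy(Metric):
    def __init__(self, source_type_filter=None):
        super().__init__()
        self.source_type_filter = source_type_filter
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds, target, source_type):
        if self.source_type_filter is not None:
            mask = source_type == self.source_type_filter
            preds, target = preds[mask], target[mask]
        self.correct += (preds == target).sum()
        self.total += target.numel()

    def compute(self):
        return self.correct.float() / self.total.clamp(min=1)

collection = MetricCollection({
    "acc_star": FilteredAccuracy(source_type_filter=0),    # 0 standing in for "star"
    "acc_galaxy": FilteredAccuracy(source_type_filter=1),  # 1 standing in for "galaxy"
})

preds = torch.randint(0, 2, (100,))
target = torch.randint(0, 2, (100,))
source_type = torch.randint(0, 2, (100,))
collection.update(preds, target, source_type)

# The issue reports that, with compute_groups enabled (the default), both
# entries end up with identical values on affected versions.
print(collection.compute())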

To Reproduce

I attach my Hydra config here to show my setup:

 metrics:
        _target_: torchmetrics.MetricCollection
        _convert_: "partial"
        metrics:
          source_type_accuracy:
            _target_: bliss.encoder.metrics.SourceTypeAccuracy
            flux_bin_cutoffs: [200, 400, 600, 800, 1000]
          source_type_accuracy_star:
            _target_: bliss.encoder.metrics.SourceTypeAccuracy
            flux_bin_cutoffs: [200, 400, 600, 800, 1000]
            source_type_filter: "star"
          source_type_accuracy_galaxy:
            _target_: bliss.encoder.metrics.SourceTypeAccuracy
            flux_bin_cutoffs: [200, 400, 600, 800, 1000]
            source_type_filter: "galaxy"

Expected behavior

I would expect the default value of compute_groups in MetricCollection to be False, so that states are not merged silently.
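
If the silent merging is the concern, one workaround is to disable the grouping explicitly, since the MetricCollection constructor accepts a compute_groups argument. A minimal sketch, using built-in metrics for brevity:

import torch
import torchmetrics

# Passing compute_groups=False keeps every metric's states separate, at the
# cost of the memory/compute savings that state sharing normally provides.
collection = torchmetrics.MetricCollection(
    {
        "weighted_precision": torchmetrics.Precision(task="multiclass", average="weighted", num_classes=3),
        "macro_precision": torchmetrics.Precision(task="multiclass", average="macro", num_classes=3),
    },
    compute_groups=False,
)

collection.update(torch.randn(10, 3).softmax(dim=1), torch.randint(0, 3, (10,)))
print(collection.compute())

In the Hydra config above this would presumably translate to an extra compute_groups: false entry next to the metrics: key, assuming Hydra forwards it as a keyword argument to MetricCollection.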

Environment

Additional context

github-actions[bot] commented 3 months ago

Hi! Thanks for your contribution, great first issue!

SkafteNicki commented 2 months ago

Hi @YicunDuanUMich, sorry for the late reply. I am pretty sure that the bug you are seeing was fixed in later versions of torchmetrics. In particular, I think this PR contains the solution for your issue: https://github.com/Lightning-AI/torchmetrics/pull/2571. As an example, here is a script that initializes the classification metric Precision with both average="macro" and average="weighted", i.e. two metrics whose states are identical and which only differ in how the values are aggregated at the end.

import torch
import torchmetrics

# Two Precision metrics that share the same underlying states and differ only
# in how those states are aggregated in compute().
collection = torchmetrics.MetricCollection({
    "weighted_precision": torchmetrics.Precision(task="multiclass", average="weighted", num_classes=3),
    "macro_precision": torchmetrics.Precision(task="multiclass", average="macro", num_classes=3),
})

for _ in range(3):
    x = torch.randn(10, 3).softmax(dim=1)  # fake probabilities
    y = torch.randint(0, 3, (10,))         # fake targets
    collection.update(x, y)
    out = collection.compute()
    print(out)

In v0.11.3 of torchmetrics I see roughly the behavior you are describing: the second metric is not updated correctly. However, in the newest version of torchmetrics things work as expected. Therefore, please update to the newest version of torchmetrics. Closing the issue.
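
For reference, one way to verify the behavior after upgrading is to check the installed version and inspect which metrics the collection grouped together; a minimal sketch, assuming the compute_groups property exposed by recent versions of MetricCollection:

import torch
import torchmetrics

print(torchmetrics.__version__)

collection = torchmetrics.MetricCollection({
    "weighted_precision": torchmetrics.Precision(task="multiclass", average="weighted", num_classes=3),
    "macro_precision": torchmetrics.Precision(task="multiclass", average="macro", num_classes=3),
})
collection.update(torch.randn(10, 3).softmax(dim=1), torch.randint(0, 3, (10,)))

# If the two metrics were grouped, something like
# {0: ['weighted_precision', 'macro_precision']} is expected here; with the
# fix, compute() should still return two different values.
print(collection.compute_groups)
print(collection.compute())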