Lightning-AI / torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

How to automatically reset the metrics when one of multiple test dataloaders finishes? #1322

Closed Telephone1024 closed 2 years ago

Telephone1024 commented 2 years ago

🐛 Bug

I just noticed that when I test my model with multiple dataloaders (which represent different datasets), the metric states from previous dataloaders are not reset automatically. Thus all results except the first dataloader's are wrong.

Testing both dataloaders:

Test metric    1st DataLoader        2nd DataLoader
test/acc       0.8613571524620056    0.8557639122009277
test/auc       0.9369744658470154    0.9324582815170288

Testing only the second dataloader:

Test metric    2nd DataLoader
test/acc       0.6600000262260437
test/auc       0.7271063327789307

Clearly, since the first dataloader contains more samples, the results reported for the second dataloader are dominated by the accumulated state from the first one.
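To see why, note that torchmetrics objects accumulate state across update() calls until reset() is called. A small self-contained illustration (made-up tensors, not the data above; torchmetrics 0.9.x API, matching the environment below):

import torch
from torchmetrics import Accuracy

acc = Accuracy()

# "dataloader 1": 3 of 4 predictions correct
acc.update(torch.tensor([1, 1, 0, 0]), torch.tensor([1, 1, 0, 1]))
print(acc.compute())  # tensor(0.7500)

# "dataloader 2": 0 of 4 correct, but the state still holds
# dataloader 1's counts, so compute() returns (3 + 0) / (4 + 4)
acc.update(torch.tensor([0, 0, 1, 1]), torch.tensor([1, 1, 0, 0]))
print(acc.compute())  # tensor(0.3750), not tensor(0.)

acc.reset()  # clears the accumulated counts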

Code sample

import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchmetrics import AUROC, Accuracy

class Data(pl.LightningDataModule):
    def test_dataloader(self):
        # returning a list makes Lightning run the test loop over each
        # dataloader in turn, passing `dataloader_idx` to `test_step`
        return [
            DataLoader(
                self.test_set_1,
                batch_size=32,
                num_workers=12, pin_memory=False,
            ),
            DataLoader(
                self.test_set_2,
                batch_size=32,
                num_workers=12, pin_memory=False,
            ),
        ]

class Model(pl.LightningModule):
    def __init__(self, training):
        super().__init__()
        self.save_hyperparameters()
        self.backbone = build_backbone(training.model_cfg)

        # metrics -- note that a single instance of each metric is
        # shared across all test dataloaders
        self.eval_acc = Accuracy(dist_sync_on_step=True)
        self.eval_auc = AUROC(num_classes=2, compute_on_step=False, dist_sync_on_step=True)

    def test_step(self, batch, batch_idx, dataloader_idx=0):
        inputs, targets = batch
        preds = self.backbone(inputs)
        self.eval_acc(preds, targets)
        self.eval_auc(preds, targets)
        self.log_dict(
            {'test/acc': self.eval_acc, 'test/auc': self.eval_auc},
            on_step=False, on_epoch=True, sync_dist=True,
            rank_zero_only=True
        )

Expected behavior

When the test loop for one dataloader is over, the reset() method of the torchmetrics objects should be called automatically, so that the results for the next dataloader (with a different dataloader_idx) are not affected.

I'm new to pytorch-lightning; is there a solution to my problem?

Environment

package             version
python              3.9.13
pytorch             1.12.1
pytorch-lightning   1.7.7
torchmetrics        0.9.3

Additional context

I'll provide more details if necessary.

github-actions[bot] commented 2 years ago

Hi! Thanks for your contribution, great first issue!

SkafteNicki commented 2 years ago

Hi @Telephone1024, as stated in the common pitfalls section here: https://torchmetrics.readthedocs.io/en/stable/pages/lightning.html#common-pitfalls, we recommend initializing a separate metric per dataloader when working with multiple datasets, to make sure that states are not mixed (as you are seeing).
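To make that concrete, here is a minimal sketch of the per-dataloader pattern, adapted from the snippet above (num_test_dataloaders is an assumed constructor argument, and the deprecated compute_on_step flag is dropped):

import torch.nn as nn
import pytorch_lightning as pl
from torchmetrics import AUROC, Accuracy

class Model(pl.LightningModule):
    def __init__(self, training, num_test_dataloaders=2):
        super().__init__()
        self.save_hyperparameters()
        self.backbone = build_backbone(training.model_cfg)

        # one metric instance per test dataloader, so states never mix;
        # nn.ModuleList keeps them registered (device placement, state_dict)
        self.eval_accs = nn.ModuleList(
            Accuracy(dist_sync_on_step=True) for _ in range(num_test_dataloaders)
        )
        self.eval_aucs = nn.ModuleList(
            AUROC(num_classes=2, dist_sync_on_step=True) for _ in range(num_test_dataloaders)
        )

    def test_step(self, batch, batch_idx, dataloader_idx=0):
        inputs, targets = batch
        preds = self.backbone(inputs)
        # pick the metric instances belonging to the current dataloader
        acc = self.eval_accs[dataloader_idx]
        auc = self.eval_aucs[dataloader_idx]
        acc(preds, targets)
        auc(preds, targets)
        self.log_dict(
            {'test/acc': acc, 'test/auc': auc},
            on_step=False, on_epoch=True,
        )

With separate instances, Lightning appends a /dataloader_idx_{i} suffix to each logged key when multiple test dataloaders are used, and no state leaks from one dataloader into another.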

Telephone1024 commented 2 years ago

> Hi @Telephone1024, as stated in the common pitfalls section here: https://torchmetrics.readthedocs.io/en/stable/pages/lightning.html#common-pitfalls, we recommend initializing a separate metric per dataloader when working with multiple datasets, to make sure that states are not mixed (as you are seeing).

Thanks for your answer!