Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Metrics and multiple validation/test dataloaders #12765

Open pietrolesci opened 2 years ago

pietrolesci commented 2 years ago

As discussed on Slack with @justusschock, it would be nice to document explicitly how torchmetrics behaves when used with multiple dataloaders.

From @justusschock:

When using the module-based interface, PL leaves the aggregation to TM, since many metrics are in fact non-trivial to aggregate properly. TM, however, is designed to also work independently of PL, so it only updates its states when you tell it to and computes results based on those internal states. It does not know about the dataloader concept at all. Internally, PL just caches the metric object to log, and with multiple dataloaders it still caches the same object (since you don’t have different objects per loader). The metric’s internal state is therefore global across all of the loaders, since it is the same object.
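To make the quoted behavior concrete, here is a minimal sketch in plain Python. `ToyAccuracy` is a hypothetical stand-in that only mimics the update/compute/reset pattern described above; it is not the torchmetrics API, and the two "loaders" are made-up lists of batches. It only illustrates that one shared object accumulates state across every loader it is fed.

```python
class ToyAccuracy:
    """Hypothetical stand-in for a stateful metric (update / compute / reset)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, targets):
        # State only changes when update() is called explicitly.
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self):
        # The result is derived purely from the accumulated state.
        return self.correct / self.total

    def reset(self):
        self.correct = 0
        self.total = 0


# Two made-up validation loaders, each a list of (preds, targets) batches.
loader_a = [([1, 1], [1, 1]), ([0, 1], [0, 1])]  # 4 of 4 correct
loader_b = [([1, 0], [0, 1]), ([1, 1], [1, 0])]  # 1 of 4 correct

# One metric object shared by both loaders: it has no notion of which
# loader a batch came from, so its state ends up global.
shared = ToyAccuracy()
for loader in (loader_a, loader_b):
    for preds, targets in loader:
        shared.update(preds, targets)

shared_result = shared.compute()
print(shared_result)  # 0.625, i.e. 5 of 8 across both loaders combined
```

The single value mixes both loaders (1.0 for the first and 0.25 for the second, taken separately), which is the effect the quote describes for a metric object reused across dataloaders.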

cc @borda @rohitgr7 @carmocca @edward-io @ananthsub @kamil-kaczmarek @Raalsky @Blaizzy

rohitgr7 commented 2 years ago

Internally, PL just caches the metric object to log, and with multiple dataloaders it still caches the same object (since you don’t have different objects per loader). The metric’s internal state is therefore global across all of the loaders, since it is the same object.

How are you planning to make it work with multiple dataloaders? The states are reset on epoch end, and epoch end is triggered only once, after all the dataloaders have been processed. Can you share more details?
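One way to picture how per-loader results can coexist with a single epoch-end reset is to keep one metric object per dataloader and pick it by index. The sketch below is plain Python under stated assumptions: `ToyAccuracy`, `validation_step`, and the loop are hypothetical stand-ins for illustration, not Lightning or torchmetrics code, and the thread does not confirm this is how either library handles it.

```python
class ToyAccuracy:
    """Hypothetical stand-in for a stateful metric (update / compute / reset)."""

    def __init__(self):
        self.correct, self.total = 0, 0

    def update(self, preds, targets):
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self):
        return self.correct / self.total

    def reset(self):
        self.correct, self.total = 0, 0


# Made-up loaders: lists of (preds, targets) batches.
loaders = [
    [([1, 1], [1, 1]), ([0, 1], [0, 1])],  # 4 of 4 correct
    [([1, 0], [0, 1]), ([1, 1], [1, 0])],  # 1 of 4 correct
]

# One separate metric object per loader, so each keeps its own state.
metrics = [ToyAccuracy() for _ in loaders]


def validation_step(batch, dataloader_idx):
    # Route the batch to the metric that belongs to its loader.
    preds, targets = batch
    metrics[dataloader_idx].update(preds, targets)


# All loaders are processed first ...
for dataloader_idx, loader in enumerate(loaders):
    for batch in loader:
        validation_step(batch, dataloader_idx)

# ... and "epoch end" happens once: compute every metric, then reset all.
results = [m.compute() for m in metrics]
for m in metrics:
    m.reset()

print(results)  # [1.0, 0.25]
```

Because each object only ever sees batches from its own loader, a single reset after all loaders have run is enough to keep the per-loader values separate.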

justusschock commented 2 years ago

@rohitgr7 this isn't about changing the behavior but making it more explicit in the docs :)

rohitgr7 commented 2 years ago

Got it. BTW, I remember adding them here: https://torchmetrics.readthedocs.io/en/stable/pages/lightning.html#common-pitfalls