Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Allow returning of test results from Trainer.test #7653

Open Rizhiy opened 3 years ago

Rizhiy commented 3 years ago

🚀 Feature

Currently, Trainer.test only returns logged metrics, which is very limiting.

Motivation

I have a lot of different results which are produced by tests, including figures and sample predictions. At present, I cannot return them from test, since it only allows returning logged values, which it seems must be scalars.

As a workaround I have to separately predict outputs, get targets and then match them together, which is bug-prone.

Pitch

Trainer.test should return output of Module.test_epoch_end.

Alternatives

N/A

Additional context

N/A

cc @borda @tchaton @justusschock @awaelchli @rohitgr7

Rizhiy commented 3 years ago

I figured out another workaround: I can save results to a file in test_epoch_end and then load them where I need them, but this is ugly.
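A minimal sketch of this file-based workaround (the file name and the `save_test_outputs` helper are illustrative; in a real LightningModule the save call would live inside test_epoch_end):

```python
import os
import tempfile

import torch

# Hypothetical sketch: dump raw test outputs to disk inside test_epoch_end,
# then reload them wherever they are needed.
def save_test_outputs(outputs, path):
    torch.save(outputs, path)

path = os.path.join(tempfile.mkdtemp(), "test_outputs.pt")
save_test_outputs([{"pred": 0.9, "target": 1.0},
                   {"pred": 0.2, "target": 0.0}], path)

results = torch.load(path)
print(len(results))  # 2
```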

XiaomoWu commented 3 years ago

@Rizhiy Agree. By the way, when you generate results from test_epoch_end, are your results split into N parts, where N is the number of GPUs? I find that test_epoch_end only collects results from one GPU.

Rizhiy commented 3 years ago

@XiaomoWu You can use self.all_gather to get results from different devices onto one.
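A sketch of that gather pattern, with a single-process stand-in for `all_gather` (in a real LightningModule, `self.all_gather` is provided by Lightning and returns a tensor with an extra leading world-size dimension):

```python
import torch

class ToyModule:
    """Stand-in for a LightningModule, just to illustrate the pattern."""

    def all_gather(self, t):
        # single-process stand-in: the real all_gather stacks tensors
        # from every device, adding a leading world_size dimension
        return t.unsqueeze(0)

    def test_epoch_end(self, outputs):
        preds = torch.cat([o["preds"] for o in outputs])  # this device's preds
        gathered = self.all_gather(preds)                 # (world_size, N)
        self.test_preds = gathered.flatten(0, 1)          # merged across devices

m = ToyModule()
m.test_epoch_end([{"preds": torch.tensor([0.1, 0.9])},
                  {"preds": torch.tensor([0.4])}])
print(m.test_preds.shape)  # torch.Size([3])
```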

awaelchli commented 3 years ago

Is the Trainer.predict() API what you are looking for? It will gather your results across all GPUs and by default returns them in a list of dictionaries.

One can also optionally implement predict_step on the LM. You could, for example, move your prediction code from test_step to predict_step, or you could call test_step() in predict_step().
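A sketch of that reuse, under the assumption that the per-batch results are what you want collected; the "model" here is just a placeholder that doubles its input:

```python
import torch

# Illustrative stand-in for a LightningModule whose predict_step
# delegates to test_step, so Trainer.predict() can collect the results.
class ToyModule:
    def test_step(self, batch, batch_idx):
        x, y = batch
        preds = x * 2  # placeholder "model"
        return {"preds": preds, "targets": y}

    def predict_step(self, batch, batch_idx, dataloader_idx=0):
        # Trainer.predict() gathers these return values into a list
        return self.test_step(batch, batch_idx)

m = ToyModule()
out = m.predict_step((torch.tensor([1.0, 2.0]), torch.tensor([2.0, 4.0])), 0)
print(out["preds"].tolist())  # [2.0, 4.0]
```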

Rizhiy commented 3 years ago

@awaelchli No, I need the results of the test itself, e.g. metrics & figures. To calculate those I need access to the targets in an easy-to-use manner, i.e. inside test_*.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

versatran01 commented 3 years ago

I found this a bit annoying too. I ended up just saving all my metrics to some random member of my LM in test_epoch_end. Something like

def test_epoch_end(self, outputs):
    self.test_metrics = compute_metrics()
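A self-contained sketch of the same stash-on-the-module pattern (here the metric computation is a trivial placeholder, and the Trainer call is simulated by invoking the hook directly):

```python
# Stash arbitrary results on the module in test_epoch_end, then read the
# attribute back after trainer.test(model). All names are illustrative.
class ToyModule:
    def test_epoch_end(self, outputs):
        # any Python object can be stored here, not just scalar metrics
        self.test_metrics = {
            "accuracy": sum(o["correct"] for o in outputs) / len(outputs)
        }

m = ToyModule()
m.test_epoch_end([{"correct": 1}, {"correct": 0}])  # trainer.test(m) in practice
print(m.test_metrics["accuracy"])  # 0.5
```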

MohammedAljahdali commented 3 years ago

> Is the Trainer.predict() API what you are looking for? It will gather your results across all GPUs and by default returns them in a list of dictionaries.
>
> One can also optionally implement predict_step on the LM. You could for example move your prediction code from test_step to predict_step. Or you could call test_step() in predict_step()

I have a use case where, on the test dataset, I would do extra metric calculations as well as log the prediction and the correct target (they are strings). I would like to be able to return these things from test_epoch_end in trainer.test().

Another option would be to allow self.log(..., reduce_fx=None) or self.log(..., reduce_fx=lambda x: x).

tchaton commented 2 years ago

Dear @Rizhiy,

Yes, PyTorch Lightning returns only minimal result types from the trainer.test function. The primary limitation is that returning arbitrary data won't work with all accelerators/plugins, so Lightning only supports metrics. Users would have to compute/save extra data directly in test_epoch_end.

Best, T.C

carmocca commented 2 years ago

We cannot provide an automated way of gathering whatever the user produces. The closest thing we have is predict as described in the above comment: https://github.com/PyTorchLightning/pytorch-lightning/issues/7653#issuecomment-848024353

If we were to find a solution to this problem, it would likely not be from whatever is returned from test_epoch_end as overriding that hook means outputs are kept in memory but we cannot know whether this is desired by the user.

An example like the one described in https://github.com/PyTorchLightning/pytorch-lightning/issues/7653#issuecomment-883727889 is inefficient because compute_metrics does not use the outputs hook input; doing the same computation in on_test_epoch_end instead is more memory efficient.

Additionally one might want to return data generated from a different hook, so choosing test_epoch_end could be limiting.

In my opinion, the suggested alternative of saving this data on the model and optionally all_gathering it is fine.
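A sketch of that memory-efficient alternative: accumulate running state in test_step, finalize it in on_test_epoch_end (so Lightning never has to retain per-batch outputs), and read the attribute off the model after trainer.test(). All names below are illustrative, and the Trainer calls are simulated by invoking the hooks directly:

```python
# Accumulate what you need during test_step instead of returning per-batch
# outputs; on_test_epoch_end then derives the final result.
class ToyModule:  # pl.LightningModule in real code
    def __init__(self):
        self.correct = 0
        self.total = 0

    def test_step(self, batch, batch_idx):
        preds, targets = batch
        self.correct += sum(int(p == t) for p, t in zip(preds, targets))
        self.total += len(targets)

    def on_test_epoch_end(self):
        # on multi-GPU, one could self.all_gather the counts here first
        self.test_accuracy = self.correct / self.total

m = ToyModule()
m.test_step(([1, 0], [1, 1]), 0)
m.test_step(([1], [1]), 1)
m.on_test_epoch_end()
print(m.test_accuracy)  # 2 correct out of 3
```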