Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

validation_step output shouldn't be stored if validation_epoch_end isn't overridden #8583

Closed tchaton closed 3 years ago

tchaton commented 3 years ago

Hi @tchaton. I'm facing this memory leak. An output returned from validation_step will always be stored, even though validation_epoch_end isn't defined. You can test it by removing validation_epoch_end from the BoringModel. To me, this is unexpected behavior, as I expect the validation loop to behave like the training loop: if you don't want the outputs stored, don't override the train_epoch_end hook.
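A minimal, framework-free sketch of the behavior described above (the names here are hypothetical stand-ins; the real logic lives inside Lightning's evaluation loop): outputs returned by `validation_step` are accumulated unconditionally, so memory grows with the number of batches even when nothing consumes them at epoch end.

```python
def leaky_validation_loop(model, batches):
    # Outputs are stored unconditionally -- even if no epoch-end
    # hook ever reads them, this list grows with every batch.
    outputs = []
    for batch in batches:
        out = model.validation_step(batch)
        outputs.append(out)  # memory retained for the whole epoch
    return outputs


class DummyModel:
    def validation_step(self, batch):
        # Return a dict, as described in the issue report.
        return {"loss": batch * 0.5}


collected = leaky_validation_loop(DummyModel(), list(range(1000)))
# All 1000 outputs are kept alive, whether or not they are needed.
```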

In my case, I had a callback to compute metrics and log validation outputs to an external service. So I return a dictionary from validation_step and implement callbacks that override on_validation_batch_end and on_validation_epoch_end. This causes the outputs to be stored anyway --> memory leak.
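The fix the issue title suggests can be sketched as follows: only accumulate outputs when the user has actually overridden `validation_epoch_end`. This is a hypothetical, framework-free illustration (the `is_overridden` helper here is my own minimal version, modeled on the idea of comparing a method against its base-class default), not Lightning's actual implementation.

```python
class BaseModule:
    """Stand-in for a LightningModule-like base class."""

    def validation_step(self, batch):
        raise NotImplementedError

    def validation_epoch_end(self, outputs):
        # Default no-op; subclasses may override to consume outputs.
        pass


def is_overridden(method_name, instance, base=BaseModule):
    # True if the instance's class replaced the base-class method.
    return getattr(type(instance), method_name) is not getattr(base, method_name)


def run_validation(model, batches):
    # Only store step outputs if something will actually use them.
    store = is_overridden("validation_epoch_end", model)
    outputs = []
    for batch in batches:
        out = model.validation_step(batch)
        if store and out is not None:
            outputs.append(out)
    if store:
        model.validation_epoch_end(outputs)
    return outputs


class NoEpochEnd(BaseModule):
    def validation_step(self, batch):
        return {"loss": batch}


class WithEpochEnd(NoEpochEnd):
    def validation_epoch_end(self, outputs):
        self.seen = len(outputs)
```

With this check in place, a model like `NoEpochEnd` that never reads the aggregated outputs no longer pays the memory cost of storing them, while `WithEpochEnd` still receives the full list.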

Originally posted by @hal-314 in https://github.com/PyTorchLightning/pytorch-lightning/issues/8453#issuecomment-887508240

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!