Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

If ModelCheckpoint and Validation occur on the same step, validation should run first, reopen #17417

Status: Open. Opened by dmitrypenzar1996 1 year ago.

dmitrypenzar1996 commented 1 year ago

Bug description

The issue described in https://github.com/Lightning-AI/lightning/issues/7694 is still unsolved.

As a consequence, the n-th model is saved based on metrics from the (n-1)-th step rather than from the n-th step.
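The ordering problem can be illustrated without Lightning at all. Below is a minimal, framework-free Python sketch, not Lightning's actual training loop: `run_loop`, `validate`, and `checkpoint` are hypothetical names, and the toy metric `1 / step` is an assumption chosen only to make stale values easy to spot. Validation and checkpointing both fire every `interval` steps; when the checkpoint hook runs first, each checkpoint records the metric logged by the previous validation run.

```python
from typing import Dict, List, Optional


def run_loop(
    num_steps: int,
    interval: int,
    validate_first: bool,
) -> List[Dict[str, Optional[float]]]:
    """Toy loop (hypothetical): validation and checkpointing both fire every `interval` steps.

    Returns one record per saved checkpoint: the step it was saved at and the
    monitored metric value it was saved with.
    """
    logged: Dict[str, float] = {}  # stands in for the most recently logged metrics
    saved: List[Dict[str, Optional[float]]] = []

    def validate(step: int) -> None:
        # Toy metric (assumption): the "validation loss" at a step is 1 / step.
        logged["val_loss"] = 1.0 / step

    def checkpoint(step: int) -> None:
        # The checkpoint records whatever metric value is currently logged.
        saved.append({"step": step, "val_loss": logged.get("val_loss")})

    for step in range(1, num_steps + 1):
        if step % interval == 0:
            # The only difference between the two modes is the hook order.
            hooks = [validate, checkpoint] if validate_first else [checkpoint, validate]
            for hook in hooks:
                hook(step)
    return saved


if __name__ == "__main__":
    print(run_loop(6, 2, validate_first=False))
    print(run_loop(6, 2, validate_first=True))
```

With `validate_first=False`, the checkpoints at steps 2, 4 and 6 record `None`, `0.5` and `0.25`, so each one lags a full validation interval behind. With `validate_first=True` they record `0.5`, `0.25` and `1/6`, matching the step at which they were saved.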

What version are you seeing the problem on?

No response

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment

```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
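Several of the environment fields above can be filled in from the standard library. The sketch below is a hypothetical helper (`collect_versions` is not part of Lightning) that reads installed package versions with `importlib.metadata` and the OS and Python version with `platform`; the default package list is an assumption and may need adjusting to match how Lightning was installed.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(
    packages: Iterable[str] = ("lightning", "pytorch-lightning", "torch"),
) -> Dict[str, str]:
    """Hypothetical helper: gather a few of the fields the environment template asks for."""
    info: Dict[str, str] = {
        "Python version": platform.python_version(),
        "OS": platform.system(),
    }
    for name in packages:
        try:
            # Looks up the installed distribution by its package name.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Print in the same "#- field: value" shape the template uses.
    for key, value in collect_versions().items():
        print(f"#- {key}: {value}")
```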

More info

No response

cc @carmocca @awaelchli

Borda commented 1 year ago

@dmitrypenzar1996 could you please share which version you are seeing this on?