If ModelCheckpoint and Validation occur on the same step, validation should run first, reopen #17417
Open
dmitrypenzar1996 opened 1 year ago
Bug description
The issue described in https://github.com/Lightning-AI/lightning/issues/7694 is still unresolved.
As a consequence, when ModelCheckpoint and validation occur on the same step, the n-th model is saved based on the metrics from the (n-1)-th validation step.
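To make the ordering problem concrete, here is a minimal pure-Python sketch of a training loop in which checkpointing and validation fall on the same step. It does not use or reproduce Lightning's internals: the function `run`, its parameters, and the `val@<step>` metric labels are all hypothetical and exist only to illustrate the behaviour described above.

```python
def run(num_steps, every_n, validate_first):
    """Toy loop (hypothetical, not Lightning code).

    Every `every_n` steps both a validation run and a checkpoint save are due.
    Returns a list of (step, metric_seen_by_checkpoint) pairs.
    """
    last_val_metric = None  # most recently logged validation metric
    saved = []
    for step in range(1, num_steps + 1):
        if step % every_n != 0:
            continue
        if validate_first:
            # Desired order: validate, then checkpoint with the fresh metric.
            last_val_metric = f"val@{step}"
            saved.append((step, last_val_metric))
        else:
            # Reported order: checkpoint sees the metric from the previous validation.
            saved.append((step, last_val_metric))
            last_val_metric = f"val@{step}"
    return saved


stale = run(num_steps=6, every_n=2, validate_first=False)
fresh = run(num_steps=6, every_n=2, validate_first=True)
print(stale)
print(fresh)
```

In the checkpoint-first ordering, each saved model is paired with the metric of the previous validation run (and the first one has no metric at all), which is the off-by-one behaviour reported here. Running validation first pairs each checkpoint with the metric computed at the same step.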
What version are you seeing the problem on?
No response
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
More info
No response
cc @carmocca @awaelchli