Closed: dsm-72 closed this issue 1 year ago
Hey @dsm-72, do you have a runnable piece of code you could share? Unfortunately, I can't guess what's wrong here without looking at the code. I suggest disabling as many features and as much code as possible to narrow down where the issue is.
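One general way to narrow down a freeze like this is to have Python dump every thread's stack while the process is stuck. The sketch below uses only the standard library's `faulthandler` module and is not a Lightning feature. `run_with_hang_watchdog` and `suspect_step` are hypothetical names for illustration, and `suspect_step` stands in for whatever call actually hangs (for example, the training call).

```python
import faulthandler
import sys
import time


def run_with_hang_watchdog(fn, timeout_s=120.0, log_file=sys.stderr):
    """Run fn(); while it is still running after timeout_s seconds, dump all thread stacks."""
    # Schedule a traceback dump of every thread after timeout_s seconds.
    # repeat=True keeps dumping every timeout_s seconds for as long as fn() is running.
    # log_file must be backed by a real file descriptor (it needs fileno()).
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    try:
        return fn()
    finally:
        # Cancel the pending dump once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()


def suspect_step():
    # Hypothetical stand-in for the code that freezes.
    time.sleep(0.3)
    return "done"


if __name__ == "__main__":
    print(run_with_hang_watchdog(suspect_step, timeout_s=0.1))
```

The dumped stacks show which line each thread is blocked on, which is usually enough to tell whether the freeze is in data loading, the training step, or somewhere else. Because the dump is written by a separate watchdog thread at the C level, it still fires when the main thread is stuck inside a blocking call.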
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!
Bug description
I was trying to train a model with `pl.Trainer`. It runs for a few epochs, but after only 2 or 3 epochs it kept freezing the kernel (I couldn't even kill it). So I set `max_time={'minutes':2}` (following the documentation examples), and after 5 minutes it is still "going" strong. This happens on both CPU and GPU (I tried a 1080 Ti and a 3090).
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
More info
No response
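As a possible explanation for why a setting like `max_time={'minutes':2}` might not stop a frozen run, here is a minimal standard-library sketch of a cooperative time limit. It assumes the limit is checked only between steps, which is an assumption for illustration and not something confirmed in this thread. `run_with_time_limit` is a hypothetical helper, not part of Lightning.

```python
import time
from datetime import timedelta


def run_with_time_limit(steps, max_time):
    """Run callables in order, checking the time budget only between steps."""
    # A dict such as {"minutes": 2} maps directly onto timedelta keyword arguments.
    budget = timedelta(**max_time)
    start = time.monotonic()
    completed = 0
    for step in steps:
        step()  # a step that never returns also never reaches the check below
        completed += 1
        if timedelta(seconds=time.monotonic() - start) >= budget:
            break
    return completed, time.monotonic() - start


if __name__ == "__main__":
    # Many short steps: the loop stops soon after the budget is used up.
    print(run_with_time_limit([lambda: time.sleep(0.05)] * 100, {"milliseconds": 120}))
    # One long step: the budget is overshot because nothing interrupts the step.
    print(run_with_time_limit([lambda: time.sleep(0.3)], {"milliseconds": 100}))
```

Under this assumption, a step that blocks forever would keep the run "going" regardless of the configured limit, so the limit not firing would point to a hang inside a single step instead of slow progress across many steps.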