Open dtmoodie opened 2 years ago
A rank is hanging. Can you create a reproducible snippet by adapting this script? https://github.com/Lightning-AI/lightning/blob/master/examples/pl_bug_report/bug_report_model.py
Also, how many devices are you using? I see your machine has 2 different GPU types. Does it happen if you only use the RTXs?
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!
Bug description
I'm using manual optimization to work with two datasets for multi-task learning. Due to memory usage limitations, I want to do a forward and backward pass with a batch from one dataset, then a forward and backward pass with the other dataset.
When just enabling manual optimization on one dataset, my training process freezes on step 2 if I log scalars in the on_after_backwards call with sync_dist=True for the logging call.
How to reproduce the bug
Error messages and logs
Environment
More info
No response
cc @awaelchli @rohitgr7 @akihironitta @carmocca @edward-io @ananthsub @Blaizzy