awaelchli closed this 2 years ago.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
No!
Great idea! Let's do it.
🚀 Feature
Check SLURM environment settings and print warnings if needed.
Motivation
If SLURM srun variables are set incorrectly, the processes can hang and the user will not know why.
Examples:
#15141
Pitch
In the SLURM cluster environment, check for example:
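As an illustration, one possible check compares the task layout SLURM reports with what the user asked Lightning for. The sketch below is hypothetical: the helper name `check_slurm_settings` and its `devices`/`num_nodes` parameters are assumptions standing in for the Trainer arguments, not an existing Lightning API. It only reads the standard SLURM variables `SLURM_JOB_ID`, `SLURM_NTASKS_PER_NODE`, `SLURM_NNODES` and `SLURM_NTASKS`, and returns warning messages instead of raising, so a mismatch is reported rather than left to cause a silent hang.

```python
import os
import warnings
from typing import List, Mapping, Optional


def check_slurm_settings(
    devices: int,
    num_nodes: int,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return warnings for SLURM settings that disagree with the requested layout.

    Hypothetical helper: `devices` and `num_nodes` stand in for the values the
    user passed to the Trainer. `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    messages: List[str] = []
    if "SLURM_JOB_ID" not in env:
        return messages  # not running inside a SLURM job, nothing to check

    # --ntasks-per-node should equal the number of devices used per node.
    per_node = env.get("SLURM_NTASKS_PER_NODE")
    if per_node is not None and per_node.isdigit() and int(per_node) != devices:
        messages.append(
            f"SLURM_NTASKS_PER_NODE={per_node} but devices={devices}; "
            "set `--ntasks-per-node` to match the number of devices."
        )

    # --nodes should equal the number of nodes requested from Lightning.
    nnodes = env.get("SLURM_NNODES")
    if nnodes is not None and nnodes.isdigit() and int(nnodes) != num_nodes:
        messages.append(
            f"SLURM_NNODES={nnodes} but num_nodes={num_nodes}; "
            "set `--nodes` to match `num_nodes`."
        )

    # The total task count should equal devices * num_nodes.
    ntasks = env.get("SLURM_NTASKS")
    expected = devices * num_nodes
    if ntasks is not None and ntasks.isdigit() and int(ntasks) != expected:
        messages.append(
            f"SLURM_NTASKS={ntasks} but devices * num_nodes = {expected}; "
            "processes may wait forever for peers that were never launched."
        )
    return messages


def warn_on_slurm_mismatch(devices: int, num_nodes: int) -> None:
    """Emit each detected mismatch as a Python warning."""
    for message in check_slurm_settings(devices, num_nodes):
        warnings.warn(message)
```

Returning a list keeps the check easy to unit-test with a fake environment mapping; the thin `warn_on_slurm_mismatch` wrapper is where the actual user-facing warning would be printed.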
Alternatives
Do nothing. Users will keep submitting issues :)
Additional context
cc @borda @awaelchli