facebookincubator / submitit

Python 3.8+ toolbox for submitting jobs to Slurm
MIT License

Can I use torchrun with submitit? #1764

Open vasudev-sharma opened 8 months ago

vasudev-sharma commented 8 months ago

I have a use case on a SLURM cluster where I plan to use torchrun or torch.distributed with submitit. The goal is distributed training with submitit + PyTorch Lightning, launched through torchrun or torch.distributed.

How should I go ahead with it?

Thanks for the help in advance!!!

crewtool commented 4 days ago

Hey @vasudev-sharma, have you found a solution for this setup? I'm also interested in this. Any insights you could share would be greatly appreciated!