facebookincubator / submitit

Python 3.8+ toolbox for submitting jobs to Slurm
MIT License

Support Slurm Heterogeneous Job #1741

Open sunshine-syz opened 1 year ago

sunshine-syz commented 1 year ago

Does submitit support Slurm heterogeneous jobs? If so, how can we submit one? If not, could you enhance the code to support it?

gwenzek commented 12 months ago

It's not supported at the moment, and from an API perspective I'm not sure how to handle it. Currently the API assumes there is one configuration per job, while here you want several configurations in the same job. Not impossible, but also non-trivial. What's the use case? Can you approximate this by starting two jobs?
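A minimal sketch of the "two jobs" approximation, assuming one executor per resource configuration. The helper name `submit_two_jobs`, the log folder, and the resource values are hypothetical and chosen only for illustration. Slurm schedules the two jobs independently, so nothing here guarantees that they run at the same time.

```python
def submit_two_jobs(cpu_fn, gpu_fn, log_dir="submitit_logs"):
    """Submit two independent Slurm jobs, each with its own resource request.

    Hypothetical helper: the name, folder, and resource values are illustrative.
    """
    # Imported lazily so the sketch can be read on a machine without submitit.
    import submitit

    # One executor per configuration, since each job carries a single configuration.
    cpu_executor = submitit.AutoExecutor(folder=log_dir)
    cpu_executor.update_parameters(timeout_min=60, cpus_per_task=8)

    gpu_executor = submitit.AutoExecutor(folder=log_dir)
    gpu_executor.update_parameters(timeout_min=60, gpus_per_node=1)

    # The two submissions are unrelated from Slurm's point of view.
    cpu_job = cpu_executor.submit(cpu_fn)
    gpu_job = gpu_executor.submit(gpu_fn)
    return cpu_job, gpu_job
```

Calling `cpu_job.result()` and `gpu_job.result()` afterwards would block until each job finishes. Any communication between the two jobs would have to be arranged by the submitted functions themselves.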

sunshine-syz commented 12 months ago

For example, you may want to start a distributed job that runs on two different kinds of GPUs or CPUs with different specs. The components need to communicate with each other, so they cannot be started separately.

Here is one example: https://research-computing.git-pages.rit.edu/docs/slurm_tutorial_2.html
See also the Slurm documentation: https://slurm.schedmd.com/heterogeneous_jobs.html#submitting
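To make the request concrete, here is a rough sketch of what a heterogeneous batch script looks like, generated from plain Python. It does not use submitit, which does not support this according to the thread. The function `build_hetjob_script`, the resource values, and the `srun` command are assumptions for illustration. It also assumes a recent Slurm version, where components in a batch script are separated by a `#SBATCH hetjob` line.

```python
def build_hetjob_script(components, command):
    """Build the text of an sbatch script for a heterogeneous job.

    components: list of dicts mapping sbatch long-option names to values,
                one dict per heterogeneous component.
    command:    shell command to run inside the allocation.
    """
    lines = ["#!/bin/bash"]
    for index, options in enumerate(components):
        if index > 0:
            # Separator between components of a heterogeneous job.
            lines.append("#SBATCH hetjob")
        for key, value in options.items():
            lines.append(f"#SBATCH --{key}={value}")
    lines.append(command)
    return "\n".join(lines) + "\n"


# Illustrative values: one CPU-only component and one GPU component.
script = build_hetjob_script(
    [
        {"ntasks": 1, "cpus-per-task": 8, "mem": "16G"},
        {"ntasks": 1, "gres": "gpu:1"},
    ],
    # Placeholder command; a real job would launch its own workers here.
    "srun --het-group=0,1 hostname",
)
print(script)
```

The resulting text could be written to a file and passed to `sbatch`. All components are allocated together as a single job, which is what the use case above relies on.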