chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Default to slurm based launchers when salloc/srun is available #26170

Open jabraham17 opened 3 weeks ago

jabraham17 commented 3 weeks ago

I recently had some issues running on a slurm-based IB system. The problem was that I was using gasnetrun_ibv, which is the default launcher for IB when COMM=gasnet. However, that launcher requires you to make your own slurm allocation using salloc (see https://chapel-lang.org/docs/main/usingchapel/launcher.html#using-any-ssh-based-launcher-with-slurm). The fix was either to make the salloc calls manually or to use slurm-gasnetrun_ibv, which handles that for you.

This is a simple solution, but why is it necessary? It seems like if we can detect a slurm-based system, we should default to a slurm-based launcher. This led me to investigate util/chplenv/chpl_launcher.py, where we do actually have that detection, but only on cray-cs and hpe-apollo.

I went looking through the history for this and found two PRs making this change: https://github.com/chapel-lang/chapel/pull/17314 for gasnet and https://github.com/chapel-lang/chapel/pull/17305 for other comm layers. Based on these PR messages, we only default to slurm-based launchers on cray/hpe systems because defaulting more broadly was messing with internal testing systems that have slurm but want to use a different launcher.

This feels like optimizing for the wrong case. We should default to what is common for users.

In my opinion this is a simple change: remove the checks for the target platform and adjust automated testing systems as needed. However, there may be other cases I am not thinking of where we would not want to default to a slurm-based launcher.
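
To make the proposal concrete, here is a rough sketch of the selection logic with no platform check. This is not the actual code in util/chplenv/chpl_launcher.py: the function name `pick_gasnet_ibv_launcher` and the injectable `which` parameter are hypothetical, and requiring both salloc and srun on the PATH is an assumption about what "available" should mean.

```python
import shutil


def pick_gasnet_ibv_launcher(which=shutil.which):
    """Hypothetical helper: choose a launcher for COMM=gasnet on IB.

    `which` is injectable so the logic can be tested without slurm
    installed; by default it looks commands up on the PATH.
    """
    # Assumption: treat the system as slurm-based only if BOTH
    # salloc and srun are found on the PATH.
    has_slurm = which("salloc") is not None and which("srun") is not None

    # No check of the target platform (e.g. cray-cs, hpe-apollo):
    # any system with slurm gets the slurm-based launcher.
    return "slurm-gasnetrun_ibv" if has_slurm else "gasnetrun_ibv"


if __name__ == "__main__":
    print(pick_gasnet_ibv_launcher())
```

Testing systems that have slurm but want a different launcher could still opt out by setting CHPL_LAUNCHER explicitly, which is the kind of adjustment mentioned above.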

bradcray commented 3 weeks ago

This sounds great to me, thanks for investigating, Jade!