StanfordLegion / sapling-guide

Sapling User Guide

Use salloc if you want to srun while on a node #1

Status: Open. Opened by suranap 1 week ago.

suranap commented 1 week ago

I used srun to hop into a bash shell on a GPU machine. Then I wanted to use srun to launch 4 processes on that same machine, but it just hangs. It looks like the first srun reserves the whole node, so further calls to srun get stuck. So this is a use case for salloc, and that's how I do things on Frontier/Perlmutter. However, on those systems salloc also drops you into a shell on the allocated machine, which is convenient.

https://github.com/StanfordLegion/sapling-guide/blob/4a2dc09b4fc871a06a14425e3118b476aefc8024/README.md?plain=1#L63-L78

elliottslaughter commented 1 week ago

You're right that you'd need to salloc and then srun inside of that allocation if you want to run multiple jobs within it.
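A minimal sketch of that salloc-then-srun workflow, typed one command at a time in an interactive session. The resource flags are placeholders rather than Sapling-specific values, so add whatever partition or GPU options your job needs. Without any special Slurm configuration, the shell that salloc opens runs on the host where salloc was invoked (e.g., the login node); srun commands issued from it still launch onto the allocated node.

```shell
# Run these one at a time in an interactive session (not as a script).
# 1. From the login node, request one node with room for 4 tasks.
#    (Placeholder flags: add partition/GPU options as needed.)
salloc --nodes=1 --ntasks=4

# 2. salloc opens a new shell; each srun issued from it becomes a job
#    step inside the same allocation, so several can be launched in turn.
srun --ntasks=4 hostname

# 3. Leave the salloc shell to release the allocation.
exit
```

Because the allocation is limited to one node, the srun step should print that node's name four times.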

Is there a specific request you're making or an improvement you'd suggest? srun is the shortest one-line command, so it's what I generally recommend; running multiple jobs is generally a special case.

suranap commented 6 days ago

Sapling could match the behavior of salloc on other HPC systems (e.g., Frontier/Perlmutter) by adding this to /etc/slurm.conf:

LaunchParameters=use_interactive_step

See here for more info. This is now the recommended way to do things.