Open suranap opened 1 week ago
You're right that you'd need to `salloc` and then `srun` inside of that if you want to do multiple jobs inside of an allocation.

Is there a specific request you're making or improvement you suggest? `srun` is the shortest one-line command, so it's what I generally recommend, and doing multiple jobs is generally a special case.
I used `srun` to hop into a bash shell on a GPU machine. Then I wanted to use `srun` to launch 4 processes on this same machine. It just hangs. Looks like `srun` reserves the whole node, and then further calls to `srun` are stuck. So this is a use case for `salloc`, and that's how I do stuff on Frontier/Perlmutter. However, on those systems `salloc` will jump into the machine also. That's convenient.

https://github.com/StanfordLegion/sapling-guide/blob/4a2dc09b4fc871a06a14425e3118b476aefc8024/README.md?plain=1#L63-L78
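For anyone landing here, the `salloc`-then-`srun` pattern discussed above might look roughly like the sketch below. The partition name `gpu` and the program `./my_app` are placeholders I made up for illustration, not taken from the guide, and the `run` helper is hypothetical: it only prints the command unless `DRY_RUN=0` is set, so the snippet can be tried without Slurm installed.

```shell
#!/usr/bin/env bash
# Sketch only: PARTITION and ./my_app are assumed placeholders.
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands, 0 = actually run them
PARTITION="${PARTITION:-gpu}"    # assumed partition name
NTASKS="${NTASKS:-4}"            # the 4 processes mentioned above

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hold one node with salloc and launch the job step inside that allocation.
# salloc runs the given command once the allocation is granted, then releases it.
run salloc --nodes=1 --ntasks="$NTASKS" --partition="$PARTITION" \
    srun --ntasks="$NTASKS" ./my_app
```

Running `salloc` without a trailing command instead starts a shell that holds the allocation, and each `srun` typed in that shell becomes a separate job step. Whether that shell lands on the compute node or stays on the login node depends on how the site has configured Slurm.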