ConSol-Lab / gourd

a command-line tool for configuring, running, and analysing algorithm comparison experiments on supercomputers

Allow for scheduling slurm sub-jobs #42

Open RobbinBaauw opened 3 days ago

RobbinBaauw commented 3 days ago

DelftBlue has quite a strict limit on the number of jobs that can be scheduled concurrently, which is limiting for single-threaded programs: each job has many CPUs available, but a single-threaded program uses only one of them. Slurm has a feature for this: you can use srun within an sbatch script to run multiple tasks within a single job (only the relevant fields are shown):

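#!/bin/bash
#SBATCH --array=0-15
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=100M
#SBATCH --cpus-per-task=1

# Print each command as it is executed
set -x

# Launch 8 job steps in the background, each on 1 CPU with 100M of memory
for i in $(seq 1 8); do
    srun -c1 -n1 --exact --mem-per-cpu=100M my_exec "$SLURM_ARRAY_TASK_ID" "$i" &
done

# Block until all background job steps have finished
wait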

This will schedule 8 sub-jobs (job steps, in Slurm terms) per array task, inside the resources allocated by the sbatch job, without running into the concurrent job limit.
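With `--array=0-15` and 8 steps per array task, the script above covers 16 × 8 = 128 runs in total. As a rough sketch of the bookkeeping this implies, the helpers below map an (array task, step) pair to a flat run index and compute how many array tasks a given number of runs would need. The function names `flat_run_index` and `array_tasks_needed` are hypothetical and are not part of Gourd or Slurm; the sketch assumes steps are numbered from 1, as in the loop above.

```shell
#!/bin/bash

# Hypothetical helper: map (array task ID, step number) to a zero-based run index.
# Assumes array task IDs start at 0 and steps are numbered 1..tasks_per_job.
flat_run_index() {
    local array_id=$1 step=$2 tasks_per_job=$3
    echo $(( array_id * tasks_per_job + step - 1 ))
}

# Hypothetical helper: number of array tasks needed to cover total_runs runs
# when each array task launches tasks_per_job steps (ceiling division).
array_tasks_needed() {
    local total_runs=$1 tasks_per_job=$2
    echo $(( (total_runs + tasks_per_job - 1) / tasks_per_job ))
}

flat_run_index 0 1 8      # first step of the first array task: prints 0
flat_run_index 15 8 8     # last step of the last array task: prints 127
array_tasks_needed 128 8  # prints 16, matching --array=0-15
array_tasks_needed 100 8  # prints 13; the last array task runs only 4 steps
```

When the number of runs is not a multiple of the steps per job, the last array task would have to launch fewer steps, so a real implementation needs a bounds check before each srun call.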

Would it be possible to implement this feature in Gourd? For example, an ntasks config field could be added so that the chunks are also divided automatically over sub-jobs. Thanks in advance!

RobbinBaauw commented 3 days ago

I have quickly implemented a version that seems to work: https://github.com/ConSol-Lab/gourd/compare/main...RobbinBaauw:gourd:schedule-multiple-runs.