## Issue

If one doesn't specify `--nodes=1`, then the job might get allocated slots across multiple machines, e.g.
```sh
$ sbatch --ntasks=65 slurm_envs.sh
Submitted batch job 9394

$ squeue -u "$USER"
 JOBID PARTITION     NAME   USER ST TIME NODES NODELIST(REASON)
  9394       cbc slurm_en henrik  R 0:07     2 c4-n[12-13]

$ grep -E "^(SLURM|SBATCH)_" slurm-9394.out
SLURM_CPUS_ON_NODE=64
SLURM_GTIDS=0
SLURM_JOB_ACCOUNT=cbi
SLURM_JOB_CPUS_PER_NODE=64,1
SLURM_JOB_GID=509
SLURM_JOB_ID=9394
SLURM_JOBID=9394
SLURM_JOB_NAME=slurm_envs
SLURM_JOB_NODELIST=c4-n[12-13]
SLURM_JOB_NUM_NODES=2
SLURM_JOB_PARTITION=cbc
SLURM_JOB_QOS=max_cores
SLURM_JOB_UID=581
SLURM_JOB_USER=henrik
SLURM_LOCALID=0
SLURM_MEM_PER_NODE=2048
SLURM_NNODES=2
SLURM_NODE_ALIASES=(null)
SLURM_NODEID=0
SLURM_NODELIST=c4-n[12-13]
SLURM_NPROCS=65
SLURM_NTASKS=65
SLURM_PRIO_PROCESS=0
SLURM_PROCID=0
SLURM_SUBMIT_DIR=/c4/home/henrik/tests-slurm
SLURM_SUBMIT_HOST=c4-dev1
SLURM_TASK_PID=46543
SLURM_TASKS_PER_NODE=64,1
SLURM_TOPOLOGY_ADDR=c4-n12
SLURM_TOPOLOGY_ADDR_PATTERN=node
```
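Note `SLURM_JOB_NUM_NODES=2` and `SLURM_TASKS_PER_NODE=64,1` above: the 65 requested tasks did not fit on a single 64-core node (`SLURM_CPUS_ON_NODE=64`), so Slurm split them 64 + 1 across `c4-n[12-13]`.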
## Action

Specify `--nodes=1` from the beginning.
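For reference, here is a minimal sketch of a single-node submission script. The actual `slurm_envs.sh` is not shown in this issue, so its body is assumed here to simply dump the `SLURM_*` environment, mirroring the `grep` above; the task count is capped at 64 because the nodes in the output above report `SLURM_CPUS_ON_NODE=64`.

```sh
#!/bin/bash
#SBATCH --job-name=slurm_envs
#SBATCH --nodes=1    ## keep all tasks on a single machine
#SBATCH --ntasks=64  ## must fit on one node; the nodes above have 64 cores

## Assumed script body: dump the Slurm environment so the allocation
## can be inspected in the job's output file, as with the grep above.
env | grep -E "^(SLURM|SBATCH)_" | sort
```

With `--nodes=1` set, a request that can never fit on a single node (e.g. `--ntasks=65` on 64-core nodes) should be rejected at submission or left pending, rather than silently spilling onto a second machine.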