Feature Request Description
I would like to be able to assign multiple batch pools (== VM sizes) to a single SLURM partition. This way, SLURM should be able to do resource management using the --mem and --cpus-per-task flags. Currently, attempting to submit an sbatch/srun job using these flags with the unmodified slurm.conf generated by shipyard fails (e.g., srun: error: Unable to allocate resources: Requested node configuration is not available).
Currently, jobs can only be targeted to a specific batch pool via the --partition or --constraint flags, since the NodeName= lines in the generated slurm.conf don't contain resource specifications like CoresPerSocket or RealMemory.
I'd like to be able to use a configuration like this: two (or more) pre-created batch pools, x32core64G and x8core16G (VM types STANDARD_F32s_v2 and STANDARD_F8s_v2), mapping to a single SLURM partition, mypartition (trimmed example):
I can manually add CoresPerSocket or RealMemory values to slurm.conf on the login and controller nodes, restart slurmctld, and then submit jobs using --mem and --cpus-per-task.
However, I find that with a single default partition (mypartition) mapping to multiple batch pools, only the final batch pool (x8core16G in this case) ever receives jobs and autoscales. I believe this is because the shipyardslurm Table only holds a single BatchPoolId per partition (the final one defined in slurm.yaml/slurm.conf), so only that one batch pool ever autoscales. Is that correct?
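To illustrate what I suspect is happening, here is a minimal sketch. It does not use the real shipyardslurm Table schema, which I have not inspected; the dict, the row layout, and both function names are hypothetical stand-ins. It only shows that a mapping keyed by partition alone keeps the last pool written, while a list-valued mapping keeps all of them.

```python
# Hypothetical rows in the order the pools appear in slurm.yaml.
ROWS = [
    ("mypartition", "x32core64G"),
    ("mypartition", "x8core16G"),
]


def build_single_valued(rows):
    """One BatchPoolId per partition: later rows overwrite earlier ones."""
    table = {}
    for partition, pool_id in rows:
        table[partition] = pool_id
    return table


def build_multi_valued(rows):
    """A list of BatchPoolIds per partition: every pool is retained."""
    table = {}
    for partition, pool_id in rows:
        table.setdefault(partition, []).append(pool_id)
    return table


if __name__ == "__main__":
    print(build_single_valued(ROWS))
    print(build_multi_valued(ROWS))
```

If the Table really is keyed this way, the first form would explain why only x8core16G autoscales.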
I'd like to be able to use a configuration like this, with a single default partition and multiple batch pools, and have SLURM / shipyard automatically assign jobs to the correct node type / batch pool based on the --cpus-per-task and --mem flags.
Describe Preferred Solution
Shipyard should query the VM specifications for each batch pool and add CoresPerSocket and RealMemory (or similar) values to each NodeName line in the generated slurm.conf.
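A rough sketch of what that generation step might look like, under several assumptions: the VM_SPECS table is hard-coded here purely for illustration (shipyard would need to look the values up), the node naming pattern `<pool_id>-[0-N]` and the `reserved_mb` headroom are my guesses, and I use the slurm.conf `CPUs` parameter instead of CoresPerSocket to avoid assuming anything about socket/thread topology. `node_line` is a hypothetical helper, not an existing shipyard function.

```python
# Nominal sizes of the two VM types from the example (vCPUs, memory in MiB).
# Hard-coded for illustration only.
VM_SPECS = {
    "STANDARD_F32s_v2": {"cpus": 32, "memory_mb": 64 * 1024},
    "STANDARD_F8s_v2": {"cpus": 8, "memory_mb": 16 * 1024},
}


def node_line(pool_id, vm_size, max_nodes, reserved_mb=1024):
    """Build one slurm.conf NodeName line carrying resource specifications.

    RealMemory is given in megabytes and is set somewhat below the nominal
    VM size, since the memory a node actually reports is usually lower.
    """
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    spec = VM_SPECS[vm_size]
    real_memory = spec["memory_mb"] - reserved_mb
    return (
        f"NodeName={pool_id}-[0-{max_nodes - 1}] "
        f"CPUs={spec['cpus']} "
        f"RealMemory={real_memory} "
        f"Feature={pool_id}"
    )


if __name__ == "__main__":
    print(node_line("x32core64G", "STANDARD_F32s_v2", 4))
    print(node_line("x8core16G", "STANDARD_F8s_v2", 16))
```

Keeping `Feature=<pool_id>` on each line would mean --constraint can still be used to pin a job to one pool when needed.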
Make the autoscaling / powersaving scripts (e.g., /var/batch-shipyard/slurm.py and the shipyardslurm Table's partition-to-batch-pool mappings) work when a partition maps to multiple batch pools. I am unsure exactly what changes are required to make this part work.
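One possible direction, sketched under assumptions: once slurmctld has chosen nodes based on their resource specifications, the resume/suspend script may only need to work out which batch pool each requested node belongs to and resize those pools. The helper below is hypothetical (it is not taken from slurm.py), and it assumes node names begin with `<pool_id>-` and have already been expanded from the hostlist expression.

```python
from collections import defaultdict


def group_nodes_by_pool(node_names, pool_ids):
    """Group Slurm node names by the batch pool whose id prefixes them."""
    grouped = defaultdict(list)
    # Check longer pool ids first so that a pool id that is a prefix of
    # another pool id cannot claim the wrong nodes.
    ordered = sorted(pool_ids, key=len, reverse=True)
    for node in node_names:
        for pool_id in ordered:
            if node.startswith(pool_id + "-"):
                grouped[pool_id].append(node)
                break
        else:
            raise ValueError(f"node {node!r} does not belong to any known pool")
    return dict(grouped)


if __name__ == "__main__":
    nodes = ["x32core64G-0", "x8core16G-1", "x8core16G-2"]
    print(group_nodes_by_pool(nodes, ["x32core64G", "x8core16G"]))
```

Each key in the result would then correspond to one pool resize request, rather than a single pool per partition.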