Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.
MIT License

leave a comment about 0 capacity unused partitions #286

Closed ryanhamel closed 1 month ago

ryanhamel commented 1 month ago

When users hit a 0 capacity issue, the only way to detect it today is by checking the logs. This PR keeps the partition definition in azure.conf but comments it out, with a note that this particular partition/nodearray/vm_size has 0 capacity.

For example, I have 0 capacity for the ND40 here. This is what azslurm partitions writes out (and what azslurm scale would put into /etc/slurm/azure.conf):

# The following partition has no capacity! htc - htc - Standard_ND40rs_v2
# PartitionName=htc Nodes= Default=NO DefMemPerCPU=16343 MaxTime=INFINITE State=UP
# Nodename=[] Feature=cloud STATE=CLOUD CPUs=40 ThreadsPerCore=1 RealMemory=653721 Gres=gpu:8
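To make the behavior concrete, here is a minimal sketch of the idea, not the actual azslurm code. The `Partition` class and `render_partition` function are hypothetical names made up for illustration, and the output is simplified (it omits DefMemPerCPU and prints an empty node list rather than `[]`). It assumes a partition with no nodes is one that has 0 capacity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    """Hypothetical stand-in for a partition/nodearray/vm_size combination."""
    name: str
    nodearray: str
    vm_size: str
    cpus: int
    memory_mb: int
    gpus: int = 0
    node_names: List[str] = field(default_factory=list)


def render_partition(p: Partition) -> str:
    """Render simplified azure.conf lines, commented out when there is no capacity."""
    nodes = ",".join(p.node_names)
    node_line = (
        f"Nodename={nodes} Feature=cloud STATE=CLOUD CPUs={p.cpus} "
        f"ThreadsPerCore=1 RealMemory={p.memory_mb}"
    )
    if p.gpus:
        node_line += f" Gres=gpu:{p.gpus}"
    lines = [
        f"PartitionName={p.name} Nodes={nodes} Default=NO MaxTime=INFINITE State=UP",
        node_line,
    ]
    if not p.node_names:
        # Assumption: an empty node list means 0 capacity. Keep the definition
        # visible, but comment it out and say why.
        header = (
            "# The following partition has no capacity! "
            f"{p.name} - {p.nodearray} - {p.vm_size}"
        )
        lines = [header] + ["# " + line for line in lines]
    return "\n".join(lines)


if __name__ == "__main__":
    empty = Partition("htc", "htc", "Standard_ND40rs_v2", cpus=40,
                      memory_mb=653721, gpus=8)
    print(render_partition(empty))
```

The point of the design is that the definition stays readable in the generated file, so a user inspecting azure.conf can see which partition was skipped and why without going to the logs.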