Open · matt-chan opened this issue 2 years ago
@matt-chan instead of excluding nodes from the partition, I'm thinking of having a parameter to define how many cores/VMs should always be on for each queue/partition.
Hi Xavier, yes, I think that behavior would be best if we could achieve it, but I'm not certain it's possible. I originally tried to make a PR before filing this feature request, but I couldn't figure out how to do it. I'm not sure CycleCloud and Slurm have that functionality.
Your team is definitely better at this stuff than I am. If you can figure it out, it would be a great feature! Just to make sure we're on the same page: it is the number of idle VMs we want to keep in each queue, right? So if there are 5 jobs and the idle setting is 2 VMs, there should be 7 VMs running in total?
@matt-chan the way it works is that it will always keep x nodes running. If they are filled by jobs, then new nodes will be added, up to the quota defined for that queue/partition. I'm afraid that always having y nodes above the allocated ones is not possible today.
@xpillons I've now implemented a simple solution for this. The following script runs as a cron job every 5 minutes on weekdays (I have it on the ondemand VM, but I guess it should move to the scheduler VM). I think it is self-explanatory:
#!/bin/bash
# Usage: ./warmup-queues.sh viz hb2la
set -e
# SLURM node states & state flags on AZ-HOP:
#   idle    VM allocated and idling
#   idle~   VM not allocated from Azure
#   idle#   VM being allocated from Azure
#   idle%   VM being powered down
#   mix     some CPUs allocated, but not all
for queue in "$@"; do
    # Count powered-on nodes (idle/mix without the ~ # % power-state flags).
    available=$(sinfo -p "$queue" --states=mix,idle --noheader | grep -v 'idle[~#%]' | wc -l)
    # Count nodes currently being allocated from Azure.
    allocating=$(sinfo -p "$queue" --states=idle --noheader | grep 'idle#' | wc -l)
    if [[ $available == 0 && $allocating == 0 ]]; then
        echo "Allocating 1 node on queue $queue"
        # Submit a job to trigger a node power-up, then kill the blocking srun;
        # the node keeps spinning up and will idle once ready.
        srun --partition "$queue" bash > /dev/null 2>&1 &
        PID=$!
        sleep 2
        set +e
        kill $PID
        set -e
    elif [[ $available -gt 0 ]]; then
        # "Touch" one available node so that it won't be deallocated by Slurm
        # after the idle timeout; /bin/true exits as soon as it is scheduled.
        set +e
        srun --partition "$queue" /bin/true > /dev/null 2>&1 &
        set -e
    fi
done
The admin can set a warmup field on any queue in config.yml. These queues are passed as arguments to the cron job.
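For illustration, a config.yml fragment could look like the sketch below; the queue names are the ones from the usage example above, and all other queue attributes are omitted:

# config.yml sketch: only the new warmup key is shown, other queue settings omitted
queues:
  - name: viz
    warmup: true
  - name: hb2la
    warmup: true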
Let me know if you are interested in a PR for this.
P.S. This creates one extra job every 5 minutes per queue. There may be more "official" ways of doing this via the Slurm power-save config (https://slurm.schedmd.com/power_save.html#config), but it already does the job.
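For reference, a minimal sketch of what those power-save knobs look like in slurm.conf (values are illustrative, and the node-count form of SuspendExcNodes requires a recent Slurm release):

# slurm.conf sketch - illustrative values only
SuspendTime=300          # power a node down after 5 minutes of idling
SuspendExcParts=viz      # never power down nodes in the viz partition
# Recent Slurm releases can also keep a count of nodes out of suspend:
# SuspendExcNodes=viz-[1-10]:2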
@ltalirz sounds like a great start. It needs to run on the scheduler. Also, ideally it should read the config file and pick up the partition names and the number of nodes to allocate.
"Also, ideally it should read the config file and pick up the partition names"

This is already how it works; the cron job is:
- name: set up cronjob for queue warmup
  cron:
    name: "queue-warmup"
    job: "/usr/local/sbin/queue-warmup.sh {{ warmup_queues | map(attribute='name') | join(' ') }}"
    minute: "*/5"
    weekday: "1-5"
    user: "root"
    state: "present"
  vars:
    warmup_queues: "{{ queues | selectattr('warmup', 'defined') | selectattr('warmup', 'equalto', true) }}"
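Assuming viz and hb2la are the queues flagged for warmup, the rendered entry in root's crontab (with the marker comment Ansible's cron module adds) would look roughly like:

#Ansible: queue-warmup
*/5 * * * 1-5 /usr/local/sbin/queue-warmup.sh viz hb2la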
Keeping more than one warm node will require some modifications (more nodes need to be touched), but should be doable, I guess. In practice, one idling node (at all times) is already a great improvement in user experience and often all you need.
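A minimal, untested sketch of such an extension, assuming the warmup field were changed to a per-queue node count and the script received queue:count pairs (both assumptions, not part of the current script):

#!/bin/bash
# Hypothetical extension: keep at least N powered-on nodes per queue.
# Usage: ./warmup-queues-n.sh viz:2 hb2la:1   (queue:count pairs are an assumption)
set -e
for spec in "$@"; do
    queue="${spec%%:*}"
    want="${spec##*:}"
    # -N prints one line per node; drop nodes flagged ~ (off), # (powering up)
    # or % (powering down) to count the nodes that are actually on.
    available=$(sinfo -p "$queue" -N --states=mix,idle --noheader | grep -v '[~#%]' | wc -l)
    allocating=$(sinfo -p "$queue" -N --states=idle --noheader | grep '#' | wc -l)
    deficit=$(( want - available - allocating ))
    if (( deficit > 0 )); then
        echo "Requesting $deficit extra node(s) on queue $queue"
        # One multi-node job makes Slurm power up all missing nodes at once.
        srun --partition "$queue" -N "$deficit" /bin/true > /dev/null 2>&1 &
    fi
    if (( available > 0 )); then
        # Touch one available node to reset its idle timer, as in the original.
        srun --partition "$queue" /bin/true > /dev/null 2>&1 &
    fi
done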
In what area(s)?
Describe the feature
Hi Xavier,
It would be great if we could set up a few test queues in azhop. This would let our users run quick jobs without having to wait for node spin-up time.
Currently, I'm approximating this behavior by setting a large idle time on some queues, but it would be nice to have a setting that actually keeps the nodes alive forever, using the Slurm setting described here: https://learn.microsoft.com/en-us/azure/cyclecloud/slurm?view=cyclecloud-8#excluding-a-partition. Another common feature of such test queues is a short job time limit. I don't see a way to set this from CycleCloud right now, even though it is in /etc/slurm/cyclecloud.conf.
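For context, at the Slurm level a per-partition time limit is the MaxTime parameter on the partition definition; an illustrative slurm.conf line (partition and node names are placeholders):

# slurm.conf sketch - names and limit are placeholders
PartitionName=test Nodes=test-[1-4] MaxTime=00:30:00 State=UP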
Thanks! Matt