Status: Open. jhrmnn opened this issue 4 months ago.
Can you check for any messages in /opt/azurehpc/slurm/logs/autoscale.log and shutdown.log in the same directory?
Also, are the VMs showing up in CycleCloud? Can you verify whether you set KeepAlive on them through the CycleCloud UI?
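A minimal Python sketch of the log check suggested above, assuming both files are plain text in /opt/azurehpc/slurm/logs. The keyword pattern (PATTERN) and the helper find_suspicious_lines are hypothetical. The actual log format is not described here, so adjust the pattern after reading a few lines by hand.

```python
import re
from pathlib import Path

# Directory named in the reply; shutdown.log is assumed to sit next to autoscale.log.
LOG_DIR = Path("/opt/azurehpc/slurm/logs")
LOG_FILES = ("autoscale.log", "shutdown.log")

# Hypothetical keyword list: match common severity words and Python tracebacks.
PATTERN = re.compile(r"\b(ERROR|WARNING|CRITICAL|Traceback|Exception)\b", re.IGNORECASE)


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs for lines that match PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def main():
    for name in LOG_FILES:
        path = LOG_DIR / name
        if not path.exists():
            print(f"{path}: not found")
            continue
        with path.open(errors="replace") as handle:
            hits = find_suspicious_lines(handle)
        print(f"{path}: {len(hits)} suspicious line(s)")
        # Print only the last 20 matches, the ones closest to the failed scale-down.
        for number, text in hits[-20:]:
            print(f"  {number}: {text}")


if __name__ == "__main__":
    main()
```

Run it on the scheduler node. Lines near the time the jobs were killed are the most relevant.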
CycleCloud version: 8.6.2-3276
Slurm version: 22.05.11
Scaling down after the job queue emptied worked successfully for me many times, until it failed after the cluster had been fully occupied for several days. All jobs were then killed and the compute nodes transitioned to idle~, but CycleCloud did not deprovision the VMs. How can I investigate the cause of this behavior?
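In Slurm, a trailing ~ on a node state (as in idle~) means Slurm considers the node powered off. A mismatch between that list and the VMs CycleCloud still shows as running is the symptom described here. The sketch below lists such nodes from `sinfo -N -h -o "%N %T"` so they can be compared by hand against the CycleCloud UI. The helper names parse_sinfo and powered_down_nodes are hypothetical.

```python
import shutil
import subprocess


def parse_sinfo(output):
    """Map node name -> state from `sinfo -N -h -o "%N %T"` output."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            node, state = parts
            # With -N, a node in several partitions appears once per partition.
            states[node] = state
    return states


def powered_down_nodes(states):
    """Return nodes whose state ends with Slurm's '~' (powered off) suffix."""
    return sorted(node for node, state in states.items() if state.endswith("~"))


def main():
    if shutil.which("sinfo") is None:
        print("sinfo not found; run this on the Slurm scheduler node")
        return
    result = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %T"],
        capture_output=True, text=True, check=True,
    )
    for node in powered_down_nodes(parse_sinfo(result.stdout)):
        print(node)


if __name__ == "__main__":
    main()
```

Any node printed here whose VM is still running in CycleCloud is a candidate to trace through the autoscale and shutdown logs.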