Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.
MIT License

Fix manual suspend and avoid KeepAlive=true resume/resume_fail loop via new azslurm return_to_idle command #231

Closed: ryanhamel closed this 2 months ago

ryanhamel commented 5 months ago

Fixes manual suspension of nodes whose Slurm node name differs from their hostname (nodename != hostname). Also fixes a KeepAlive=true issue where nodes would cycle through resume/resume_fail in a loop forever. Adds an azslurm return_to_idle command to handle this, rather than relying on the pure shell implementation.
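For readers unfamiliar with the idea of returning nodes to idle, here is a rough Python sketch of the general technique. It is not the code from this PR. The function names, the `STUCK_STATES` set, and the "name state" input format are all assumptions made for illustration; the only real CLI call referenced is `scontrol update NodeName=... State=IDLE`, and the sketch builds that command without running it.

```python
from typing import Iterable, List

# Assumption: powered-down nodes that Slurm has marked down or drained
# (shown with a trailing "~") are the ones worth resetting. The actual
# azslurm return_to_idle command may use different criteria.
STUCK_STATES = {"down~", "drained~"}


def nodes_to_return(node_lines: Iterable[str]) -> List[str]:
    """Select node names from "name state" lines whose state looks stuck."""
    selected = []
    for line in node_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, state = parts
        if state in STUCK_STATES:
            selected.append(name)
    return selected


def build_commands(nodes: Iterable[str]) -> List[List[str]]:
    """Build (but do not run) one scontrol command per node."""
    return [["scontrol", "update", f"NodeName={n}", "State=IDLE"] for n in nodes]


if __name__ == "__main__":
    sample = ["hpc-1 idle~", "hpc-2 down~", "htc-1 drained~", "htc-2 mixed"]
    for cmd in build_commands(nodes_to_return(sample)):
        print(" ".join(cmd))
```

Keeping selection separate from command construction means the selection logic can be checked against sample input without touching a live cluster. This is one possible advantage of a Python command over a pure shell loop.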