Azure / az-hop

The Azure HPC On-Demand Platform provides an HPC Cluster Ready solution
https://azure.github.io/az-hop/
MIT License
62 stars 52 forks source link

Change default SlurmdTimeout value #1897

Open xpillons opened 3 months ago

xpillons commented 3 months ago

default value is set to 300s which can end in to node status losing communication and waiting too long before powering down the node. Reducing to 60s will recycle dead nodes faster. This should fix #1895