Status: Open. elgalu opened this issue 3 years ago.
Hi @elgalu. We don't currently offer high-availability (HA) support for the head node, so manual intervention is required if it goes down. I'm marking this as a feature request.
See #1447. Slurm supports up to 3 controllers and fails over to a backup controller if the primary fails. I agree that this is an important feature.
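For readers unfamiliar with Slurm's controller failover: in `slurm.conf` the controllers are listed as repeated `SlurmctldHost` entries, with the first acting as the primary and the rest as backups. The sketch below is a deliberately simplified model of that priority ordering, not Slurm's actual implementation (the real failover also depends on timeouts such as `SlurmctldTimeout` and on controller state kept in a shared `StateSaveLocation`). The `Controller` class, the `active_controller` function, and the hostnames are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Controller:
    """Hypothetical controller record: a hostname plus a health flag."""
    name: str
    healthy: bool


def active_controller(controllers: Sequence[Controller]) -> Optional[str]:
    """Return the first healthy controller in configured priority order.

    Simplified model: the first entry is the primary; later entries are
    backups that take over only when every higher-priority entry is down.
    """
    for ctl in controllers:
        if ctl.healthy:
            return ctl.name
    return None  # every controller is down: manual intervention is needed


# Hypothetical hostnames: the primary has failed, both backups are up.
cluster = [
    Controller("head-primary", healthy=False),
    Controller("head-backup1", healthy=True),
    Controller("head-backup2", healthy=True),
]
print(active_controller(cluster))  # head-backup1
```

With a single head node the list has only one entry, so when it fails the function returns `None`, which mirrors the current situation described above where manual intervention is required.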
How does aws-parallelcluster provide high availability for the head node?
I couldn't find which process brings the master (head node) back if it goes down.