Add watchdog daemon to node states

clusterinthecloud / ansible

Ansible config for Cluster in the Cloud

https://cluster-in-the-cloud.readthedocs.io

MIT License

Add watchdog daemon to node states #64

Open milliams opened 5 years ago

milliams commented 5 years ago

A few times now we've seen a mismatch between the real state of the VMs, as reported by the cloud provider, and the state Slurm believes them to be in.

I think the simplest solution is a daemon that runs on the management node, continuously checks the two views for consistency, and corrects whatever it can.
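As a rough illustration of the consistency check proposed here, the sketch below compares two views of the same set of nodes. Everything in it is hypothetical: the state names (`running`, `stopped`, `up`, `down`), the `Mismatch` type and the `find_mismatches` function are not taken from Slurm, from any cloud provider's API, or from this project's code. A real daemon would first have to map provider and Slurm states onto a common vocabulary like this one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, simplified vocabulary. Each pair is (cloud view, Slurm view);
# None means the node is unknown to that side.
CONSISTENT = {
    ("running", "up"),
    ("stopped", "down"),
    (None, "down"),  # no VM exists and Slurm does not expect one
}


@dataclass(frozen=True)
class Mismatch:
    node: str
    cloud_state: Optional[str]
    slurm_state: Optional[str]


def find_mismatches(
    cloud: Dict[str, str], slurm: Dict[str, str]
) -> List[Mismatch]:
    """Return every node whose two views disagree, sorted by node name."""
    mismatches = []
    for node in sorted(set(cloud) | set(slurm)):
        cloud_state = cloud.get(node)
        slurm_state = slurm.get(node)
        # Any pair outside the table, including unrecognised states,
        # is reported rather than silently accepted.
        if (cloud_state, slurm_state) not in CONSISTENT:
            mismatches.append(Mismatch(node, cloud_state, slurm_state))
    return mismatches
```

Keeping the comparison a pure function of two dictionaries means it can be tested without touching the cloud provider or Slurm; how each dictionary is populated is left open.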

milliams commented 4 years ago

This is now in place. It does not yet fix any state; it only reports mismatched states. The code is in clusterinthecloud/python-citc.
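For readers unfamiliar with the report-only pattern, here is a minimal sketch of what such a loop could look like. It is not the code from clusterinthecloud/python-citc. The names `report_once` and `watch`, the `"absent"` placeholder, and the assumption that both sources already return states in one shared vocabulary are all invented for illustration.

```python
import logging
import time
from typing import Callable, Dict, Optional

logger = logging.getLogger("watchdog")

# Hypothetical: a callable returning {node name: normalised state}.
StateSource = Callable[[], Dict[str, str]]


def report_once(get_cloud: StateSource, get_slurm: StateSource) -> int:
    """Log a warning for each node whose two views differ; return the count."""
    cloud, slurm = get_cloud(), get_slurm()
    count = 0
    for node in sorted(set(cloud) | set(slurm)):
        cloud_state = cloud.get(node, "absent")
        slurm_state = slurm.get(node, "absent")
        if cloud_state != slurm_state:
            # Report only: nothing here tries to repair either side.
            logger.warning(
                "state mismatch for %s: cloud=%s slurm=%s",
                node, cloud_state, slurm_state,
            )
            count += 1
    return count


def watch(
    get_cloud: StateSource,
    get_slurm: StateSource,
    interval_s: float = 60.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run report_once repeatedly, pausing interval_s between cycles."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        report_once(get_cloud, get_slurm)
        cycle += 1
        if max_cycles is None or cycle < max_cycles:
            sleep(interval_s)
```

Passing the two sources and the `sleep` function in as arguments keeps the loop testable, and leaves room for a later corrective step to be added next to the warning without changing the polling logic.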