coreos / container-linux-update-operator

A Kubernetes operator to manage updates of Container Linux by CoreOS
Apache License 2.0

operator: pause reboots when active alerts are detected #158

Open lucab opened 7 years ago

lucab commented 7 years ago

Currently, update-operator reboots nodes as soon as updates are available. https://github.com/coreos/container-linux-update-operator/issues/82 tracks adding support for a user-configured maintenance window. On top of that, even inside a maintenance window there may be situations where reboots should be temporarily paused (e.g. while a critical or unplanned outage is in progress).

This can currently be done by setting a reboot-paused annotation on specific nodes; however, this is a manual, per-node operation and doesn't scale well cluster-wide.
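For reference, a minimal sketch of what automating the manual step could look like: it only builds the JSON merge patch that sets or clears the annotation on a node. The full annotation key `container-linux-update.v1.coreos.com/reboot-paused` is an assumption based on the project's annotation prefix, and `pausePatch` is a hypothetical helper, not an existing CLUO function.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// rebootPausedAnnotation is assumed to be the full key of the reboot-paused
// annotation; check the project's constants before relying on it.
const rebootPausedAnnotation = "container-linux-update.v1.coreos.com/reboot-paused"

// pausePatch (hypothetical helper) returns a JSON merge patch that sets the
// reboot-paused annotation on a Node object to "true" or "false".
func pausePatch(paused bool) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				rebootPausedAnnotation: strconv.FormatBool(paused),
			},
		},
	}
	return json.Marshal(patch)
}
```

The resulting bytes could be sent as a PATCH request with content type `application/merge-patch+json` for each node, which is exactly the per-node loop that does not scale well when done by hand.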

It would be nice to let CLUO know about any existing AlertManager in the cluster and check for specific active alerts before proceeding. @brancz suggested that we could:

For clarity, this should be completely orthogonal to maintenance window configuration.
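To make the idea of checking for active alerts concrete, here is a rough sketch of such a check, not a proposed implementation. It assumes a reachable Alertmanager that exposes the v2 HTTP API (`/api/v2/alerts`) and a user-supplied set of alert names that should block reboots; the names `alert`, `alertStatus`, `shouldPause`, and `fetchActiveAlerts` are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// alertStatus and alert mirror only the fields of an Alertmanager v2 alert
// that this sketch reads.
type alertStatus struct {
	State string `json:"state"`
}

type alert struct {
	Labels map[string]string `json:"labels"`
	Status alertStatus       `json:"status"`
}

// shouldPause reports whether any alert is in the "active" state and has an
// alertname listed in blocking (a hypothetical user-configured set).
func shouldPause(alerts []alert, blocking map[string]bool) bool {
	for _, a := range alerts {
		if a.Status.State == "active" && blocking[a.Labels["alertname"]] {
			return true
		}
	}
	return false
}

// fetchActiveAlerts asks Alertmanager for alerts that are active and neither
// silenced nor inhibited. baseURL is assumed to look like
// "http://alertmanager.example:9093".
func fetchActiveAlerts(client *http.Client, baseURL string) ([]alert, error) {
	q := url.Values{}
	q.Set("active", "true")
	q.Set("silenced", "false")
	q.Set("inhibited", "false")

	resp, err := client.Get(baseURL + "/api/v2/alerts?" + q.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("alertmanager returned %s", resp.Status)
	}

	var alerts []alert
	if err := json.NewDecoder(resp.Body).Decode(&alerts); err != nil {
		return nil, err
	}
	return alerts, nil
}
```

The operator would call `fetchActiveAlerts` before starting a reboot and skip the reboot while `shouldPause` returns true. What to do when Alertmanager itself is unreachable (proceed or pause) is a policy decision this sketch deliberately leaves to the caller.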