Currently, if a node crashes or shuts down, its node status becomes unready (a failed state). Right after the node failure, the etcd pods on it are not deleted: their phase is still Running, while their status becomes unready or unknown.
After some timeout (default 5m), the etcd pods get evicted. Check the kube-controller-manager flag:
--pod-eviction-timeout: https://kubernetes.io/docs/reference/generated/kube-controller-manager/
There are two problems here: a node could have restarted before eviction kicks in, and node unready status is not a good indication of etcd pod health.
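To make that gap concrete, here is a minimal sketch (using client-go) that lists the etcd pods and prints their phase next to their Ready condition; right after a node failure the phase typically still reads Running while the condition is no longer True. The `app=etcd` label selector and the `default` namespace are assumptions and will differ per deployment.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// podReady reports whether the pod's Ready condition is True. After a node
// failure the pod's Phase usually stays Running while this condition flips
// to False or Unknown, which is the mismatch described above.
func podReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List the etcd pods; the label selector is an assumption.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=etcd"})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("%s phase=%s ready=%v\n", p.Name, p.Status.Phase, podReady(&p))
	}
}
```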
Node restart
After a node restart, the etcd pod would simply die, because it uses emptyDir to store its data and emptyDir is not persistent in this case. See issue https://github.com/coreos/etcd-operator/issues/1839.
We could solve this by storing the data on a persistent volume. Check:
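As an illustration of that direction, a minimal sketch of backing the etcd data dir with a PersistentVolumeClaim instead of emptyDir is shown below. The claim name, size, and the `/var/etcd` mount path are assumptions, and the field names target a recent k8s.io/api (older releases use `ResourceRequirements` for the claim's resources).

```go
package etcdpod

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// etcdDataVolume returns a PersistentVolumeClaim for the etcd data dir plus
// the matching volume and mount, replacing the emptyDir volume so the data
// survives a node restart. Names, size, and mount path are illustrative.
func etcdDataVolume(memberName string) (corev1.PersistentVolumeClaim, corev1.Volume, corev1.VolumeMount) {
	pvc := corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: memberName + "-data"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			Resources: corev1.VolumeResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("8Gi"),
				},
			},
		},
	}
	vol := corev1.Volume{
		Name: "etcd-data",
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: pvc.Name,
			},
		},
	}
	mount := corev1.VolumeMount{Name: "etcd-data", MountPath: "/var/etcd"}
	return pvc, vol, mount
}
```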
Fine-grained health checking and timeout
Even though node unreadiness might indicate that the etcd pods on it are unhealthy, it is not fine-grained enough: the kubelet could have a bug while the etcd process is fine, the node could be fine while the etcd process has crashed, and so on.
For the above reasons, we need application-level health checking to detect unhealthy etcd processes, plus custom toleration policies for unhealthy members.
A simple readiness probe has been added in https://github.com/coreos/etcd-operator/issues/1320, which could be used as a health check.
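For reference, a hedged sketch of what such a readiness probe on the etcd container could look like, assuming the member serves plain HTTP on its client port (2379) so `/health` is reachable without TLS; the actual probe from that issue may use a different mechanism or thresholds, and older k8s.io/api releases use `Handler` instead of `ProbeHandler`.

```go
package etcdpod

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// etcdReadinessProbe returns a readiness probe that hits the member's own
// /health endpoint on the client port, so pod readiness tracks etcd health
// rather than node status. Timings and thresholds are illustrative.
func etcdReadinessProbe() *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: "/health",
				Port: intstr.FromInt(2379),
			},
		},
		InitialDelaySeconds: 10,
		PeriodSeconds:       10,
		FailureThreshold:    3,
	}
}
```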
We will need custom toleration policies covering how long an unhealthy etcd member can be tolerated and which failure cases can be tolerated at all (e.g. data corruption cannot). After the toleration period, etcd-operator would replace unhealthy members onto other nodes.
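Nothing like this exists in etcd-operator today; the sketch below only illustrates the shape such a toleration policy could take, combining a timeout with a list of failure kinds that are never tolerated. All names (`TolerationPolicy`, `FailureKind`, `ShouldReplace`) are hypothetical.

```go
package etcdhealth

import "time"

// TolerationPolicy is a hypothetical knob set for how the operator reacts to
// an unhealthy member; none of these fields exist in etcd-operator today.
type TolerationPolicy struct {
	// How long a member may stay unhealthy before it gets replaced.
	UnhealthyTimeout time.Duration
	// Failure kinds that should never be tolerated (e.g. data corruption).
	ReplaceImmediatelyOn []FailureKind
}

// FailureKind names a class of member failure detected by health checking.
type FailureKind string

const (
	FailureUnreachable    FailureKind = "Unreachable"
	FailureDataCorruption FailureKind = "DataCorruption"
)

// ShouldReplace decides whether a member that first failed at firstFailure
// with the given kind of failure should be replaced onto another node now.
func (p TolerationPolicy) ShouldReplace(kind FailureKind, firstFailure, now time.Time) bool {
	for _, k := range p.ReplaceImmediatelyOn {
		if k == kind {
			return true // e.g. data corruption is never tolerated
		}
	}
	return now.Sub(firstFailure) >= p.UnhealthyTimeout
}
```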