Open armstrongli opened 10 months ago
Not sure I understand the use case. A learner doesn't serve the API, so demoting a member will break clients connected to it, similar to taking the node down. Why not just remove or shut down the problematic member?
I expect the problem is related to the problematic member reconnecting to the cluster with a higher term, which forces an unnecessary leader election and allows the faulty member to become leader.
Which etcd version are you running? The problem of a reconnecting member repeatedly triggering leader elections has already been solved by the --pre-vote flag (default in v3.5).
What would you like to be added?
Add demote subcommand support to etcdctl. e.g.
The expected outcome is that an existing etcd member is converted back to a learner.
Why is this needed?
We maintain over 500 etcd clusters and have run into this issue many times. etcd can't survive unstable networks, though it can survive a full network partition. The case is: one or several members (a minority) have unstable networks.
During a leader election, the etcd cluster can't serve any requests. Frequent leader elections cause trouble for the cluster, as well as for the servers that sit on top of it (k8s apiservers).
A learner doesn't vote in leader elections and won't disturb the existing cluster. So we could automatically demote the bad member from a voting member to a learner, which buys us time to boot a new member and decommission the bad one(s).
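To make the voting arithmetic concrete, here is a minimal sketch of how the voter count changes when a member becomes a learner. It assumes the standard Raft majority rule (quorum = voters/2 + 1) and that learners are not counted as voters. The helper names quorum and faultTolerance are hypothetical and are not part of etcd or etcdctl.

```go
package main

import "fmt"

// quorum returns how many voting members must agree for a cluster
// with the given number of voters (simple majority rule).
// Hypothetical helper for illustration, not an etcd API.
func quorum(voters int) int {
	return voters/2 + 1
}

// faultTolerance returns how many voting members can be lost
// while the remaining voters can still form a quorum.
func faultTolerance(voters int) int {
	return voters - quorum(voters)
}

func main() {
	// 3 voters: a healthy three-member cluster.
	// 2 voters: the same cluster after one member is turned into a learner.
	for _, voters := range []int{3, 2, 5, 4} {
		fmt.Printf("voters=%d quorum=%d tolerated_failures=%d\n",
			voters, quorum(voters), faultTolerance(voters))
	}
}
```

Under these assumptions, a three-member cluster with one member demoted is left with two voters and a quorum of two, so it cannot lose another voter until the replacement member has been added and promoted.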