etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

add demote support in etcdctl member subcommand #17180

Open armstrongli opened 10 months ago

armstrongli commented 10 months ago

What would you like to be added?

Add demote subcommand support to etcdctl, e.g.:

$ etcdctl member demote xxxxxx

The expected behavior is to convert an existing voting etcd member back to a learner.

Why is this needed?

We maintain over 500 etcd clusters and have hit this issue many times. etcd does not cope well with unstable networks, even though it can survive a full network partition. The case is: one or several members (a minority) have unstable network connections.

  1. It/they fail to receive heartbeats from the leader (let's call them B).
  2. B turns into a Candidate.
  3. B campaigns for leadership in a new, higher term.
  4. All the other cluster members (leader and followers) receive the new term from B.
  5. All the other cluster members (leader and followers) follow the unstable member B.
  6. B loses its connection again very soon because of the unstable network.
  7. The other members start a new term and elect a new leader.
  8. Go back to step 1 and repeat steps 1-7 again and again.
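
The loop above can be sketched as a toy model. This is not etcd's raft implementation: the function name `election_churn` and the simplification that every round costs exactly one leader step-down and two term bumps are assumptions made for illustration only.

```python
def election_churn(rounds: int, start_term: int = 1) -> tuple[int, int]:
    """Toy model of steps 1-7; returns (final_term, leader_step_downs)."""
    term = start_term
    step_downs = 0
    for _ in range(rounds):
        # Steps 1-3: B misses heartbeats, becomes a candidate, and campaigns
        # in a term one higher than the cluster's current term.
        term += 1
        # Steps 4-5: the current leader sees the higher term and steps down.
        step_downs += 1
        # Steps 6-7: B drops off again, so the remaining members time out
        # and elect a new leader in yet another term.
        term += 1
    return term, step_downs


if __name__ == "__main__":
    final_term, step_downs = election_churn(rounds=3)
    print(f"term={final_term} leader_step_downs={step_downs}")
```

In this model each reconnect of B costs the cluster one leader and two terms, which is the churn described in the list.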

During a leader election, the etcd cluster can't serve any requests. Frequent leader elections cause trouble for the cluster, as well as for the servers on top of it (k8s apiservers).

A learner doesn't vote in leader elections and won't disturb the existing cluster. So we could automatically demote the bad member from a voting member to a learner, which buys us time to boot a new member and decommission the bad one(s).
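
Because a learner does not vote, demoting a member also changes the number of voters and therefore the quorum. A minimal sketch of that arithmetic, assuming the usual majority rule (quorum = floor(voters / 2) + 1); the helper names are hypothetical:

```python
def quorum(voters: int) -> int:
    """Majority of the voting members; learners are not counted."""
    return voters // 2 + 1


def failures_tolerated(voters: int) -> int:
    """How many voting members can fail while a quorum is still reachable."""
    return voters - quorum(voters)


if __name__ == "__main__":
    for voters_before in (3, 5):
        voters_after = voters_before - 1  # one voter demoted to learner
        print(
            f"{voters_before} voters tolerate {failures_tolerated(voters_before)} failure(s); "
            f"after demoting one, {voters_after} voters tolerate "
            f"{failures_tolerated(voters_after)}"
        )
```

Under this rule, demoting one member of a 3-member cluster leaves 2 voters that both have to stay healthy until a replacement is promoted.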

serathius commented 9 months ago

Not sure I understand the use case. A learner doesn't serve the API, so demoting a member will break clients connected to it, similar to taking the node down. Why not just remove or shut down the problematic member?

I expect the problem is related to the problematic member reconnecting to the cluster with a higher term, forcing an unnecessary leader election and allowing the faulty member to become leader.

Which etcd version are you running? The problem of a reconnecting member repeatedly triggering leader elections has already been solved by the --pre-vote flag (enabled by default in v3.5).
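
As a rough illustration of why pre-vote helps, here is a simplified sketch, not etcd's actual code. It assumes a peer grants a pre-vote only if it has not heard from a live leader recently and the candidate's log is up to date. The function names are hypothetical.

```python
def would_grant_pre_vote(heard_from_leader_recently: bool,
                         candidate_log_up_to_date: bool) -> bool:
    """Simplified pre-vote rule: refuse while a live leader is still in contact."""
    return (not heard_from_leader_recently) and candidate_log_up_to_date


def term_after_campaign(current_term: int, peers_heard_from_leader: list[bool]) -> int:
    """Term after a reconnecting member tries to campaign with pre-vote enabled."""
    # The candidate counts its own pre-vote plus those granted by its peers.
    grants = 1 + sum(would_grant_pre_vote(heard, True) for heard in peers_heard_from_leader)
    cluster_size = 1 + len(peers_heard_from_leader)
    majority = cluster_size // 2 + 1
    # The term is only bumped if a majority agrees that an election is needed.
    return current_term + 1 if grants >= majority else current_term


if __name__ == "__main__":
    # Healthy peers still hear from the leader, so the flaky member cannot bump the term.
    print(term_after_campaign(5, [True, True]))
    # Peers have lost the leader too, so a real election is allowed to start.
    print(term_after_campaign(5, [False, False]))
```

In this sketch the flaky member's term stays unchanged as long as the rest of the cluster still has a working leader, so its reconnect no longer forces the leader to step down.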