elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Eventually-quiescent election scheduling #98467

Open DaveCTurner opened 1 year ago

DaveCTurner commented 1 year ago

Although it is provably[^1] impossible to ensure that any particular master election attempt succeeds, Elasticsearch's election protocol is eventually quiescent. This means we can almost surely guarantee that the cluster eventually elects a master by extending the time between election attempts until the gap is long enough for a single attempt to run to completion without interference from other nodes.
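The back-off idea in the paragraph above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Elasticsearch's scheduler: the names `upper_bound` and `simulate`, the linear and uncapped growth of the delay bound, the 0.1 s defaults, and the rule that an attempt succeeds when no other attempt starts within `duration` of it are all hypothetical choices made for illustration.

```python
import random


def upper_bound(attempt, initial=0.1, back_off=0.1):
    """Upper bound (seconds) on the random delay before the next attempt.

    Assumption: the bound grows linearly and is never capped, so the
    typical gap between attempts eventually exceeds any fixed duration.
    """
    return initial + attempt * back_off


def simulate(num_nodes, duration, rng, max_attempts=10_000):
    """Return how many attempts were made until one ran undisturbed.

    Toy model: an attempt starting at time t succeeds if the previous
    attempt (by any node) started at least `duration` earlier and no
    other node starts an attempt before t + duration.
    Returns None if no attempt succeeds within `max_attempts`.
    """
    attempts = [0] * num_nodes
    next_start = [rng.uniform(0, upper_bound(0)) for _ in range(num_nodes)]
    last_start = float("-inf")  # start time of the most recent attempt

    for total in range(1, max_attempts + 1):
        # The node with the earliest scheduled time attempts next.
        node = min(range(num_nodes), key=next_start.__getitem__)
        start = next_start[node]

        # Back off: this node's following attempt is drawn from a wider range.
        attempts[node] += 1
        next_start[node] = start + rng.uniform(0, upper_bound(attempts[node]))

        clear_before = start - last_start >= duration
        clear_after = all(
            t - start >= duration
            for i, t in enumerate(next_start)
            if i != node
        )
        if clear_before and clear_after:
            return total
        last_start = start

    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate(num_nodes=5, duration=0.5, rng=rng))
```

In this model, a slower election (larger `duration`) or a larger cluster (larger `num_nodes`) typically needs more failed attempts before the gaps grow wide enough, which is the sense in which backing off far enough makes success almost sure.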

Today we do not back off the election scheduler in all the situations needed to truly guarantee the eventual completion of an election:

Today's implementation works very well in practice for almost all clusters, but may not be effective in clusters with particularly poor IO performance, especially if the cluster is very large or otherwise unusually configured. I'm opening this issue to track that there are still some gaps in this area.

[^1]: Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (April 1985), 374–382.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-distributed (Team:Distributed)