elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Roll over data streams to help vacate shutting-down nodes #105118

Open DaveCTurner opened 9 months ago

DaveCTurner commented 9 months ago

Today, when we mark a node for shutdown (with type remove, replace or sigterm), we first move all the shards off the node using the peer recovery mechanism. In particular, after starting the engine on the target node we must replay all missing operations to bring it in-sync, and this includes any operations received while replaying earlier operations. On shards with high indexing throughput this can take quite some time as we're chasing a fast-moving target, and operation replay is much less efficient than file-based recovery (https://github.com/elastic/elasticsearch/issues/68513).

For shards that belong to data streams we could avoid much of this operation-replay work by rolling over the data stream after marking the node for shutdown, redirecting all future write traffic to new shards which will be assigned to nodes other than the one which is shutting down.
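
To illustrate the idea from the outside, here is a minimal Python sketch of the sequence: mark the node for shutdown, then roll over every data stream whose current write index has a shard on that node. It is not the proposed in-cluster implementation. It only builds the REST requests (the node shutdown API and the rollover API) and sends nothing. The helper names `streams_to_roll_over` and `plan_requests`, the input dictionaries, and the example node and stream names are all hypothetical; a real client would populate the inputs from cluster state.

```python
from typing import Dict, List, Set, Tuple

# (HTTP method, path, JSON body); a hypothetical shape used only in this sketch.
Request = Tuple[str, str, dict]


def streams_to_roll_over(
    write_index_by_stream: Dict[str, str],
    nodes_by_index: Dict[str, Set[str]],
    shutting_down_node: str,
) -> List[str]:
    """Return the data streams whose write index has a shard on the given node.

    Both dictionaries are assumed inputs; how they are gathered is out of scope.
    """
    return sorted(
        stream
        for stream, write_index in write_index_by_stream.items()
        if shutting_down_node in nodes_by_index.get(write_index, set())
    )


def plan_requests(
    node_id: str, streams: List[str], reason: str = "planned maintenance"
) -> List[Request]:
    """Build the shutdown request followed by one rollover request per stream."""
    requests: List[Request] = [
        # Register the shutdown first, so that the new write indices should be
        # allocated to nodes other than the one being vacated.
        ("PUT", f"/_nodes/{node_id}/shutdown", {"type": "remove", "reason": reason}),
    ]
    # An empty body asks for an unconditional rollover.
    requests += [("POST", f"/{stream}/_rollover", {}) for stream in streams]
    return requests


if __name__ == "__main__":
    write_indices = {
        "logs-app": ".ds-logs-app-000003",
        "metrics-db": ".ds-metrics-db-000007",
    }
    shard_nodes = {
        ".ds-logs-app-000003": {"node-1", "node-2"},
        ".ds-metrics-db-000007": {"node-3"},
    }
    affected = streams_to_roll_over(write_indices, shard_nodes, "node-1")
    for method, path, body in plan_requests("node-1", affected):
        print(method, path, body)
```

The ordering matters in this sketch: rolling over before the shutdown marker exists could place the new write shards back on the node that is about to leave. The old write indices stop receiving writes after the rollover, so the remaining recovery should have far fewer concurrent operations to replay.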

Some further thoughts:

elasticsearchmachine commented 9 months ago

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine commented 9 months ago

Pinging @elastic/es-data-management (Team:Data Management)