Today when we mark a node for shutdown (with type `remove`, `replace`, or `sigterm`) we move all the shards off the node using the peer recovery mechanism first. In particular, after starting the engine on the target node we must replay all missing operations to bring it in-sync, and this includes any operations received while replaying earlier operations. On shards with high indexing throughput this can take quite some time as we're chasing a fast-moving target, and operation replay is much less efficient than file-based recovery (https://github.com/elastic/elasticsearch/issues/68513).
For shards that belong to data streams, we could avoid much of this operation-replay work by rolling over the data stream after marking the node for shutdown, redirecting all future write traffic to new shards which will be allocated to nodes other than the one that is shutting down.
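As a sketch of the proposed sequence using the existing node shutdown and rollover APIs (the node ID and data stream name here are hypothetical):

```
PUT /_nodes/node-1/shutdown
{
  "type": "remove",
  "reason": "decommissioning host"
}

POST /my-data-stream/_rollover
```

After the rollover, the shards remaining on `node-1` belong only to read-only backing indices, so their recoveries are no longer chasing live write traffic.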
Some further thoughts:
- maybe delay the start of the recovery until the rollover completes?
- maybe do this only for data streams with some minimum estimated write load?
- maybe recovery should flush before starting to recover a shard of a data-stream backing index other than the write index, to establish a safe commit that contains (almost) all operations?
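On the last point: once the rollover has moved the write index elsewhere, the backing indices on the shutting-down node receive no new writes, so a flush would produce a safe commit covering (almost) all of their operations and allow recovery to proceed file-based with minimal replay. Hypothetically, per backing index (name shown follows the usual `.ds-*` convention but is illustrative):

```
POST /.ds-my-data-stream-2023.01.01-000001/_flush
```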