elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other

Open connections to new nodes more lazily #108127

Open DaveCTurner opened 4 months ago

DaveCTurner commented 4 months ago

Today, when a new cluster state adds nodes, we block cluster state application until the connection attempts to those newly added nodes have completed, and only then start applying the state:

https://github.com/elastic/elasticsearch/blob/90351ef63903c0ea5453d27b14575dbb2d07e6aa/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java#L509-L519

We do this because we expect to be able to send requests to every node in the cluster, and we don't want to report a failure if we attempt to send a request before the initial connection attempt has completed. However, we could achieve the same effect without this blocking wait by creating a placeholder connection which captures any requests destined for these new nodes and delays them until the initial connection attempt has completed (whether successfully or otherwise).
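The placeholder-connection idea can be sketched with a `CompletableFuture` that represents the initial connection attempt. Callbacks registered before the attempt finishes are held by the future and run once it completes; callbacks registered afterwards run immediately. This is a minimal, hypothetical sketch: `PlaceholderConnection`, `sendRequest`, `onConnected` and `onConnectFailed` are illustrative names, not Elasticsearch's actual `Transport.Connection` API, and a real implementation would also need to handle request timeouts and node removal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sketch, not Elasticsearch's real connection API.
// C stands in for the type of the real, established connection.
final class PlaceholderConnection<C> {
    // Completed once the initial connection attempt finishes, successfully or not.
    private final CompletableFuture<C> initialAttempt = new CompletableFuture<>();

    // Sends a request. While the attempt is still in flight, the future holds
    // the callback and runs it when the attempt completes; afterwards the
    // callback runs immediately on the calling thread.
    void sendRequest(Consumer<C> send, Consumer<Throwable> onFailure) {
        initialAttempt.whenComplete((connection, failure) -> {
            if (failure == null) {
                send.accept(connection);   // forward over the real connection
            } else {
                onFailure.accept(failure); // report a failure only once the attempt has failed
            }
        });
    }

    // Called by the (hypothetical) connection logic when the attempt succeeds.
    void onConnected(C connection) {
        initialAttempt.complete(connection);
    }

    // Called by the (hypothetical) connection logic when the attempt fails.
    void onConnectFailed(Exception e) {
        initialAttempt.completeExceptionally(e);
    }
}
```

With this shape, cluster state application could proceed without waiting, because any request sent early is delayed rather than failed. The cost is that the delay moves onto whichever requests happen to target the new node first.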

Such delays generally wouldn't affect performance-critical requests like searches or indexing, because the new node would initially have no shards assigned to it. One possible problem is that if the new node is an ingest node, and the cluster contains some nodes without the ingest role, then those nodes might try to forward ingest traffic to the new node over this delayed connection, which would show up as a blip in indexing latency. We'd probably want the routing logic for those requests to be aware of the potential delay.
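One way to make ingest forwarding aware of the delay is to prefer ingest nodes whose initial connection attempt has already completed, and fall back to still-connecting nodes only when no other ingest node is available. The sketch below illustrates that policy under stated assumptions: `IngestNodeSelector`, its `Node` record, and the `isConnected` predicate are hypothetical and do not reflect how Elasticsearch actually picks an ingest node.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of a delay-aware ingest routing policy.
final class IngestNodeSelector {
    // Minimal stand-in for a discovery node: an id plus whether it has the ingest role.
    record Node(String id, boolean ingest) {}

    // Picks an ingest node round-robin, preferring nodes whose initial
    // connection attempt has completed (isConnected), so that forwarded
    // requests are not held behind a placeholder connection.
    static Optional<Node> select(List<Node> nodes, Predicate<Node> isConnected, int counter) {
        List<Node> ingest = nodes.stream().filter(Node::ingest).toList();
        List<Node> ready = ingest.stream().filter(isConnected).toList();
        // Fall back to still-connecting ingest nodes only if none are ready.
        List<Node> candidates = ready.isEmpty() ? ingest : ready;
        if (candidates.isEmpty()) {
            return Optional.empty(); // no ingest nodes at all
        }
        return Optional.of(candidates.get(Math.floorMod(counter, candidates.size())));
    }
}
```

The fallback keeps a single newly added ingest node usable immediately, at the cost of the latency blip described above in that one case.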

Relates to https://github.com/elastic/elasticsearch/issues/89821, which is another thing that delays cluster state application unnecessarily.

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-distributed (Team:Distributed)