elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Reduce usage of `TransportMasterNodeReadAction` #101805

Open DaveCTurner opened 10 months ago

DaveCTurner commented 10 months ago

TransportMasterNodeReadAction is intended for cases where we need to collect some state held only on the elected master, for instance related to shard allocation or data stream errors. However, many TransportMasterNodeReadAction implementations work as a pure function of the cluster state, which is held on every node, so there is no need to route these requests to the master for processing. Moreover, some of these requests may be quite expensive to process in large clusters, so routing them all to the master represents a scalability bottleneck. We should reconsider each usage of TransportMasterNodeReadAction and decide whether it needs to run on the master or not. If not, we should convert them to regular local-only transport actions (e.g. using TransportLocalClusterStateAction).
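
To illustrate what "a pure function of the cluster state" means here, the following is a minimal sketch using a toy model. `ToyClusterState` and `aliasesOf` are hypothetical names invented for this example and are not Elasticsearch classes or APIs. The point is only that the answer depends on nothing except the state passed in, so any node holding a copy of that state can compute it.

```java
import java.util.List;
import java.util.Map;

final class LocalReadSketch {
    // Hypothetical stand-in for the cluster state; not an Elasticsearch class.
    record ToyClusterState(long version, Map<String, List<String>> indexAliases) {}

    // A pure function of the cluster state: it reads no master-only data,
    // so it gives the same answer on any node holding the same state.
    static List<String> aliasesOf(ToyClusterState state, String index) {
        return state.indexAliases().getOrDefault(index, List.of());
    }

    public static void main(String[] args) {
        ToyClusterState localCopy =
            new ToyClusterState(42, Map.of("logs", List.of("logs-read", "logs-write")));
        // Answered from the local copy, with no hop to the elected master.
        System.out.println(aliasesOf(localCopy, "logs"));
    }
}
```

An action like `TransportGetDesiredBalanceAction`, by contrast, would not fit this shape if its response depends on data computed only on the elected master.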

Additionally, many of these actions are not currently cancellable, but they (or at least the expensive ones) should be. Experience shows that we're not great at spotting the expensive ones ahead of time, so IMO we should err on the side of caution and make each one cancellable unless we have a good reason for not doing so.
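
As a rough sketch of what cancellability asks of the expensive part of such an action, the loop below checks a cancellation flag on every iteration and stops early. The class, method, and `isCancelled` supplier are hypothetical and chosen for illustration; this does not use the Elasticsearch task framework, only plain JDK types.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.function.BooleanSupplier;

final class CancellableReadSketch {
    // Counts index names with the given prefix, checking for cancellation
    // before each item so a long scan over a large cluster can stop early.
    static int countMatching(List<String> indexNames, String prefix, BooleanSupplier isCancelled) {
        int count = 0;
        for (String name : indexNames) {
            if (isCancelled.getAsBoolean()) {
                throw new CancellationException("task cancelled");
            }
            if (name.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> names = List.of("logs-1", "logs-2", "metrics-1");
        // Never cancelled in this demo, so the scan runs to completion.
        System.out.println(countMatching(names, "logs-", () -> false));
    }
}
```

The check is cheap, which is part of the argument for adding it by default rather than trying to predict which actions will turn out to be expensive.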

Note that attempting to route these requests to the current master does not give them any stronger consistency guarantees, because the node that does the work does not validate that it is the master before responding. It's possible that a new master has been elected, and the cluster state updated, without the responding node knowing about it.
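
The scenario above can be sketched with a toy model. `ToyNode` and its fields are hypothetical and exist only for this example. A node that still believes it is the master answers from its own applied state without re-validating, even though a newer master has already published a later state.

```java
final class StaleMasterSketch {
    // Hypothetical toy node; not an Elasticsearch class.
    static final class ToyNode {
        final long appliedTerm;
        final long appliedVersion;
        final boolean believesItIsMaster;

        ToyNode(long appliedTerm, long appliedVersion, boolean believesItIsMaster) {
            this.appliedTerm = appliedTerm;
            this.appliedVersion = appliedVersion;
            this.believesItIsMaster = believesItIsMaster;
        }

        // Answers from local state without confirming that this node is
        // still the elected master, as described in the paragraph above.
        long answerRead() {
            return appliedVersion;
        }
    }

    public static void main(String[] args) {
        ToyNode oldMaster = new ToyNode(1, 5, true); // has not yet learned of the new election
        ToyNode newMaster = new ToyNode(2, 6, true); // elected in a later term
        System.out.println("old master answers with version " + oldMaster.answerRead());
        System.out.println("new master answers with version " + newMaster.answerRead());
    }
}
```

A request routed to the old master in this model gets the older version, which is no more consistent than what a non-master node with the same applied state would have returned.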

Relates #77466

elasticsearchmachine commented 10 months ago

Pinging @elastic/es-distributed (Team:Distributed)

idegtiarenko commented 10 months ago

I believe TransportGetDesiredBalanceAction is required to run on the elected master, as its response contains DesiredBalanceStats that are computed in the allocator during execution. I think the same is true of ClusterInfo. Alternatively, we could update the action to run anywhere in the cluster and read only the stats and ClusterInfo from the elected master.

DaveCTurner commented 10 months ago

> I believe TransportGetDesiredBalanceAction is required to run on elected

Me too - please feel free to cross it off the list in the OP with a comment to that effect.