elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.21k stars 24.85k forks source link

`NodeClient` executes transport actions without forking #97916

Open DaveCTurner opened 1 year ago

DaveCTurner commented 1 year ago

We use the NodeClient to execute transport actions on the local node, most notably throughout the REST layer and in plugins that do not have direct access to the TransportService. Each transport action declares an executor when registering itself with the transport service, with the expectation that it will begin execution on that executor rather than on the calling thread. In fact this forking only happens when invoked via the TransportService and not via the NodeClient, which ignores the executor and always begins the execution of the transport action on the calling thread. In the REST layer that's always going to be the transport worker handling the request. This is pretty bad because transport actions may immediately do some work that's far too expensive to run on a transport worker, such as O(#shards) coordination.

Relates https://github.com/elastic/elasticsearch/issues/97914 Relates https://github.com/elastic/elasticsearch/issues/92179 Relates https://github.com/elastic/elasticsearch/issues/100111

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

ldematte commented 1 year ago

I explored the codebase and gave this some thoughts. I see two "easy" solutions (and one "medium") to this:

  1. change NodeClient.executeLocally signature to accept an additional Executor parameter (as proposed in https://github.com/elastic/elasticsearch/issues/86765)
    • Pro: change should be long but straightforward
    • Con: only direct calls to executeLocally will be affected - execute/doExecute will have to continue using SAME (is this a problem? Is execute on NodeClient called directly?)
    • Con: (if HandledTransportAction) need to specify the executor twice: once when registering with TransportService, another when calling executeLocally. They can even differ (which may be bad or not)
  2. "pull down" the executor we currently pass to HandledTransportAction one level in the hierarchy, moving it to TransportAction. NodeClient can then read the executor from the TransportAction (via the map) and use it.
    • Pro: same executor (declared once) for TransportService and NodeClient
    • Pro: (if we allow a ctor with a default Executor) change would not be too bad
    • Con: "ugly": we are "parking" an executor inside TransportAction, and using a getter to read it. Not nice.
  3. change the registration process, for all actions. Instead of doing it in 2 places like today (1- ActionModule.setupActions - which ends up building the Map<ActionType, TransportAction> passed to NodeClient and 2- TransportService.registerRequestHandler/Transport.RequestHandlers via HandledTransportAction), we do it in one place that will do both, require an executor to be specified, and build a Map for NodeClient that includes the desired executor ( a ActionType -> TransportAction, Executor map)

The additional con for these is that they are lightweight solution that do not address other issues (registration of actions, simpler local-only actions, unused serialization/deserialization code (https://github.com/elastic/elasticsearch/issues/100111), etc.)

I will continue exploring more complex options (change to the class hierarchy).

DaveCTurner commented 1 year ago

is this a problem? Is execute on NodeClient called directly?

I think so, at least AIUI changing only executeLocally would not help the callers that use a NodeClient within a wrapper such as OriginSettingClient, ParentTaskAssigningClient, or RestCancellableNodeClient.

Instead of doing it in 2 places like today

I think the two places are two different things and we want to keep them distinct. Although many actions are both, there are several which can be invoked by a client but have no transport handler, and correspondingly there are many which have a transport handler but no client action.

That's not to say I disagree that actions should have a corresponding "local" executor, nor about having a utility for the reasonably common case of actions which are both client-facing and transport-facing I just don't think it should be mandatory for all actions to be both.

TBH I think option 2 will be simplest. It doesn't even need to be retrieved with a getter, we could do the forking within TransportAction#execute (as long as the transport layer has some way to bypass that forking because it's already forked to the executor that it, in turn, retrieves with a getter).

ldematte commented 1 year ago

Although many actions are both, there are several which can be invoked by a client but have no transport handler, and correspondingly there are many which have a transport handler but no client action.

Yes of course. I see there are actions that are registered to TransportService outside HandledTransportAction. I was under the impression that all HandledTransportAction were registered to be invoked by a client too, or that there was at least a good overlap, but I stand corrected. In any case, I was thinking about 1 place with 2 different registrations - we still have things that go into NodeClient, things that go into Transport, but they are both declared in one place (so we don't need specify things like an executor twice). Something like

ActionRevistration.builder(ClusterStateAction.INSTANCE, TransportClusterStateAction.class) // Or a factory
   .withExecutor(threadPool.executor(ThreadPool.Names.MANAGEMENT) // defaults to DIRECT_EXECUTOR_SERVICE
   .withNodeClient()
   .withTrasportHandler(<<params>>)
   .build();

Based on withNodeClient, withTransportHandler (or both), register then decides what needs to go into NodeClient and what goes into Trasport.

But yeah, of the 3 option 2 looks like the best compromise. I will give it a go, for fun, between a PR and another of my "regular" work :)

ldematte commented 3 months ago

An update on this issue:

So even if https://github.com/elastic/elasticsearch/pull/100895 is now merged, I'm leaning towards keeping this open, so we don't forget we need to come back and fix this "properly".