Open DaveCTurner opened 1 year ago
Pinging @elastic/es-core-infra (Team:Core/Infra)
I explored the codebase and gave this some thought. I see two "easy" solutions (and one "medium") to this:

1. Change the NodeClient.executeLocally signature to accept an additional Executor parameter (as proposed in https://github.com/elastic/elasticsearch/issues/86765).
   - Only executeLocally will be affected: execute/doExecute will have to continue using SAME (is this a problem? Is execute on NodeClient called directly?)
   - (... HandledTransportAction) need to specify the executor twice: once when registering with TransportService, another when calling executeLocally. They can even differ (which may be bad or not).
2. Move the executor from HandledTransportAction one level up in the hierarchy, moving it to TransportAction. NodeClient can then read the executor from the TransportAction (via the map) and use it.
   - [...] TransportService and NodeClient
   - [...] TransportAction, and using a getter to read it. Not nice.
3. Instead of doing it in 2 places like today (1- ActionModule.setupActions, which ends up building the Map<ActionType, TransportAction> passed to NodeClient, and 2- TransportService.registerRequestHandler/Transport.RequestHandlers via HandledTransportAction), we do it in one place that will do both, require an executor to be specified, and build a Map for NodeClient that includes the desired executor (an ActionType -> TransportAction, Executor map).

The additional con for these is that they are lightweight solutions that do not address other issues (registration of actions, simpler local-only actions, unused serialization/deserialization code (https://github.com/elastic/elasticsearch/issues/100111), etc.)
I will continue exploring more complex options (change to the class hierarchy).
> is this a problem? Is execute on NodeClient called directly?

I think so, at least AIUI changing only executeLocally would not help the callers that use a NodeClient within a wrapper such as OriginSettingClient, ParentTaskAssigningClient, or RestCancellableNodeClient.
> Instead of doing it in 2 places like today
I think the two places are two different things and we want to keep them distinct. Although many actions are both, there are several which can be invoked by a client but have no transport handler, and correspondingly there are many which have a transport handler but no client action.
That's not to say I disagree that actions should have a corresponding "local" executor, nor about having a utility for the reasonably common case of actions which are both client-facing and transport-facing. I just don't think it should be mandatory for all actions to be both.
TBH I think option 2 will be simplest. It doesn't even need to be retrieved with a getter, we could do the forking within TransportAction#execute (as long as the transport layer has some way to bypass that forking because it's already forked to the executor that it, in turn, retrieves with a getter).
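A minimal sketch of what forking inside the base action could look like. Everything here is a simplified stand-in and an assumption on my part: `SketchTransportAction`, `ThreadNameAction`, and `executeAlreadyForked` are hypothetical names, and a plain `Consumer` replaces the real listener type. It is not the actual Elasticsearch API, only an illustration of "fork in execute, let the transport layer bypass it".

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the base action class; not the real Elasticsearch type.
abstract class SketchTransportAction<Request, Response> {
    private final Executor executor;

    protected SketchTransportAction(Executor executor) {
        this.executor = executor;
    }

    /** Getter a transport layer could use to fork before invoking the handler. */
    final Executor executor() {
        return executor;
    }

    /** Client-facing entry point: always forks to the action's own executor. */
    final void execute(Request request, Consumer<Response> listener) {
        executor.execute(() -> doExecute(request, listener));
    }

    /** Entry point for callers that already forked (e.g. a transport handler): no second fork. */
    final void executeAlreadyForked(Request request, Consumer<Response> listener) {
        doExecute(request, listener);
    }

    protected abstract void doExecute(Request request, Consumer<Response> listener);
}

// Toy action that reports the thread on which it started executing.
final class ThreadNameAction extends SketchTransportAction<String, String> {
    ThreadNameAction(Executor executor) {
        super(executor);
    }

    @Override
    protected void doExecute(String request, Consumer<String> listener) {
        listener.accept(request + " ran on " + Thread.currentThread().getName());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService management = Executors.newSingleThreadExecutor(r -> new Thread(r, "management-0"));
        ThreadNameAction action = new ThreadNameAction(management);
        action.execute("client call", System.out::println);                 // forks to management-0
        action.executeAlreadyForked("transport call", System.out::println); // stays on the calling thread
        management.shutdown();
        management.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With this shape, wrappers around the client get the fork for free because they all end up in `execute`, while a caller that has already dispatched to `executor()` uses the bypass entry point.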
> Although many actions are both, there are several which can be invoked by a client but have no transport handler, and correspondingly there are many which have a transport handler but no client action.

Yes of course. I see there are actions that are registered to TransportService outside HandledTransportAction. I was under the impression that all HandledTransportActions were registered to be invoked by a client too, or that there was at least a good overlap, but I stand corrected.
In any case, I was thinking about 1 place with 2 different registrations - we still have things that go into NodeClient, things that go into Transport, but they are both declared in one place (so we don't need to specify things like an executor twice).
Something like

    ActionRegistration.builder(ClusterStateAction.INSTANCE, TransportClusterStateAction.class) // Or a factory
        .withExecutor(threadPool.executor(ThreadPool.Names.MANAGEMENT)) // defaults to DIRECT_EXECUTOR_SERVICE
        .withNodeClient()
        .withTransportHandler(<<params>>)
        .build();

Based on withNodeClient, withTransportHandler (or both), register then decides what needs to go into NodeClient and what goes into Transport.
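To make the "one declaration, two registrations" idea concrete, here is a self-contained toy version. All names (`ActionRegistry`, `Entry`, `Builder`, `register`) are hypothetical and chosen for illustration; actions are reduced to a name plus a `Runnable`, so this only shows how a single builder could feed two separate maps while the executor is specified once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;

// Hypothetical sketch: ActionRegistry, Entry and Builder are illustrative names, not Elasticsearch types.
final class ActionRegistry {

    /** A handler paired with the executor it should start on. */
    static final class Entry {
        final Runnable handler;
        final Executor executor;

        Entry(Runnable handler, Executor executor) {
            this.handler = handler;
            this.executor = executor;
        }
    }

    // The two registrations stay distinct; only the declaration is shared.
    final Map<String, Entry> nodeClientActions = new HashMap<>();
    final Map<String, Entry> transportHandlers = new HashMap<>();

    Builder builder(String actionName, Runnable handler) {
        return new Builder(actionName, handler);
    }

    final class Builder {
        private final String actionName;
        private final Runnable handler;
        private Executor executor = Runnable::run; // direct (same-thread) executor unless overridden
        private boolean forNodeClient;
        private boolean forTransport;

        private Builder(String actionName, Runnable handler) {
            this.actionName = actionName;
            this.handler = handler;
        }

        Builder withExecutor(Executor executor) {
            this.executor = executor;
            return this;
        }

        Builder withNodeClient() {
            forNodeClient = true;
            return this;
        }

        Builder withTransportHandler() {
            forTransport = true;
            return this;
        }

        /** Writes one shared entry into whichever maps were requested. */
        void register() {
            Entry entry = new Entry(handler, executor);
            if (forNodeClient) nodeClientActions.put(actionName, entry);
            if (forTransport) transportHandlers.put(actionName, entry);
        }
    }
}
```

An action that is client-only calls just `withNodeClient()`, a transport-only one calls just `withTransportHandler()`, and the common "both" case chains the two, so nothing forces every action to be both.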
But yeah, of the 3, option 2 looks like the best compromise. I will give it a go, for fun, between a PR and another of my "regular" work :)
An update on this issue:
We have a clearer idea of how we want to definitively solve this (see discussion in https://github.com/elastic/elasticsearch/pull/100895); TL;DR we want to simplify TransportAction creation and registration, reducing it to one place (vs the 2 we have today); handling/passing the correct executor will be part of the registration process. This will be done with a new DI framework (properly tailored to our needs) that we are currently working on as part of our effort to reduce technical debt.
However, this is still some months away. Meanwhile, we had at least one incident which had this as its root cause again, and we have an action item to mitigate the risk of this happening again. After syncing with the team, we agreed to re-open (and merge) https://github.com/elastic/elasticsearch/pull/100895 as a temporary stopgap measure: we still want to solve this properly, in a way that will likely undo this change, but we want to be more protected meanwhile.
So even if https://github.com/elastic/elasticsearch/pull/100895 is now merged, I'm leaning towards keeping this open, so we don't forget we need to come back and fix this "properly".
We use the NodeClient to execute transport actions on the local node, most notably throughout the REST layer and in plugins that do not have direct access to the TransportService. Each transport action declares an executor when registering itself with the transport service, with the expectation that it will begin execution on that executor rather than on the calling thread. In fact this forking only happens when invoked via the TransportService and not via the NodeClient, which ignores the executor and always begins the execution of the transport action on the calling thread. In the REST layer that's always going to be the transport worker handling the request. This is pretty bad because transport actions may immediately do some work that's far too expensive to run on a transport worker, such as O(#shards) coordination.

Relates https://github.com/elastic/elasticsearch/issues/97914
Relates https://github.com/elastic/elasticsearch/issues/92179
Relates https://github.com/elastic/elasticsearch/issues/100111
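The difference between the two invocation paths described in this issue can be reproduced with a toy model. This is only an illustration under loud assumptions: `CallingThreadDemo`, `ToyAction`, and both `executeVia...` methods are hypothetical stand-ins, not the real NodeClient or TransportService code paths.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified stand-ins; none of these are the real Elasticsearch classes.
final class CallingThreadDemo {

    /** A toy "transport action" that reports which thread began executing it. */
    interface ToyAction {
        String execute();
    }

    /** Mimics the local-client path: the action starts on the caller's thread. */
    static String executeViaLocalClient(ToyAction action) {
        return action.execute();
    }

    /** Mimics the transport path: forks to the executor the action was registered with. */
    static String executeViaTransport(ToyAction action, ExecutorService registeredExecutor) {
        return CompletableFuture.supplyAsync(action::execute, registeredExecutor).join();
    }

    public static void main(String[] args) {
        ExecutorService management = Executors.newSingleThreadExecutor(r -> new Thread(r, "management-0"));
        try {
            ToyAction action = () -> Thread.currentThread().getName();
            System.out.println("local client ran on: " + executeViaLocalClient(action));
            System.out.println("transport ran on:    " + executeViaTransport(action, management));
        } finally {
            management.shutdown();
        }
    }
}
```

In the local path the expensive work would land on whatever thread made the call, which in the REST layer is the transport worker; in the forked path it lands on the declared pool.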