The query phase fans out to all shards, sending as many shard level requests as the number of shards involved to the relevant data nodes. Years ago we have reworked the can match phase (as part of the many shards effort) to group shard level requests per data node, in order to decrease the number of roundtrips required (including authorization) and the overhead at the transport level. We would like to do the same for the query phase. We want to start small and scope this to only query phase (no DFS or query after DFS, no scroll), and only when there are aggs in the provided search request. That is because this is the type of requests that go through potentially many shards, and use quite a bit of memory on the coordinating node.
We expect that changing the execution model will provide better stability ,as well as better resource usage. In fact, currently the coordinating node throttles to 5 (configurable) concurrent shard requests per data node. If we group shard level requests to a single request per data node, each data node is going to be able to have more context about the portion of the search request it is requested to execute, and may execute its shard level requests at its own pace, depending on current load etc. We have seen that the current throttling mechanism can be a bottleneck, that prevents maximizing resource usage on data node. At the same time, this improvement would drastically reduce the network roundtrips from being a factor of the number of shards for the query phase, to a factor of the number of data nodes involved in the search request.
The query phase fans out to all shards, sending as many shard level requests as the number of shards involved to the relevant data nodes. Years ago we have reworked the can match phase (as part of the many shards effort) to group shard level requests per data node, in order to decrease the number of roundtrips required (including authorization) and the overhead at the transport level. We would like to do the same for the query phase. We want to start small and scope this to only query phase (no DFS or query after DFS, no scroll), and only when there are aggs in the provided search request. That is because this is the type of requests that go through potentially many shards, and use quite a bit of memory on the coordinating node.
We expect that changing the execution model will provide better stability ,as well as better resource usage. In fact, currently the coordinating node throttles to 5 (configurable) concurrent shard requests per data node. If we group shard level requests to a single request per data node, each data node is going to be able to have more context about the portion of the search request it is requested to execute, and may execute its shard level requests at its own pace, depending on current load etc. We have seen that the current throttling mechanism can be a bottleneck, that prevents maximizing resource usage on data node. At the same time, this improvement would drastically reduce the network roundtrips from being a factor of the number of shards for the query phase, to a factor of the number of data nodes involved in the search request.