elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Assign priorities to search requests #37867

Open jpountz opened 5 years ago

jpountz commented 5 years ago

There has been a recurring ask for the ability to prioritize some queries over others. One use-case for this feature is the ability to run user queries and system queries on the same cluster. Some ideas have been suggested, like having separate thread pools or using priority queues rather than regular queues. Regardless of priority, search requests end up competing for the same resources, so given that we don't have (and probably don't want) the ability to pause running queries, low-priority queries would likely always have the potential to slow down higher-priority queries.

This doesn't mean that we shouldn't do anything, hence this new issue where we would like to collect feedback about use-cases to see what we could do. Please share your use-case; we are especially interested to know:

  1. How would you decide on the priority of a request?
  2. How many priorities would you need? (only 2 to handle system vs user queries, or more?)
  3. What trade-offs would you be happy with? For instance would it be ok if low-priority queries were rate-limited even in the absence of higher-priority queries?

Some related issues include #37856 (don't let queries that target lots of shards slow down more reasonable queries) and #14224 (priorities, but at index-time).
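To make that constraint concrete, here is a minimal, hypothetical Java sketch of the "priority queue instead of a regular queue" idea (not Elasticsearch code): only the backlog gets reordered, so a low-priority query that already holds a search thread is never paused and can still slow down higher-priority work behind it.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed-size "search" pool whose backlog is ordered by a
// caller-supplied priority instead of FIFO. Work that already holds a thread is
// never interrupted, which is the limitation described above.
class PrioritizedSearchPool {

    // A search task carrying a priority (lower value = more urgent).
    static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final Runnable delegate;

        PrioritizedTask(int priority, Runnable delegate) {
            this.priority = priority;
            this.delegate = delegate;
        }

        @Override
        public int compareTo(PrioritizedTask other) {
            return Integer.compare(this.priority, other.priority);
        }

        @Override
        public void run() {
            delegate.run();
        }
    }

    // Fixed number of search threads (illustrative); the unbounded priority queue
    // holds whatever doesn't fit.
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
        4, 4, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

    void submit(int priority, Runnable searchWork) {
        executor.execute(new PrioritizedTask(priority, searchWork));
    }
}
```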

elasticmachine commented 5 years ago

Pinging @elastic/es-search

nitin2goyal commented 5 years ago

Cluster: We have a cluster with time-sharded indices; each index holds roughly 16 hours of data and has 8 shards.

Use case: A query can target any time range in the past year, and we have seen that in most cases the breadth of the time range is proportional to query time, except in a few cases where the query size is huge and we query only a few (2-5) indices. We know the approximate execution cost of a query beforehand, and we want to penalize costly queries, but only when less expensive queries are also arriving at the cluster. Something like the following:

  1. If a single costly query is running, give it all the resources.
  2. If a single inexpensive query is running, give it all the resources.
  3. If a costly query is running and inexpensive queries arrive, divide resources (at the application level) at a configured ratio.

The query cost could be passed by the user as part of the query itself.
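For illustration, a hypothetical application-level sketch of that kind of division: callers declare an approximate cost and acquire a proportional share of a fixed permit budget before querying the cluster, so cheap queries can still slip in next to a costly one, while a lone costly query can use most of the capacity. All names and numbers are made up.

```java
import java.util.concurrent.Semaphore;

// Hypothetical application-side gate, not an Elasticsearch feature.
class CostBasedGate {

    private static final int TOTAL_PERMITS = 16;                          // illustrative capacity
    private final Semaphore permits = new Semaphore(TOTAL_PERMITS, true); // fair ordering

    // Run a search after acquiring permits proportional to its declared cost.
    <T> T run(int declaredCost, SearchCall<T> call) throws Exception {
        int needed = Math.min(Math.max(declaredCost, 1), TOTAL_PERMITS);
        permits.acquire(needed);
        try {
            return call.execute();                                        // the actual Elasticsearch request
        } finally {
            permits.release(needed);
        }
    }

    interface SearchCall<T> {
        T execute() throws Exception;
    }
}
```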

nitin2goyal commented 5 years ago

Another use case (similar but slightly different): we have a UI dashboard with 8 widgets. When the UI loads, those widgets fire concurrent queries to our backend (ES); the queries share the same filters but have different aggregations. Since they are fired in parallel, we recompute the filter in all 8 requests. It would be worth considering, as part of this issue, caching filter results and queuing queries that share the same filter so that it isn't re-computed.

nitin2goyal commented 5 years ago

@jpountz

jpountz commented 5 years ago

Thanks for sharing @nitin2goyal. Use-case 1 makes sense. Use-case 2 is a bit more tricky because we'd need to make Elasticsearch aware somehow that it is going to get N times the same query with different aggregations. In my opinion this kind of problem is better solved on the client side, e.g. by sending two requests with 4 aggregations each in order to decrease the number of times the filter needs to be evaluated.
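For example, a hedged sketch of that client-side approach: a single _search request carrying the shared filter plus several aggregations, so the filter is evaluated once per shard instead of once per widget. The endpoint, index, and field names below are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// One combined dashboard request instead of one request per widget (names are made up).
public class CombinedDashboardQuery {
    public static void main(String[] args) throws Exception {
        String body = """
            {
              "size": 0,
              "query": { "bool": { "filter": [ { "term": { "status": "active" } } ] } },
              "aggs": {
                "by_region":   { "terms": { "field": "region" } },
                "by_product":  { "terms": { "field": "product" } },
                "avg_latency": { "avg": { "field": "latency_ms" } },
                "daily":       { "date_histogram": { "field": "@timestamp", "calendar_interval": "day" } }
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/dashboard-data/_search"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```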

nitin2goyal commented 5 years ago

Other use cases :

  1. One of our clusters is multi-tenant, i.e. a group of indices per tenant. We want a mechanism to throttle a tenant (or tenants) when there are concurrent requests from other tenants; basically, one user/tenant shouldn't be able to make the cluster unresponsive for the others.

  2. This is related to the use case above and to the initial use cases I wrote earlier: we want to be able to throttle a single user as well. I know it makes more sense to throttle on the client side, but in some cases client-side handling is limited.

wux5 commented 5 years ago

We have a use case that runs queries against Elasticsearch for real-time search, and the user has the option to export the search results to an external file, which can be a large amount of data spanning many pages. It would be great to run the export queries at a lower priority so that real-time search stays responsive; we can tolerate slowness when exporting data.

tsg commented 4 years ago

As we're creating more background searches in the Security app, especially EQL queries which can take a longer time, this could be interesting in the sense that prioritizing the interactive queries can keep the UI feeling snappy for longer.

+1 to the feature request.

hendrikmuhs commented 4 years ago

Transform allows long-running jobs that execute searches. This can have a negative effect on normal usage, so assigning a low priority to these searches would be nice to have.

We mitigated this issue by adding throttling support (see #54862), and we will also make searches less resource-hungry. However, this feels like a workaround and has the bad side effect that we are not using the available capacity: transform should run at full speed when possible, for example during off-hours.

Long story short, +1 for this feature request, so we can create a better alternative to throttling.
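For reference, a hedged example of the throttling workaround mentioned above, assuming the docs_per_second transform setting is the knob that change introduced; the transform id and rate below are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Cap how fast a transform reads documents so its searches leave headroom for users.
public class ThrottleTransform {
    public static void main(String[] args) throws Exception {
        String body = """
            { "settings": { "docs_per_second": 500 } }""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_transform/my-transform/_update"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

As the comment notes, this caps the transform even when the cluster is otherwise idle, which is exactly why priorities would be a better fit.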

jimczi commented 4 years ago

We discussed internally and agreed that we should split this issue.

Different thread pool for system indices

Priorities for system queries become more important if we allow slow queries more aggressively (script queries, source-only fields), since they can be blocked by slow user queries. A query made on the .kibana index, or worse the .security index, shouldn't have to wait for user queries to free up the thread pool before it gets executed. We already have a separate thread pool for frozen indices, but the idea here would be different: system queries should be simple and fast (they don't run on a lot of data), so a separate thread pool with a single thread would ensure that they remain responsive even when the search thread pool is busy with slow queries. We can discuss how to expose this new thread pool to system indices, or whether it should be reserved for a static list, but the priority of this item has been raised to cope with the fact that Kibana will allow user queries to run in the background. Running multiple queries in the background shouldn't block Kibana or Security from running the administrative queries that are needed to load a dashboard, load a visualization, or simply authenticate a user in Kibana.
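A minimal, illustrative-only sketch of that idea (not the actual Elasticsearch implementation): searches against a small allow-list of system indices go to a dedicated single-thread executor, so they never queue behind slow user queries.

```java
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical routing between a system-index pool and the regular search pool.
class SearchExecutors {

    private static final Set<String> SYSTEM_INDICES = Set.of(".kibana", ".security");

    // System queries are expected to be small and fast, so one dedicated thread
    // keeps them responsive even when the user search pool is saturated.
    private final ExecutorService systemSearchPool = Executors.newSingleThreadExecutor();
    private final ExecutorService searchPool = Executors.newFixedThreadPool(8);

    void executeSearch(String index, Runnable shardSearch) {
        if (SYSTEM_INDICES.contains(index)) {
            systemSearchPool.execute(shardSearch);
        } else {
            searchPool.execute(shardSearch);
        }
    }
}
```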

Handle priorities among user queries.

With async search coming in the next release, we want to better handle queries that take time. In the current situation, slow queries share the same thread pool as fast queries, so they're not only competing for disk and CPU but also for threads from the pool. I can see two types of slow queries:

  1. Queries that are fast to execute on a single shard but that target a large number of shards.
  2. Queries that are slow to execute on a single shard.

A single thread pool with different priorities should handle the first case well. The second case is trickier, since all active threads could be blocked by slow shard queries that individually take an hour; if a new request comes in, it could have to wait up to an hour for a thread no matter what priority was set on it. That's a good argument for a different thread pool here too, but we also don't want to limit the resources of low-priority queries when there are no fast queries to run. We also don't want to multiply the number of thread pools (and threads in general), so this needs more thinking.
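To illustrate the trade-off, a hypothetical sketch in which slow queries are confined to a capped pool: fast queries always find a free thread, but half of the capacity sits idle when only slow queries are queued. This is an illustration of the problem, not a proposal for how Elasticsearch should implement it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical split of the search threads into a capped slow pool and a fast pool.
class SplitSearchPools {

    private static final int TOTAL_SEARCH_THREADS = 8;

    // Slow queries may only ever occupy half of the threads...
    private final ExecutorService slowPool =
        Executors.newFixedThreadPool(TOTAL_SEARCH_THREADS / 2);

    // ...so fast queries never wait behind a shard query that runs for an hour.
    private final ExecutorService fastPool =
        Executors.newFixedThreadPool(TOTAL_SEARCH_THREADS / 2);

    void execute(boolean lowPriority, Runnable shardSearch) {
        (lowPriority ? slowPool : fastPool).execute(shardSearch);
    }
}
```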

Separating the two use cases should help us make progress. I am planning to open a separate issue for a new thread pool for system indices and hope we can continue to discuss user query priorities here.

jimczi commented 4 years ago

We had a chat with @gwbrown about system indices. The plan is to use a separate thread pool for these indices as described in the [meta issue](https://github.com/elastic/elasticsearch/issues/50251). This is also the next item on the plan, so we'll move the discussion to the meta issue and describe the proposal there.

mayya-sharipova commented 4 years ago

A single thread pool with different priorities should handle the first case well.

Giving different priorities to async search requests vs interactive/real-time search requests looks like a good idea to me. We could have a user-configured parameter for the ratio of these priorities; for example, 0.05 could mean that for every 5 queued async requests, 100 queued interactive requests will be executed.
Or maybe we divide overall requests into slow and fast based on precomputed costs (regardless of whether they are async or real-time).
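A minimal sketch of that ratio idea, assuming a hypothetical scheduler sitting in front of the search thread pool; the 0.05 value and all names are illustrative, not existing settings.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// When both backlogs have work, dispatch roughly `ratio` async requests per
// interactive request; when only one backlog has work, it gets everything.
class RatioScheduler {

    private final double ratio;            // e.g. 0.05 => 5 async per 100 interactive
    private double asyncCredit = 0;
    private final Queue<Runnable> interactive = new ArrayDeque<>();
    private final Queue<Runnable> async = new ArrayDeque<>();

    RatioScheduler(double ratio) {
        this.ratio = ratio;
    }

    synchronized void enqueueInteractive(Runnable r) { interactive.add(r); }

    synchronized void enqueueAsync(Runnable r) { async.add(r); }

    // Called by a worker thread whenever it is free to pick the next request.
    synchronized Runnable next() {
        if (interactive.isEmpty()) return async.poll();
        if (async.isEmpty()) return interactive.poll();
        asyncCredit += ratio;              // both queued: honour the configured ratio
        if (asyncCredit >= 1.0) {
            asyncCredit -= 1.0;
            return async.poll();
        }
        return interactive.poll();
    }
}
```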

Queries that are slow to execute on a single shard. ...We also don't want to multiply the number of thread pools

It looks like having a separate thread pool for system indices makes this problem less important, and it may indeed not be worth having another thread pool for slow requests.

joshdevins commented 3 years ago

I'd add that things like Rank Evaluation API calls, which internally make a load of parallel queries, should also have low priority. I don't know if there is a class of "internal but non-system-index" queries that would fall into this category as well.

adithya1991 commented 3 years ago

Could this also pave the way for prioritising reads over writes?

Use case: a constant stream of writes, but we have an SLA of 1-2 days for the data to actually be reflected. So while these writes happen, we do not want the reads to suffer.

cyberfifi commented 3 years ago

How would you decide on the priority of a request?

How many priorities would you need? (only 2 to handle system vs user queries, or more?)

GlenRSmith commented 3 years ago

Dealing with a use case that is analogous to the one mentioned: there are searches against the cluster from an application, which are pretty surgical and by themselves very low latency, and then there are searches conducted on an ad hoc basis by users through a UI, which can be more complex or request a large result set.

So from time to time, the latency that the application sees for its searches has a spike.

(By the way, adaptive replica selection doesn't help with this, as it has no way to account for the weight of queries that are already in each node's search queue, among other things.)

How would you decide on the priority of a request?

In this case, it could be based on the user associated with the request. Or maybe it would be a request param. That would be easy enough to control where all of the actual requests are generated "under wraps", abstracted from any end users.

How many priorities would you need? (only 2 to handle system vs user queries, or more?)

2 priorities for this scenario.

What trade-offs would you be happy with? For instance would it be ok if low-priority queries were rate-limited even in the absence of higher-priority queries?

That trade-off seems reasonable. In this scenario, as the searches desired to be made low-priority are already slow, such rate-limiting probably won't change the user experience a great deal.

For what it's worth, in trying to devise a way to accomplish this with currently available features, I've necessarily been looking at the cluster level instead of the node level, like using allocation awareness plus search shard preference to designate specific sets of nodes to service specific searches. That approach would likely result in inefficiency in the form of idle time in one set of nodes or the other, but it is probably an improvement over deploying a second cluster with duplicate data.

EDIT: clarified language
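For illustration, a hedged sketch of the node-partitioning workaround described above: ad hoc searches are pinned to a designated set of nodes via the _only_nodes search preference, while the application keeps using the rest of the cluster. Node names and the index pattern are placeholders, and this only works if shard copies are actually allocated to both node sets (e.g. via allocation awareness).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Send an ad hoc query only to the nodes reserved for ad hoc/analytical searches.
public class AdHocSearch {
    public static void main(String[] args) throws Exception {
        String body = """
            { "query": { "match": { "message": "error" } } }""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/logs-*/_search"
                + "?preference=_only_nodes:adhoc-node-1,adhoc-node-2"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```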

ki-fraam commented 2 years ago

Use case same as above: standard, simple, low-latency queries vs heavy, long-running, resource-intensive analytical queries.

How would you decide on the priority of a request?

Provided as a parameter (?requestprio=), possibly computed using scripting-language expressions on request contents and metadata.

How many priorities would you need? (only 2 to handle system vs user queries, or more?)

At least two for user queries (fast vs slow).

massivespace commented 1 year ago

Our use case follows:

Optimally, I would like:

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)