elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Concurrent Searching #80693

Closed: LifeIsStrange closed this issue 9 months ago

LifeIsStrange commented 2 years ago

It looks like Elasticsearch currently lacks support for concurrent searching, which is a major prospective optimization for reducing latency.

Digression: this is not the topic at hand, but I personally consider the OpenSearch fork a historical tragedy: a duplication of work and a segregation of improvements. What would be even more anti-utilitarian would be for the two projects to blindly ignore each other's advances toward better search.

Lucene's concurrent search APIs are probably among the most important features/optimizations currently not leveraged by Elasticsearch. The OpenSearch fork has a pull request implementing support for them.

See also the corresponding issue.

I believe it would be great for end users if you could take inspiration from the PR and work on this feature.

A great blog post about the Lucene feature: https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html?m=1 See also these performance results: https://engineeringblog.yelp.com/2021/09/nrtsearch-yelps-fast-scalable-and-cost-effective-search-engine.html
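
For context, here is a minimal sketch of the Lucene feature the blog post describes: passing an `Executor` to `IndexSearcher` makes it search segment slices concurrently instead of sequentially. The index path and field name below are placeholders, not anything Elasticsearch-specific:

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ConcurrentSearchSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/tmp/index")))) { // placeholder path
            // With an executor, IndexSearcher fans the query out across
            // segment slices in parallel; without one it scans sequentially.
            IndexSearcher searcher = new IndexSearcher(reader, executor);
            TopDocs hits = searcher.search(
                    new TermQuery(new Term("body", "elasticsearch")), 10);
            System.out.println("total hits: " + hits.totalHits);
        } finally {
            executor.shutdown();
        }
    }
}
```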

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

mayya-sharipova commented 2 years ago

This has been discussed multiple times before, and here is a good summary of why we have not done it so far.

In short, we focus parallelization on processing multiple requests concurrently, thus increasing throughput, rather than on parallelizing a single request.

@LifeIsStrange I am curious: what is your use case where you need to improve the latency of a single request?

LifeIsStrange commented 2 years ago

I am not very qualified to defend the optimization, but maybe @mikemccand could provide some justification?

1. Improving the latency of a single, very complex request seems like a real use case for niche advanced uses (maybe some kinds of scientific computing?). Actually it is not just about single queries: it should benefit any project that has complex queries in modest volume, and most websites are in fact likely to have a relatively low number of search queries that can nonetheless be arbitrarily complex.
2. Inter-query and intra-query parallelism seem to me like complementary mechanisms, and having parallelism at both levels should in theory yield better latency for common workloads.
3. Sure, intra-query parallelism might in some cases add overhead, but it would probably be an opt-in optimization and could be applied in a fine-grained manner. I don't know whether Elasticsearch/Lucene has a performance cost model (a popular feature of e.g. compilers, which keep a time/resource budget for optimizations and estimate the time/complexity of each one). If the complexity of a query could be estimated (note that it need not be a purely a priori estimate; it can be refined empirically and dynamically through measurements, like PGO), then concurrent execution (intra-query parallelism) could be enabled if, and only if, the query's complexity score matches a certain pattern or exceeds a provided or derived threshold; see the sketch after this list. You could even attempt speculative optimization (try making a query concurrent, and if it does not yield an improvement, fall back to a single thread or adjust the number of threads), like a JIT.
4. For completeness' sake, an option to disable inter-query parallelism would be nice, or the ability to give one kind precedence over the other depending on the user's latency-versus-throughput preference.
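
A minimal sketch of the threshold idea from point 3, assuming Lucene's `IndexSearcher(IndexReader, Executor)` constructor for intra-query concurrency; the `estimateCost` heuristic and `COST_THRESHOLD` are invented for illustration, not an actual Elasticsearch cost model:

```java
import java.util.concurrent.ExecutorService;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CostGatedSearcher {
    // Invented threshold: queries scoring above it are considered complex
    // enough to amortize the overhead of parallel execution.
    private static final int COST_THRESHOLD = 8;

    private final IndexReader reader;
    private final ExecutorService executor;

    public CostGatedSearcher(IndexReader reader, ExecutorService executor) {
        this.reader = reader;
        this.executor = executor;
    }

    // Crude a-priori heuristic: count the clauses of (nested) boolean queries.
    // A real cost model could refine this dynamically through measurements.
    private static int estimateCost(Query query) {
        if (query instanceof BooleanQuery bq) {
            int cost = 0;
            for (BooleanClause clause : bq.clauses()) {
                cost += 1 + estimateCost(clause.getQuery());
            }
            return cost;
        }
        return 1;
    }

    public IndexSearcher searcherFor(Query query) {
        // Opt into intra-query parallelism only above the threshold;
        // cheap queries stay on a single thread.
        return estimateCost(query) > COST_THRESHOLD
                ? new IndexSearcher(reader, executor)
                : new IndexSearcher(reader);
    }
}
```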

Note, by the way, that the cost of parallelism can be much lower than that of Java platform threads, either by leveraging a minimal amount of Kotlin code (using coroutines) or through the upcoming Project Loom green (user-space) threads. Off-topic: I really think the ELK stack should consider leveraging the Vector API.

Edit: I have now read the original justification:

We have found that combining concurrent requests with parallel segment readers does not offer any performance benefits and actually can hurt performance because of the cost of context switching, so to keep the model simple we prefer to concentrate on keeping each search request processed on a single thread, but support processing multiple search requests concurrently.

Firstly, this was tested in 2018, and Lucene releases frequently optimize and lower the cost of intra-query parallelism. Secondly, what was (probably) tested was pushing both kinds of parallelism to their maximum, instead of the optimizations I propose: only apply intra-query concurrency based on a query-complexity threshold and a budget of available resources; refine the cost heuristic empirically and dynamically through measurements; and, speculatively or at least automatically, shrink the number of threads, or remove intra-query parallelism entirely for a given query, if and only if it provides no speedup (a toy sketch of this follows). With those optimizations, the result might be a significant performance improvement. Moreover, in any case I believe the feature should at least be exposed to end users as an opt-in, as I am pretty sure there are use cases where it leads to performance improvements (perhaps with both kinds of parallelism, or by disabling inter-query parallelism). As a reminder, see the Yelp benchmark above. By the way, the current WIP prototype (which does not implement the additional optimizations I mention) already shows promising results.
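
To illustrate the shrink-or-disable idea, here is a toy sketch that tries both execution modes once per query shape and remembers the faster one. All names are hypothetical, and a real implementation would sample repeatedly and account for noise rather than trust a single measurement:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class SpeculativeChooser {
    enum Mode { SINGLE_THREADED, CONCURRENT }

    // Remembers, per query shape, which mode won the trial run.
    private final ConcurrentHashMap<String, Mode> decisions = new ConcurrentHashMap<>();

    public <T> T run(String queryShape, Supplier<T> singleThreaded, Supplier<T> concurrent) {
        Mode mode = decisions.get(queryShape);
        if (mode == Mode.SINGLE_THREADED) return singleThreaded.get();
        if (mode == Mode.CONCURRENT) return concurrent.get();

        // Speculative trial: execute both variants once and keep the faster.
        // (A toy policy; real code would not pay for a duplicate execution
        // on the hot path and would average over many samples.)
        long t0 = System.nanoTime();
        T first = singleThreaded.get();
        long singleNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        concurrent.get();
        long concurrentNanos = System.nanoTime() - t1;

        decisions.put(queryShape,
                concurrentNanos < singleNanos ? Mode.CONCURRENT : Mode.SINGLE_THREADED);
        return first;
    }
}
```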

LifeIsStrange commented 1 year ago

@javanna I see this was privately discussed in March; any feedback on what was said?

By the way, as a small update to my original comment: 1) the OpenSearch implementation has been merged; 2) Java virtual threads are here since the stable Java 19 release (as a preview feature which, unlike an incubator JEP, should be expected to have a mostly stable API). Therefore (although I would prefer Kotlin) you can already test virtual threads as a way to reduce the cost of simultaneous inter- and intra-query parallelism. Note that the existing inter-query parallelism could still use regular threads. But as a reminder, and as explained in my original comment, even the cost of regular threads is a non-issue if you use a dynamic time/complexity allocation budget, with automatic shrinkage, optionally speculative (growable) threading, and ideally empirical real-time metrics feedback (such as total CPU use versus throughput/latency evolution).

One could even implement a cache of the optimal configuration for a given query. Say you have a recurrent complex query, and after such monitoring it has been determined that using X (virtual or not) threads for intra-query parallelism and Y for inter-query parallelism is the fastest configuration found for that query (e.g. 50% faster than inter-query parallelism alone). The bonus idea would be to cache this configuration (key: the query's UUID; value: the optimal configuration, or hints). Then, as long as the cache entry has not been invalidated or expired, the next time the same query is sent we can use the optimal configuration directly rather than wait for the JIT-style metrics to rediscover it; a sketch of both ideas follows. Note that this optimization is less important than the others mentioned.
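
As a sketch of the two suggestions above, assuming Java 19+ with preview features enabled (virtual threads were finalized in Java 21) and Lucene's executor-taking `IndexSearcher` constructor; the `Hint` record and the UUID-keyed cache are invented for illustration:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class VirtualThreadSearchConfig {
    // Hypothetical tuning hint discovered by earlier measurements.
    record Hint(boolean intraQueryParallelism) {}

    // Cache: query UUID -> fastest configuration found so far.
    private final Map<UUID, Hint> hints = new ConcurrentHashMap<>();

    // One cheap virtual thread per task; carrier threads are multiplexed
    // by the JVM, so over-subscription costs far less than with platform threads.
    private final ExecutorService virtualThreads =
            Executors.newVirtualThreadPerTaskExecutor();

    public IndexSearcher searcherFor(UUID queryId, IndexReader reader) {
        // Default to concurrent execution until measurements say otherwise.
        Hint hint = hints.getOrDefault(queryId, new Hint(true));
        return hint.intraQueryParallelism()
                ? new IndexSearcher(reader, virtualThreads)
                : new IndexSearcher(reader);
    }

    public void record(UUID queryId, boolean concurrentWasFaster) {
        hints.put(queryId, new Hint(concurrentWasFaster));
    }
}
```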

javanna commented 1 year ago

@LifeIsStrange not much to add; this is quite high on the list of things we want to do. The list is long :) We'd prefer not to compromise by applying concurrency only to the query phase; we want it to cover aggregations as well.