apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.69k stars 1.04k forks source link

Simplify task executor for concurrent operations #12498

Closed javanna closed 1 year ago

javanna commented 1 year ago

IndexSearcher supports parallelizing search across slices when an executor is available. Knn queries can also parallelize their rewrite across segments using the same executor. Potentially, other operations will be parallelized in the future using the same pattern.

Lucene currently has the notion of a TaskExecutor (previously named SliceExecutor, but because knn rewrite is parallelized across segments rather than slices, it was recently renamed to TaskExecutor) which is responsible for offloading tasks to the executor, wait for them all to complete and return the corresponding results.

IndexSearcher currently has an instanceof check of the provided executor, and if it's a ThreadPoolExecutor, which is likely, a QueueSizeBasedExecutor is used which tries to be smart about when to offload tasks to the executor vs when to execute them on the caller thread. This behaviour is not configurable in any way.

As part of exposing concurrent search in Elasticsearch, we have found that the current task executor is too opinionated, and either it needs to be customizable, or simplified to be less opinionated.

In a server-side search engine, the search itself is likely executed by a thread pool that has its own sizing, queueing mechanisms as well as rejection policy. When offloading requests to an executor for additional parallelism, it is important to be able to determine where the load is going to be and what type of workload it maps to. Ideally, the caller thread is merely coordinating and not doing any I/O nor CPU intensive work, that is instead all delegated to the separate worker threads. Having built-in rules for when to execute certain operations on the caller thread may cause more problems than it solves, as it is unpredictable and makes sizing of thread pools more difficult, because all of a sudden you end up with two thread pools that may execute I/O as well as CPU intensive operations.

My conclusion is that if flexibility is needed in terms of possibly executing on the caller thread, such behaviour can be included in the executor that is provided to the searcher (for example with an adaptive mechanism that conditionally executes directly instead of offloading based on queue size like QueueSizeBasedExecutor does), together with its saturation policy (as opposed to catching RejectedExecutionException and executing on the caller thread, which is potentially dangerous). Also, executing the last slice / task on the caller thread, as it's the one waiting for all the tasks to complete, is not necessarily addressing a real problem around under-utilization of a thread that is doing nothing. That wait is cheap, and it's possibly more important to divide the workload among the thread pools. That said, such behavior can also be included in the Executor itself and does not require additional extension points.

My proposal is that we remove the QueueSizeBasedExecutor entirely and we simply offload every task to the executor, unconditionally. It's up to the provided executor to determine what to do when execute is called. That is the pluggable behaviour. Lucene should not have strong opinions nor provide building blocks for how to execute concurrent tasks.

Additionally, I think that we should unconditionally offload execution to the executor when available, even when we have a single slice. It may seem counter intuitive but it's again to be able to determine what type of workload each thread pool performs.

Relates to #12438

javanna commented 1 year ago

@jpountz @mikemccand pinging you two because you are likely to have thoughts on this.

jpountz commented 1 year ago

It makes sense to me to push the responsibility of figuring out how to execute tasks to the executor. Also pinging @reta.

reta commented 1 year ago

It makes sense to me to push the responsibility of figuring out how to execute tasks to the executor. Also pinging @reta.

Thanks @jpountz , I second that

Additionally, I think that we should unconditionally offload execution to the executor when available, even when we have a single slice. It may seem counter intuitive but it's again to be able to determine what type of workload each thread pool performs.

That's is one of the difficulties we are dealing as well, specifically the exception branching logic has to account for wrapped / unwrapped exceptions.

javanna commented 1 year ago

Thanks for the feedback everyone, there's a PR up as a first step (does not include single slice offloading yet): https://github.com/apache/lucene/pull/12499 . Reviews are welcome.

Jeevananthan-23 commented 1 year ago

Hi @javanna I opened #12531 for async task processing, It could be great to use VirtualThreads(JDK21) for IndexSearch which able concurrent tasks execution similar to Executor to run in ThreadPool Carrier Thread rather it runs in lightweight VirtualThreads.

uschindler commented 1 year ago

You can use virtual threads out of box for IndexSearcher. Just pass a suitable executor: Executors.newVirtualThreadPerTaskExecutor()