elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Limit the number of new MergeThreads launched for pending merges using max_merge_count #107650

Open cfangpp opened 6 months ago

cfangpp commented 6 months ago

Description

Lucene uses max_thread_count only to pause MergeThreads through IO throttling. When many merges are pending, ConcurrentMergeScheduler launches a corresponding number of MergeThreads and starts running them, then uses IO throttling to pause the largest MergeThreads whose thread index exceeds max_thread_count.

So if the Lucene IndexWriter has many pending merges, ElasticsearchConcurrentMergeScheduler activates IndexThrottle once the number of in-flight merges exceeds max_merge_count, which asks the engine to throttle incoming indexing requests to one thread. A large number of indexing requests may be rejected as a result.

Since many MergeThreads end up paused by Lucene's IO throttling anyway, it might be better not to launch them in the first place. Could I use the ElasticsearchConcurrentMergeScheduler.maybeStall method to limit the launching and running of new MergeThreads, bounded by max_thread_count or max_merge_count?

elasticsearchmachine commented 6 months ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 6 months ago

IO throttling is an interesting problem. In fact, we are working on adjusting the logic there substantially in Lucene: https://github.com/apache/lucene/pull/13293

On modern NVMe drives, the way throttling works now doesn't make sense.

Have you tried turning off auto-throttling altogether to see if your performance improves? Set index.merge.scheduler.auto_throttle to false.
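For reference, a minimal sketch of how that setting could be changed through the index settings update endpoint (`PUT /<index>/_settings`). The host `http://localhost:9200` and the index name `my-index` are placeholders, and the class and method names are made up for illustration. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative helper (hypothetical names): builds, but does not send,
// a settings update that turns merge auto-throttling off for one index.
public class DisableAutoThrottle {

    // JSON body carrying the index-level setting mentioned in the thread.
    static String settingsBody() {
        return "{\"index.merge.scheduler.auto_throttle\": false}";
    }

    // baseUrl and indexName are placeholders supplied by the caller.
    static HttpRequest buildRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody()))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:9200", "my-index");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody());
    }
}
```

To actually apply it, the request could be passed to `java.net.http.HttpClient.send`, with whatever authentication the cluster requires.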

cfangpp commented 6 months ago

Thanks for your reply.

My cluster runs on mechanical hard drives, 12 drives per node, so I think IO throttling is very useful here. But Elasticsearch's IndexThrottle limits indexing when the number of launched MergeThreads exceeds max_merge_count, as in the code below from "EngineMergeScheduler.java":

        public synchronized void beforeMerge(OnGoingMerge merge) {
            int maxNumMerges = mergeScheduler.getMaxMergeCount();
            if (numMergesInFlight.incrementAndGet() > maxNumMerges) {
                if (isThrottling.getAndSet(true) == false) {
                    logger.info("now throttling indexing: numMergesInFlight={}, maxNumMerges={}", numMergesInFlight, maxNumMerges);
                    activateThrottling();
                }
            }
        }

I used "ElasticsearchConcurrentMergeScheduler.maybeStall" to limit the launching of MergeThreads so that IndexThrottle is not activated, and this eliminated most of the indexing rejections.
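To illustrate the idea only (this is a toy model, not the actual change and not Lucene or Elasticsearch code): the limiter below refuses to start a new merge while the number of running merges is at a cap, leaving the merge pending instead. The class name, method names, and the cap parameter are all hypothetical. In the real scheduler the equivalent decision would sit behind `maybeStall`.

```java
// Toy model of "do not launch more merge threads than a cap".
// All names here are hypothetical; nothing below is a Lucene or Elasticsearch API.
public class MergeLaunchLimiter {
    private final int maxRunningMerges; // stands in for max_thread_count or max_merge_count
    private int running = 0;            // merges currently counted as in flight
    private int deferred = 0;           // launch attempts that were refused

    public MergeLaunchLimiter(int maxRunningMerges) {
        if (maxRunningMerges < 1) {
            throw new IllegalArgumentException("maxRunningMerges must be >= 1");
        }
        this.maxRunningMerges = maxRunningMerges;
    }

    // Returns true if a new merge may start now; false means it stays pending.
    public synchronized boolean tryLaunch() {
        if (running >= maxRunningMerges) {
            deferred++;
            return false;
        }
        running++;
        return true;
    }

    // Called when a merge completes, freeing one slot for a pending merge.
    public synchronized void mergeFinished() {
        if (running == 0) {
            throw new IllegalStateException("no merge is running");
        }
        running--;
    }

    public synchronized int running() {
        return running;
    }

    public synchronized int deferred() {
        return deferred;
    }

    public static void main(String[] args) {
        MergeLaunchLimiter limiter = new MergeLaunchLimiter(2);
        for (int i = 0; i < 4; i++) {
            System.out.println("launch attempt " + i + ": " + limiter.tryLaunch());
        }
        limiter.mergeFinished();
        System.out.println("after one merge finished: " + limiter.tryLaunch());
        System.out.println("running=" + limiter.running() + ", deferred=" + limiter.deferred());
    }
}
```

Because the in-flight count in this model never exceeds the cap, a check like the one in `beforeMerge` above (`numMergesInFlight > maxNumMerges`) would not fire as long as the cap is at or below `maxNumMerges`.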

@benwtrent

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-distributed (Team:Distributed)