elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

`IndicesRequestCache` uncancellably blocks search threads while result is pending #108703

Open DaveCTurner opened 4 months ago

DaveCTurner commented 4 months ago

A user reported to me that they had inadvertently run a very expensive collection of queries which stressed their cluster, so they cancelled them. However, some `indices:data/read/search[phase/query]` tasks continued to run for a very long time after being cancelled, and eventually they had to restart nodes to restore the cluster to a working state. They shared a thread dump which shows various places where we appear to be missing cancellation detection today (see https://github.com/elastic/elasticsearch/issues/108701). In the same thread dump I also noticed quite a few search threads blocked within `IndicesRequestCache.getOrCompute`, apparently waiting for the result of a query that other threads were computing.

I think we should avoid filling up the search pool with these blocked tasks so that the threads can do other, more meaningful work. At the very least, we should make these cache interactions react to cancellation properly.

    0.0% [cpu=0.0%, other=0.0%] (0s out of 500ms) cpu usage by thread 'elasticsearch[REDACTED][search][T#19]'
     10/10 snapshots sharing following 28 elements
       java.base@21.0.1/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@21.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
       java.base@21.0.1/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1864)
       java.base@21.0.1/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
       java.base@21.0.1/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
       java.base@21.0.1/java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1898)
       java.base@21.0.1/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2072)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.cache.Cache$CacheSegment.get(Cache.java:205)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.cache.Cache.get(Cache.java:350)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:376)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:120)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1637)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1559)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:516)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:543)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.SearchService$$Lambda/0x00007f8aa18338c8.get(Unknown Source)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:51)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:48)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:73)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@21.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.1/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.1/java.lang.Thread.run(Thread.java:1583)
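
To illustrate the "at the very least" option, here is a minimal standalone sketch of a wait that notices cancellation. It is not Elasticsearch code: the class `CancellableWait`, the method `awaitOrCancel`, the `isCancelled` supplier and the polling interval are all hypothetical names chosen for this example, and it assumes the waiter can see some cancellation flag for its own task. It replaces an unbounded `CompletableFuture.get()` with a bounded `get(timeout, unit)` in a loop.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Elasticsearch.
public final class CancellableWait {
    private CancellableWait() {}

    /**
     * Waits for a value that another thread is computing, but re-checks a
     * cancellation flag every pollMillis instead of parking indefinitely.
     */
    public static <T> T awaitOrCancel(CompletableFuture<T> pending,
                                      BooleanSupplier isCancelled,
                                      long pollMillis)
            throws InterruptedException, ExecutionException {
        while (true) {
            if (isCancelled.getAsBoolean()) {
                // Give up waiting; the computation itself is left untouched.
                throw new CancellationException("cancelled while waiting for pending result");
            }
            try {
                return pending.get(pollMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Result not ready yet: loop and check the flag again.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stands in for a cache entry whose loader never finishes.
        CompletableFuture<String> pending = new CompletableFuture<>();
        AtomicBoolean cancelled = new AtomicBoolean(false);

        // Simulates a task-cancellation request arriving a little later.
        Thread canceller = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            cancelled.set(true);
        });
        canceller.start();

        try {
            awaitOrCancel(pending, cancelled::get, 10);
        } catch (CancellationException e) {
            System.out.println("wait abandoned: " + e.getMessage());
        }
        canceller.join();
    }
}
```

Note that this sketch still occupies the waiting thread until the flag is observed, so it only addresses responsiveness to cancellation. Freeing the search thread entirely would need a non-blocking design, for example attaching a completion callback to the pending result rather than calling `get`.
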

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)