Another instance of https://github.com/elastic/elasticsearch/issues/89821: I saw a case where a cluster took 3+ minutes to apply a cluster state, and on closer inspection it was waiting on aborted merges to complete here:
```
0.0% [cpu=0.0%, other=0.0%] (0s out of 500ms) cpu usage by thread 'elasticsearch[es-es-index-8d94bb688-8n6dg][clusterApplierService#updateTask][T#1]'
10/10 snapshots sharing following 29 elements
java.base@21.0.2/java.lang.Object.wait0(Native Method)
java.base@21.0.2/java.lang.Object.wait(Object.java:366)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5500)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2738)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2493)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2462)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2454)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.engine.InternalEngine.closeNoLock(InternalEngine.java:2620)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.engine.Engine.close(Engine.java:1956)
...
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.shard.IndexShard.close(IndexShard.java:1718)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.IndexService.closeShard(IndexService.java:589)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.IndexService.removeShard(IndexService.java:569)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.IndexService.close(IndexService.java:365)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:914)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.removeIndicesAndShards(IndicesClusterStateService.java:409)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:247)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:540)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:526)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:499)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:430)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:155)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.base@21.0.2/java.lang.Thread.runWith(Thread.java:1596)
java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
```
The associated merge thread was busy here:
```
3.0% [cpu=3.0%, other=0.0%] (15ms out of 500ms) cpu usage by thread 'elasticsearch[es-es-index-8d94bb688-8n6dg][[.ds-traces-apm-default-2024.04.15-000064][0]: Lucene Merge Thread #21]'
10/10 snapshots sharing following 30 elements
...
app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:46)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.DataInput.readBytes(DataInput.java:73)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:619)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.checkIntegrity(Lucene90CompressingStoredFieldsReader.java:725)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.merge(Lucene90CompressingStoredFieldsWriter.java:609)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:234)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger$$Lambda/0x00000028029fb1f8.merge(Unknown Source)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:273)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
```
AFAICT today there's no mechanism to abort the big read that `checksumEntireFile` requires when the corresponding merge is aborted. In this context it is plausible that this read could take several minutes to complete, since the data was being fetched in chunks from S3. I think we should have such a mechanism.
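For illustration, one shape such a mechanism could take: rather than one uninterruptible bulk read, checksum the file in fixed-size chunks and poll an abort flag between chunks, so an aborted merge can bail out of the read promptly. This is only a sketch; the names (`AbortableChecksummer`, `MergeAbortedException`) and the chunked-CRC32 loop are hypothetical and not existing Lucene APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.zip.CRC32;

// Hypothetical sketch: checksum a byte source chunk by chunk, checking an
// abort flag between chunks so a long read can be cancelled mid-flight.
public class AbortableChecksummer {

    // Illustrative exception type, analogous to a merge-abort signal.
    public static class MergeAbortedException extends RuntimeException {
        public MergeAbortedException(String msg) { super(msg); }
    }

    static final int CHUNK_SIZE = 8192;

    // Computes the CRC32 of `data`, aborting promptly (at a chunk boundary)
    // if `aborted` becomes true while the read is in progress.
    public static long checksum(byte[] data, AtomicBoolean aborted) {
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += CHUNK_SIZE) {
            if (aborted.get()) {
                throw new MergeAbortedException(
                    "merge aborted during checksum at offset " + off);
            }
            int len = Math.min(CHUNK_SIZE, data.length - off);
            crc.update(data, off, len);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        // Normal case: the checksum completes.
        long full = checksum(data, new AtomicBoolean(false));

        // Aborted case: the flag is already set, so the read stops at once
        // instead of scanning the whole file.
        boolean threw = false;
        try {
            checksum(data, new AtomicBoolean(true));
        } catch (MergeAbortedException e) {
            threw = true;
        }
        System.out.println("checksum=" + full + " abortedThrew=" + threw);
    }
}
```

In the real code the abort check would presumably live inside the `ChecksumIndexInput` read loop (or be driven by `MergePolicy.OneMerge`'s abort state), so that `checksumEntireFile` observes the abort without every caller having to pass a flag.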