elastic / elasticsearch

Closing `IndexWriter` might wait a long time for merges to complete #107513

Open DaveCTurner opened 5 months ago

DaveCTurner commented 5 months ago

Another instance of https://github.com/elastic/elasticsearch/issues/89821: I saw a case where a cluster took 3+ minutes to apply a cluster state, and on closer inspection it was waiting on aborted merges to complete here:

 0.0% [cpu=0.0%, other=0.0%] (0s out of 500ms) cpu usage by thread 'elasticsearch[es-es-index-8d94bb688-8n6dg][clusterApplierService#updateTask][T#1]'
  10/10 snapshots sharing following 29 elements
    java.base@21.0.2/java.lang.Object.wait0(Native Method)
    java.base@21.0.2/java.lang.Object.wait(Object.java:366)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5500)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2738)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2493)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2462)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2454)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.engine.InternalEngine.closeNoLock(InternalEngine.java:2620)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.engine.Engine.close(Engine.java:1956)
...
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.shard.IndexShard.close(IndexShard.java:1718)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.IndexService.closeShard(IndexService.java:589)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.IndexService.removeShard(IndexService.java:569)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.IndexService.close(IndexService.java:365)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:914)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.removeIndicesAndShards(IndicesClusterStateService.java:409)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:247)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:540)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:526)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:499)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:430)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:155)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
    java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    java.base@21.0.2/java.lang.Thread.runWith(Thread.java:1596)
    java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
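
The shape of the problem is the usual flag-and-wait abort pattern: the closing thread sets an abort flag and then waits, but the worker only checks the flag between coarse-grained steps, so the wait can last as long as the current step. A minimal standalone sketch of that shape (names and timings are illustrative, this is not Lucene code):

```java
import java.util.concurrent.TimeUnit;

// Standalone model of the pattern in the trace above: abort is requested by setting
// a flag, but the "merge" only checks that flag between long, uninterruptible steps,
// so the thread requesting the abort waits for up to one full step.
public class AbortWaitDemo {
    private volatile boolean aborted = false;

    // Stands in for IndexWriter.rollback()/abortMerges(): request the abort, then
    // block until the worker actually stops (the Object.wait() at the top of the stack).
    void abortAndWait(Thread worker) throws InterruptedException {
        aborted = true;
        worker.join();
    }

    // Stands in for the merge thread: each step is one long operation such as
    // checksumming an entire segment file; the abort is only seen once it returns.
    void runMerge() {
        while (aborted == false) {
            longUninterruptibleStep();
        }
    }

    private void longUninterruptibleStep() {
        try {
            TimeUnit.SECONDS.sleep(5); // stands in for minutes of reading from S3
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AbortWaitDemo demo = new AbortWaitDemo();
        Thread merge = new Thread(demo::runMerge, "merge-thread");
        merge.start();
        TimeUnit.MILLISECONDS.sleep(100);
        long start = System.nanoTime();
        demo.abortAndWait(merge); // returns only after the current step completes
        System.out.printf("abort took %d ms%n", TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
    }
}
```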

The associated merge thread was busy here:

 3.0% [cpu=3.0%, other=0.0%] (15ms out of 500ms) cpu usage by thread 'elasticsearch[es-es-index-8d94bb688-8n6dg][[.ds-traces-apm-default-2024.04.15-000064][0]: Lucene Merge Thread #21]'
  10/10 snapshots sharing following 30 elements
...
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:46)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.DataInput.readBytes(DataInput.java:73)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:619)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.checkIntegrity(Lucene90CompressingStoredFieldsReader.java:725)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.merge(Lucene90CompressingStoredFieldsWriter.java:609)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:234)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger$$Lambda/0x00000028029fb1f8.merge(Unknown Source)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:273)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
    app/org.elasticsearch.server@8.14.0/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
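
The skipByReading frames above are the key detail: a checksum over the whole file can only be produced by reading every byte in order, so the "seek" towards the footer is really one long forward read, with no abort check anywhere in the loop. Roughly (a simplified sketch, not Lucene's actual implementation):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Simplified picture of what checksumming an entire file amounts to: read the whole
// file forward in small chunks, updating a running CRC. Nothing in this loop consults
// the merge's abort state, so once it starts it runs to completion even if the merge
// was aborted moments later.
final class ChecksumWholeStream {
    static long checksum(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) { // minutes of work if the bytes come from S3
            crc.update(buf, 0, n);
        }
        return crc.getValue();
    }
}
```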

AFAICT today there's no mechanism to abort the big read that checksumEntireFile requires when the corresponding merge is aborted. In this context it is plausible for this read to take a few minutes to complete (the data was coming in chunks from S3). I think we should have such a mechanism.
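
For illustration, such a mechanism could be as simple as polling an abort signal between the chunk reads in a loop like the one sketched above; the isAborted hook below is hypothetical and does not correspond to any existing Lucene API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.BooleanSupplier;
import java.util.zip.CRC32;

// Hypothetical abort-aware variant of the loop above: poll an abort signal between
// chunk reads and bail out promptly instead of finishing a multi-minute pass.
// The isAborted supplier is a stand-in for "has this merge been aborted?"; no such
// hook exists in CodecUtil.checksumEntireFile today.
final class AbortableChecksum {
    static long checksum(InputStream in, BooleanSupplier isAborted) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (isAborted.getAsBoolean()) {
                throw new IOException("checksum aborted before reading the whole file");
            }
            crc.update(buf, 0, n);
        }
        return crc.getValue();
    }
}
```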

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner commented 4 months ago

I opened https://github.com/apache/lucene/issues/13354 because it looks like we'd need to change something in Lucene to address this.