This may be related to https://github.com/elasticsearch/elasticsearch/issues/9018 . The pending_tasks output is not available for this incident, and the end user has already restored from a snapshot to recover, but here is the sequence of events leading to the shard getting stuck in the INITIALIZING state:
Original state:
The node ran out of memory while there were outstanding indexing and merge tasks:
[2014-12-23 19:07:36,687][INFO ][index.engine.internal ] [ESD_dev_i-438764ae] [element_v1][0] stop throttling indexing: numMergesInFlight=3, maxNumMerges=4
[2014-12-23 19:09:01,196][INFO ][index.engine.internal ] [ESD_dev_i-438764ae] [element_v1][9] stop throttling indexing: numMergesInFlight=3, maxNumMerges=4
[2014-12-23 19:13:27,845][INFO ][cluster.service ] [ESD_dev_i-438764ae] removed {[SSA_dev_i-e38bf50f][fhi0ftB7TtSB2KUVNZYu7A][SSA_dev_i-e38bf50f][inet[/IP:9300]]{client=true, data=false},}, reason: zen-disco-receive(from master [[ESM_dev_i-febd3600][35xCsRzNTDmtqvcTJivZ1Q][ESM_dev_i-febd3600][inet[/IP:9300]]{aws_availability_zone=us-east-1d, data=false, master=true}])
[2014-12-23 19:15:40,285][INFO ][cluster.service ] [ESD_dev_i-438764ae] added {[SSA_dev_i-e38bf50f][UzB7aWj1QSiaF6IUeKEXwg][SSA_dev_i-e38bf50f][inet[/IP:9300]]{client=true, data=false},}, reason: zen-disco-receive(from master [[ESM_dev_i-febd3600][35xCsRzNTDmtqvcTJivZ1Q][ESM_dev_i-febd3600][inet[/IP:9300]]{aws_availability_zone=us-east-1d, data=false, master=true}])
[2014-12-23 19:25:41,147][ERROR][index.engine.internal ] [ESD_dev_i-438764ae] [element_v1][0] Exception while waiting for merges asynchronously after optimize
org.elasticsearch.index.engine.FlushFailedEngineException: [element_v1][0] Flush failed
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:901)
at org.elasticsearch.index.engine.internal.InternalEngine.waitForMerges(InternalEngine.java:1024)
at org.elasticsearch.index.engine.internal.InternalEngine.access$200(InternalEngine.java:95)
at org.elasticsearch.index.engine.internal.InternalEngine$2.run(InternalEngine.java:1072)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:698)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:712)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3063)
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:891)
... 6 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:260)
at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:213)
at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:176)
at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:586)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:248)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:133)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4173)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3768)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
[2014-12-23 19:25:41,340][DEBUG][action.admin.cluster.node.stats] [ESD_dev_i-438764ae] failed to execute on node [x9APfIW6Rii1WNPSElU0Pg]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:698)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:712)
at org.apache.lucene.index.IndexWriter.ramBytesUsed(IndexWriter.java:462)
at org.elasticsearch.index.engine.internal.InternalEngine.segmentsStats(InternalEngine.java:1193)
at org.elasticsearch.index.shard.service.InternalIndexShard.segmentStats(InternalIndexShard.java:555)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:170)
at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:49)
at org.elasticsearch.indices.InternalIndicesService.stats(InternalIndicesService.java:212)
at org.elasticsearch.node.service.NodeService.stats(NodeService.java:156)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:96)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:44)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$2.run(TransportNodesOperationAction.java:141)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The shard [element_v1][9] could not be recovered because it was corrupted by the OOM. You can see that the replica on the other node was promoted to primary.
The cluster then appears to have rebalanced and tried to initialize the shard again on the original node, and this is where the shard got stuck in the INITIALIZING state on that node.
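For anyone debugging a similar state, here is a minimal sketch (assuming a node reachable at http://localhost:9200 and the index name from this incident) of the diagnostics that would have helped: the master's pending task queue, the per-shard allocation state, and the recovery progress of the stuck copy.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class StuckShardDiagnostics {

        // Fetch one cluster API endpoint and print the raw response.
        static void dump(String base, String path) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(base + path).openConnection();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                System.out.println("==> GET " + path);
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                conn.disconnect();
            }
        }

        public static void main(String[] args) throws Exception {
            String base = args.length > 0 ? args[0] : "http://localhost:9200"; // assumed node address
            dump(base, "/_cluster/pending_tasks");     // master task queue (the output missing from this incident)
            dump(base, "/_cat/shards/element_v1?v");   // per-shard state; would show [element_v1][9] in INITIALIZING
            dump(base, "/_cat/recovery/element_v1?v"); // recovery stage and progress for the initializing copy
        }
    }

The _cat/recovery output in particular should show whether the initializing copy is stalled at a particular recovery stage or file.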
Could it be that the corrupted files on disk were not cleaned up when the shard data on the original node was corrupted by the OOM, so that when the shard was rebalanced back onto this node, it ended up reusing some of those corrupted data files?
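If that is the suspicion, one way to confirm it, purely as a hedged sketch (the shard path below is illustrative, and this assumes the Lucene 4.x jars shipped with ES 1.x are on the classpath), would be to run Lucene's CheckIndex against the on-disk copy on the original node while the shard is not allocated there:

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class VerifyShardOnDisk {
        public static void main(String[] args) throws Exception {
            // Illustrative path only -- the actual layout depends on path.data and the cluster name.
            String shardIndexDir = args.length > 0 ? args[0]
                    : "/var/lib/elasticsearch/dev/nodes/0/indices/element_v1/9/index";
            try (Directory dir = FSDirectory.open(new File(shardIndexDir))) {
                CheckIndex checker = new CheckIndex(dir);
                checker.setInfoStream(System.out);               // print per-segment details
                CheckIndex.Status status = checker.checkIndex(); // read-only verification
                System.out.println(status.clean
                        ? "Shard copy looks clean"
                        : "Shard copy is corrupt: " + status.totLoseDocCount + " docs would be lost");
            }
        }
    }

CheckIndex also has a -fix mode, but that drops broken segments; since the replica on the other node was still good, deleting the stale copy and letting recovery pull it from the primary would be the safer route.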