If you hit a single document ID with a large number of concurrent update operations, all but one of the write threads can end up waiting on the document ID lock, like this:
0.0% [cpu=0.0%, other=0.0%] (0s out of 500ms) cpu usage by thread 'elasticsearch[REDACTED][write][T#4]'
10/10 snapshots sharing following 27 elements
java.base@21.0.2/jdk.internal.misc.Unsafe.park(Native Method)
java.base@21.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base@21.0.2/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base@21.0.2/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
app//org.elasticsearch.common.util.concurrent.KeyedLock.acquire(KeyedLock.java:62)
app//org.elasticsearch.index.engine.LiveVersionMap.acquireLock(LiveVersionMap.java:463)
app//org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:752)
app//org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:1274)
app//org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:194)
app//org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:101)
app//org.elasticsearch.index.get.ShardGetService.getForUpdate(ShardGetService.java:115)
app//org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:62)
This is harmful to other write traffic, because the blocked write threads cannot process writes to any other document. Maybe we can find a way to queue this work up without blocking threads, or maybe we can find a way to reject it so that the write threads can do more meaningful work.
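To make the "reject it" option concrete, here is a minimal sketch of a per-key lock that fails fast instead of parking the calling thread. It is only an illustration under stated assumptions: `RejectingKeyedLock` and `runOrReject` are hypothetical names, not part of Elasticsearch's `KeyedLock` or `LiveVersionMap`, and a real change would also need to decide how the rejection is reported back to the client.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: a keyed lock that rejects contended callers
 * instead of blocking them. Not Elasticsearch's KeyedLock.
 */
final class RejectingKeyedLock<K> {
    // One lock per key. For brevity, entries are never removed here;
    // a real implementation would need to clean up unused locks.
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Runs the action while holding the lock for the given key.
     * If another thread currently holds that lock, throws
     * RejectedExecutionException immediately rather than waiting.
     */
    <T> T runOrReject(K key, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        // tryLock() returns immediately, so the calling thread is never parked.
        if (!lock.tryLock()) {
            throw new RejectedExecutionException("lock for key [" + key + "] is contended");
        }
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

With something along these lines, a burst of updates to one document ID would occupy at most one write thread at a time, and the remaining requests would fail quickly so their threads can pick up other work. The trade-off is that clients must be prepared to retry rejected updates.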