elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Push back on writes to heavily-contended doc IDs #109829

Open DaveCTurner opened 2 months ago

DaveCTurner commented 2 months ago

If you hit a single document ID with masses of concurrent update operations, all but one of the write threads can end up waiting on the document ID lock, like this:

0.0% [cpu=0.0%, other=0.0%] (0s out of 500ms) cpu usage by thread 'elasticsearch[REDACTED][write][T#4]'
     10/10 snapshots sharing following 27 elements
       java.base@21.0.2/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@21.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
       java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
       java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
       java.base@21.0.2/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
       java.base@21.0.2/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
       app//org.elasticsearch.common.util.concurrent.KeyedLock.acquire(KeyedLock.java:62)
       app//org.elasticsearch.index.engine.LiveVersionMap.acquireLock(LiveVersionMap.java:463)
       app//org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:752)
       app//org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:1274)
       app//org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:194)
       app//org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:101)
       app//org.elasticsearch.index.get.ShardGetService.getForUpdate(ShardGetService.java:115)
       app//org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:62)
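To see why all but one thread ends up parked, here is a minimal sketch of a lock-per-key map. `SimpleKeyedLock`, `parkedWaiters` and the document ID `"hot-doc"` are hypothetical names used only for illustration. This is not Elasticsearch's `KeyedLock`, although the trace does show `KeyedLock.acquire` calling `ReentrantLock.lock`. One thread holds the lock for the document, and every other thread updating that document parks in `lock()`, just like the write threads in the hot-threads output.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class KeyedLockContentionDemo {

    // Hypothetical, simplified lock-per-key map; NOT Elasticsearch's KeyedLock.
    static final class SimpleKeyedLock {
        private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

        ReentrantLock lockFor(String key) {
            // Entries are never removed; acceptable for a demo only.
            return locks.computeIfAbsent(key, k -> new ReentrantLock());
        }

        void acquire(String key) { lockFor(key).lock(); } // parks the caller if another thread holds it

        void release(String key) { lockFor(key).unlock(); }
    }

    // Simulates `threads` write threads updating the same document ID and returns
    // how many of them were parked on the lock while one thread held it.
    static int parkedWaiters(int threads) {
        SimpleKeyedLock keyedLock = new SimpleKeyedLock();
        String docId = "hot-doc";
        ReentrantLock lock = keyedLock.lockFor(docId);
        keyedLock.acquire(docId); // the one thread doing useful work
        Thread[] workers = new Thread[threads - 1];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                keyedLock.acquire(docId); // parks here, like the trace above
                try {
                    // the update would be applied here
                } finally {
                    keyedLock.release(docId);
                }
            });
            workers[i].setDaemon(true); // don't keep the JVM alive if we bail out early
            workers[i].start();
        }
        try {
            // Wait until every other worker is queued behind the lock holder.
            while (lock.getQueueLength() < workers.length) {
                Thread.sleep(1);
            }
            int parked = lock.getQueueLength();
            keyedLock.release(docId); // queued updates now run strictly one at a time
            for (Thread t : workers) {
                t.join();
            }
            return parked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parkedWaiters(8) + " of 8 threads parked on one document ID");
    }
}
```

With eight threads, seven sit in `ReentrantLock.lock()` doing nothing. In a real write thread pool, those are seven threads unavailable for writes to any other document.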

This is harmful to other write traffic: every thread parked on this lock is a write thread that can't process writes for any other document. Maybe we can find a way to queue this work up without blocking threads, or maybe we can find a way to reject it so that the write threads can do more useful work.
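One way to combine both ideas, sketched under stated assumptions: a hypothetical `PerKeyTaskQueue` (not an Elasticsearch API, and not a proposal from the Distributed team) keeps a small per-document queue of pending operations. Only one operation per key is ever handed to the executor. Later operations on the same key wait in the queue instead of parking a thread, and once `maxPendingPerKey` operations (an assumed, tunable bound) are waiting, new submissions are rejected with `RejectedExecutionException`. The sketch also assumes the executor itself never rejects tasks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical per-key queue: operations on the same key run one at a time
// without parking any thread; beyond a bound, new operations are rejected.
public class PerKeyTaskQueue {
    private final Executor executor; // assumed never to reject tasks
    private final int maxPendingPerKey;
    // key -> operations waiting behind the one currently running for that key
    private final Map<String, ArrayDeque<Runnable>> pending = new HashMap<>();

    public PerKeyTaskQueue(Executor executor, int maxPendingPerKey) {
        this.executor = executor;
        this.maxPendingPerKey = maxPendingPerKey;
    }

    public void submit(String key, Runnable task) {
        synchronized (pending) {
            ArrayDeque<Runnable> queue = pending.get(key);
            if (queue != null) { // another operation on this key is in flight
                if (queue.size() >= maxPendingPerKey) {
                    throw new RejectedExecutionException("too many pending operations on [" + key + "]");
                }
                queue.add(task); // enqueue instead of blocking the caller
                return;
            }
            pending.put(key, new ArrayDeque<>()); // mark the key as in flight
        }
        executor.execute(() -> runThenHandOff(key, task));
    }

    private void runThenHandOff(String key, Runnable task) {
        try {
            task.run();
        } finally {
            final Runnable next;
            synchronized (pending) {
                next = pending.get(key).poll();
                if (next == null) {
                    pending.remove(key); // nothing left: the key is idle again
                }
            }
            if (next != null) {
                // Resubmit rather than loop, so other keys get a turn on the pool.
                executor.execute(() -> runThenHandOff(key, next));
            }
        }
    }

    public static void main(String[] args) {
        ArrayDeque<Runnable> ready = new ArrayDeque<>(); // toy single-threaded "pool"
        PerKeyTaskQueue queue = new PerKeyTaskQueue(ready::add, 2);
        List<String> applied = new ArrayList<>();
        for (String op : List.of("a", "b", "c", "d")) {
            try {
                queue.submit("hot-doc", () -> applied.add(op));
            } catch (RejectedExecutionException e) {
                System.out.println("rejected " + op);
            }
        }
        while (!ready.isEmpty()) {
            ready.poll().run();
        }
        System.out.println("applied " + applied);
    }
}
```

Running `main` prints `rejected d` and then `applied [a, b, c]`: two operations queue behind the first without holding a thread, and the fourth is pushed back to the caller. Operations on other document IDs are unaffected, because they go straight to the executor.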

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-distributed (Team:Distributed)