Closed: sakuuj closed this issue 2 months ago
Hello! As you already figured out, this is a problem with the underlying RestClient used by the Java client. Unfortunately, updating it probably won't be enough to fix this issue: many users have reported problems when combining virtual threads with locks or synchronized blocks, with this thread being one of many examples. This is likely going to be fixed in Java 23, as explained in this JEP draft. Until then, we have no way of ensuring compatibility with virtual threads.
Hello! Thank you for your informative response.
Java API client version
8.13.4
Java version
21
Elasticsearch Version
8.14.3
Problem description
I have been testing the performance of ElasticsearchOperations (Spring Data Elasticsearch, which uses org.elasticsearch.client.RestClient underneath) by calling my controller endpoints, and I noticed a deadlock when using virtual threads together with a custom IOReactorConfig whose IoThreadCount is set to a low number (1 in the current example). I used hatoo/oha with both the number of concurrent requests and the total number of requests set to 100, and the application deadlocked. The thread dump from jcmd showed that the pinned virtual threads were in a synchronized block:
and some of the unmounted virtual threads were waiting for the lock in AbstractNIOConnPool:
The Elasticsearch thread that should notify the threads waiting in the synchronized block was also blocked on the ReentrantLock in AbstractNIOConnPool:
Maybe it happened because:
Or maybe that is wrong and the interleaving was different. Either way, a deadlock was detected, and it would be nice to change the current RestClient implementation.
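To illustrate the kind of change that would help on Java 21, here is a minimal sketch of a blocking wait written with ReentrantLock and Condition instead of synchronized with wait()/notify(). A virtual thread that blocks inside a synchronized block is pinned to its carrier on Java 21, whereas one that blocks on a java.util.concurrent lock can unmount. ResultHolder, complete and await are hypothetical names made up for this example; this is not code from RestClient or Apache HttpCore, and I am only assuming the real fix would look something like it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical one-shot result holder (illustration only, not a RestClient or HttpCore class). */
final class ResultHolder<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition completed = lock.newCondition();
    private T result;
    private boolean done;

    /** Called by the producing thread (for example an I/O thread) once the result exists. */
    void complete(T value) {
        lock.lock();
        try {
            result = value;
            done = true;
            completed.signalAll(); // wake every waiter, like notifyAll()
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until complete() has been called or the timeout elapses. */
    T await(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        long nanosLeft = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!done) { // loop guards against spurious wakeups
                if (nanosLeft <= 0) {
                    throw new TimeoutException("no result within the timeout");
                }
                nanosLeft = completed.awaitNanos(nanosLeft);
            }
            return result;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ResultHolder<String> holder = new ResultHolder<>();
        Thread producer = new Thread(() -> holder.complete("200 OK"));
        producer.start();
        System.out.println(holder.await(5, TimeUnit.SECONDS)); // prints "200 OK"
        producer.join();
    }
}
```

The waiting side never holds a monitor while blocked, so, assuming the callers are virtual threads, they would release their carrier threads while waiting for the result.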
The tested program configurations:
Hardware: a processor with 8 physical cores.
How to fix the problem, but lose throughput?
The problem could be fixed by wrapping all RestClient methods in ReentrantLock.lock()/ReentrantLock.unlock() calls (for example, using Spring AOP). At least no deadlocks were detected in that case when testing manually with hatoo/oha and 10_000 concurrent requests. However, such a solution degrades overall throughput.
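A rough sketch of that workaround, using a plain JDK dynamic proxy instead of Spring AOP so that it runs without any dependencies. SerializedCalls, wrap and SearchClient are hypothetical names invented for this example. JDK proxies only work on interfaces, and RestClient is a concrete class, so in a real application the same idea would have to be applied through a class-based proxy (as Spring AOP can do) or around an interface such as ElasticsearchOperations. The demo uses a fixed platform-thread pool to stay runnable on older JDKs; I am assuming a virtual-thread executor would be swapped in on Java 21.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class SerializedCalls {
    private SerializedCalls() {}

    /** Hypothetical stand-in for the client interface being protected. */
    interface SearchClient {
        String search(String query);
    }

    /** Returns a proxy that runs every interface method of target under one shared ReentrantLock. */
    static <T> T wrap(Class<T> iface, T target) {
        ReentrantLock lock = new ReentrantLock();
        Object proxy = Proxy.newProxyInstance(
                SerializedCalls.class.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    lock.lock(); // callers queue here instead of inside the client
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the target's own exception
                    } finally {
                        lock.unlock();
                    }
                });
        return iface.cast(proxy);
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        // Fake client that records how many callers are inside it at the same time.
        SearchClient raw = query -> {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inFlight.decrementAndGet();
            return "hits for " + query;
        };
        SearchClient guarded = wrap(SearchClient.class, raw);

        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            int id = i;
            pool.execute(() -> guarded.search("q" + id));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("max concurrent calls inside client: " + maxInFlight.get()); // prints 1
    }
}
```

The throughput cost is visible in the sketch itself: at most one call is ever inside the client, so all requests are effectively processed one at a time.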