Closed PRIESt512 closed 9 months ago
Hi @PRIESt512 !
A few years ago I tried replacing synchronized
with ReentrantLock
, but decided to keep it synchronized
because I couldn't achieve any significant performance gains. In fact, it was the opposite: performance decreased somewhat.
I'm surprised that ReentrantLock
in a wrapper improved performance in your device. I found this a little counterintuitive (I'm not doubting you).
I ran some tests today, again. If you want to see them, check this branch: https://github.com/f4b6a3/uuid-creator/pull/93
The results I obtained were not encouraging. Unfortunately.
This is the benchmark result using synchronized
on Java 17:
JDK 17 (synchronized)(4 threads)
Benchmark Mode Cnt Score Error Units
Throughput.uuidCreatorV7plus1 thrpt 5 11822,357 ± 2584,825 ops/ms
And this using ReentrantLock
:
JDK 17 (ReentrantLock)(4 threads)
Benchmark Mode Cnt Score Error Units
Throughput.uuidCreatorV7plus1 thrpt 5 11722,471 ± 1398,280 ops/ms
I also tried a ReentrantLock
in a wrapper class
:
JDK 17 (synchronized + ReentrantLock Wrapper)(4 threads)
Benchmark Mode Cnt Score Error Units
Throughput.uuidCreatorWrapper thrpt 5 9730,139 ± 722,465 ops/ms
For now, I think the synchronized keyword should stay.
Regards,
DEVICE: OpenJDK 17 Ubuntu 22.04 Dell Inc. Inspiron 3583 (4 years old) Intel® Core™ i7-8565U CPU @ 1.80GHz
Thanks for the answer!
I checked yours branch:
VM version: JDK 17.0.8.1, OpenJDK 64-Bit Server VM, 17.0.8.1+0 (M1 Max, macOS 14.0)
Benchmark Mode Cnt Score Error Units
Throughput.uuidCreatorV7plus1 thrpt 5 21945,820 ± 552,688 ops/ms
Throughput.uuidCreatorWrapper thrpt 5 19526,796 ± 951,706 ops/ms
Identical results on grallVM 21. Here graal no longer gives an improvement, I think because com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#increment
is not synchronized. Therefore, he has nothing to optimize.
P.S. Perhaps due to the fact that the arm architecture has weaker memory semantics, this gives better performance on arm processors... Ideally, you need to look at the assembler code of the JIT machine on two platforms. Interesting case...
I'll check this thread on the server later. On the server there is an AMD EPYC 7713 processor (16 cores). As soon as I can check, I will write the results.
It is also possible for CAS (Compare-And-Swap) operations to be cheaper by m1 max...
Here is some old information on Synchronized vs ReentrantLock:
Small addition: VM version: JDK 17.0.8.1, OpenJDK 64-Bit Server VM, 17.0.8.1+0 (M1 Max, macOS 14.0)
JDK 17 (synchronized)(1 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1() thrpt 5 31412,909 ± 1079,438 ops/ms
JDK 17 (synchronized)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1() thrpt 5 7334,517 ± 1797,509 ops/ms
JDK 17 (synchronized + wrapper)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1() thrpt 5 17457,796 ± 228,756 ops/ms
JDK 17 (wrapper without sync com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#create)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1() thrpt 5 18043,901 ± 894,737 ops/ms
The last 2 measurements indicate that with ReentrantLock 'synchronized' there is no effect. Because synchronized on one thread is very productive (1 measurement).
Results from the server - CentOS Linux 8, cpu - AMD EPYC 7713 (16 cores).
JDK 17 (synchronized)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt 5 17611.926 ± 2086.617 ops/ms
JDK 17 (synchronized)(1 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt 5 15476.646 ± 12847.751 ops/ms
JDK 17 (synchronized + wrapper)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt 5 16727.356 ± 4940.831 ops/ms
JDK 17 (wrapper without sync com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#create)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt 5 15376.351 ± 4534.011 ops/ms
JDK 17 (wrapper without sync com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#create)(1 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt 5 17411.914 ± 557.143 ops/ms
JDK 17 (synchronized -XX:+UseBiasedLocking)(4 threads)
Benchmark Mode Cnt Score Error Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt 5 16787.392 ± 3438.972 ops/ms
checked yours branch:
JDK 17 (4 threads)
Benchmark Mode Cnt Score Error Units
Throughput.uuidCreatorV7plus1 thrpt 5 19149.285 ± 749.855 ops/ms
Throughput.uuidCreatorWrapper thrpt 5 15240.869 ± 563.885 ops/ms
x64
As a result, judging by the measurements of sync on one thread - it has regression (synchronized, 1 threads
). In multi-threaded execution, the difference is not very big. Overall ReentrantLock
looks a bit better without causing regression on a single thread. And in general it looks more stable.
arm64 It feels like this is some other world with its own rules. It’s confusing to see such a significant difference in identical situations.
Final
Changing synchronized
to ReentrantLock
may make sense as a future investment... In this situation, it depends more on you.
I'll most likely leave the shell as it's more static in the results.
@PRIESt512
I reimplemented the TimeOrderedEpochFactory
class here.
I'll try to find time next week to look for bugs, review unit tests, and update documentation. I'm not sure if the code is ready to be merged.
New benchmarks here: https://github.com/f4b6a3/uuid-creator/actions/runs/6528172952
Thank you for your help!
A little addition for synchronized
... Recently support for virtual threads release for Java 21. And now there's another big reason not to use synchronized
. Using synchronized
may reduce the benefit of virtual threads...
Quote from Oracle:
Pinning does not make an application incorrect, but it might hinder its scalability. Try avoiding frequent and long-lived pinning by revising synchronized blocks or methods that run frequently and guarding potentially long I/O operations with java.util.concurrent.locks.ReentrantLock.
More details about virtual threads
Some popular projects also refused to use synchronized` - Postgres JDBC Driver
It wouldn't be a big surprise if synchronized
in the future was marked deprecated 🗡️.
The fact that it is increasingly not recommended for use. And the java platform encourages this approach
The UUID version 7 generator was refactored. Now it uses reentrant lock.
The version v5.3.6 will is now available.
Thanks for your help and advice!
Hi, In my project I use UuidCreator.getTimeOrderedEpochPlus1(). It's a hot call. Looking at the features of the JVM, I slightly modified the call.
The final results are as follows (JMH):
standard method call in OpenJDK 17-21 -> 7-8_000 ops/ms (4 Threads)
standard method call in Oracle GraalVM 21 -> 12-13_000 ops/ms (4 Threads). Cunning jit-fox.
call with wrapper OpenJDK 17-21 -> 17-18_000 ops/ms (4 Threads)
call with wrapper Oracle GraalVM 21 -> 20_000 ops/ms (4 Threads)
Plus, you should take into account the nuances of the JDK after version 15, see - https://dev.to/vbochenin/java-17-migration-bias-locks-regression-2c5m
Using -XX:+UseBiasedLocking doesn't make much difference in JDK 17. In 21 this option is completely removed.
It is reasonable to assume that modifying all synchronized calls should have a good effect on the library's performance.
Device - MacBookPro M1 Max