f4b6a3 / uuid-creator

UUID Creator is a Java library for generating Universally Unique Identifiers.
MIT License
410 stars 44 forks source link

Replacing all synchronized methods with ReentrantLock #92

Closed PRIESt512 closed 9 months ago

PRIESt512 commented 11 months ago

Hi, In my project I use UuidCreator.getTimeOrderedEpochPlus1(). It's a hot call. Looking at the features of the JVM, I slightly modified the call.

enum class UuidCreatorWrapper(private val lock: ReentrantLock = ReentrantLock()) {
    INSTANCE;

    fun getUUID(): UUID {
        lock.lock();
        try {
            return UuidCreator.getTimeOrderedEpochPlus1();
        } finally {
            lock.unlock();
        }
    }
}

The final results are as follows (JMH):

Plus, you should take into account the nuances of the JDK after version 15, see - https://dev.to/vbochenin/java-17-migration-bias-locks-regression-2c5m

Using -XX:+UseBiasedLocking doesn't make much difference in JDK 17. In 21 this option is completely removed.

It is reasonable to assume that modifying all synchronized calls should have a good effect on the library's performance.

Device - MacBookPro M1 Max

fabiolimace commented 11 months ago

Hi @PRIESt512 !

A few years ago I tried replacing synchronized with ReentrantLock, but decided to keep it synchronized because I couldn't achieve any significant performance gains. In fact, it was the opposite: performance decreased somewhat.

I'm surprised that ReentrantLock in a wrapper improved performance in your device. I found this a little counterintuitive (I'm not doubting you).

I ran some tests today, again. If you want to see them, check this branch: https://github.com/f4b6a3/uuid-creator/pull/93

The results I obtained were not encouraging. Unfortunately.

This is the benchmark result using synchronized on Java 17:

JDK 17 (synchronized)(4 threads)

Benchmark                       Mode  Cnt      Score      Error   Units
Throughput.uuidCreatorV7plus1  thrpt    5  11822,357 ± 2584,825  ops/ms

And this using ReentrantLock:

JDK 17 (ReentrantLock)(4 threads)

Benchmark                       Mode  Cnt      Score      Error   Units
Throughput.uuidCreatorV7plus1  thrpt    5  11722,471 ± 1398,280  ops/ms

I also tried a ReentrantLock in a wrapper class:

JDK 17 (synchronized + ReentrantLock Wrapper)(4 threads)

Benchmark                       Mode  Cnt     Score     Error   Units
Throughput.uuidCreatorWrapper  thrpt    5  9730,139 ± 722,465  ops/ms

For now, I think the synchronized keyword should stay.

Regards,


DEVICE: OpenJDK 17 Ubuntu 22.04 Dell Inc. Inspiron 3583 (4 years old) Intel® Core™ i7-8565U CPU @ 1.80GHz

PRIESt512 commented 11 months ago

Thanks for the answer!

I checked yours branch:

VM version: JDK 17.0.8.1, OpenJDK 64-Bit Server VM, 17.0.8.1+0 (M1 Max, macOS 14.0)

Benchmark                       Mode  Cnt      Score     Error   Units
Throughput.uuidCreatorV7plus1  thrpt    5  21945,820 ± 552,688  ops/ms
Throughput.uuidCreatorWrapper  thrpt    5  19526,796 ± 951,706  ops/ms

Identical results on grallVM 21. Here graal no longer gives an improvement, I think because com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#increment is not synchronized. Therefore, he has nothing to optimize.

P.S. Perhaps due to the fact that the arm architecture has weaker memory semantics, this gives better performance on arm processors... Ideally, you need to look at the assembler code of the JIT machine on two platforms. Interesting case...

I'll check this thread on the server later. On the server there is an AMD EPYC 7713 processor (16 cores). As soon as I can check, I will write the results.

PRIESt512 commented 11 months ago

It is also possible for CAS (Compare-And-Swap) operations to be cheaper by m1 max...

Here is some old information on Synchronized vs ReentrantLock:

Synchronized vs ReentrantLock Mechanical Sympathy

PRIESt512 commented 11 months ago

Small addition: VM version: JDK 17.0.8.1, OpenJDK 64-Bit Server VM, 17.0.8.1+0 (M1 Max, macOS 14.0)

JDK 17 (synchronized)(1 threads)
Benchmark                               Mode  Cnt      Score      Error   Units
UuidCreator.getTimeOrderedEpochPlus1()  thrpt    5  31412,909 ± 1079,438  ops/ms
JDK 17 (synchronized)(4 threads)
Benchmark                                Mode  Cnt     Score      Error   Units
UuidCreator.getTimeOrderedEpochPlus1()   thrpt    5  7334,517 ± 1797,509  ops/ms
JDK 17 (synchronized + wrapper)(4 threads)
Benchmark                                Mode  Cnt     Score      Error   Units
UuidCreator.getTimeOrderedEpochPlus1()  thrpt    5  17457,796 ± 228,756  ops/ms
JDK 17 (wrapper without sync com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#create)(4 threads)
Benchmark                               Mode  Cnt     Score      Error   Units
UuidCreator.getTimeOrderedEpochPlus1()   thrpt    5  18043,901 ± 894,737  ops/ms

The last 2 measurements indicate that with ReentrantLock 'synchronized' there is no effect. Because synchronized on one thread is very productive (1 measurement).

PRIESt512 commented 11 months ago

Results from the server - CentOS Linux 8, cpu - AMD EPYC 7713 (16 cores).

JDK 17 (synchronized)(4 threads)
Benchmark                            Mode  Cnt      Score       Error   Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt    5  17611.926 ± 2086.617  ops/ms
JDK 17 (synchronized)(1 threads)
Benchmark                              Mode  Cnt      Score       Error   Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt    5  15476.646  ± 12847.751  ops/ms
JDK 17 (synchronized + wrapper)(4 threads)
Benchmark                            Mode  Cnt      Score       Error   Units
UuidCreator.getTimeOrderedEpochPlus1 thrpt    5  16727.356 ± 4940.831  ops/ms
JDK 17 (wrapper without sync com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#create)(4 threads)
Benchmark                             Mode  Cnt      Score       Error   Units
UuidCreator.getTimeOrderedEpochPlus1  thrpt    5  15376.351 ± 4534.011  ops/ms
JDK 17 (wrapper without sync com.github.f4b6a3.uuid.factory.rfc4122.TimeOrderedEpochFactory#create)(1 threads)
Benchmark                             Mode  Cnt      Score       Error   Units
UuidCreator.getTimeOrderedEpochPlus1  thrpt    5  17411.914 ± 557.143  ops/ms
JDK 17 (synchronized -XX:+UseBiasedLocking)(4 threads)
Benchmark                             Mode  Cnt      Score       Error   Units
UuidCreator.getTimeOrderedEpochPlus1  thrpt    5  16787.392 ± 3438.972  ops/ms

checked yours branch:

JDK 17 (4 threads)
Benchmark                       Mode  Cnt      Score     Error   Units
Throughput.uuidCreatorV7plus1  thrpt    5  19149.285 ± 749.855  ops/ms
Throughput.uuidCreatorWrapper  thrpt    5  15240.869 ± 563.885  ops/ms

x64 As a result, judging by the measurements of sync on one thread - it has regression (synchronized, 1 threads). In multi-threaded execution, the difference is not very big. Overall ReentrantLock looks a bit better without causing regression on a single thread. And in general it looks more stable.

arm64 It feels like this is some other world with its own rules. It’s confusing to see such a significant difference in identical situations.

Final Changing synchronized to ReentrantLock may make sense as a future investment... In this situation, it depends more on you.

I'll most likely leave the shell as it's more static in the results.

fabiolimace commented 11 months ago

@PRIESt512

I reimplemented the TimeOrderedEpochFactory class here.

I'll try to find time next week to look for bugs, review unit tests, and update documentation. I'm not sure if the code is ready to be merged.

New benchmarks here: https://github.com/f4b6a3/uuid-creator/actions/runs/6528172952

Thank you for your help!

PRIESt512 commented 9 months ago

A little addition for synchronized... Recently support for virtual threads release for Java 21. And now there's another big reason not to use synchronized. Using synchronized may reduce the benefit of virtual threads...

Quote from Oracle:

Pinning does not make an application incorrect, but it might hinder its scalability. Try avoiding frequent and long-lived pinning by revising synchronized blocks or methods that run frequently and guarding potentially long I/O operations with java.util.concurrent.locks.ReentrantLock.

More details about virtual threads

Some popular projects also refused to use synchronized` - Postgres JDBC Driver

It wouldn't be a big surprise if synchronized in the future was marked deprecated 🗡️. The fact that it is increasingly not recommended for use. And the java platform encourages this approach

fabiolimace commented 9 months ago

The UUID version 7 generator was refactored. Now it uses reentrant lock.

The version v5.3.6 will is now available.

Thanks for your help and advice!