OpenHFT / Zero-Allocation-Hashing

Zero-allocation hashing for Java
Apache License 2.0
787 stars 136 forks source link

Replace Unsafe with VarHandle #67

Open qweek opened 2 years ago

qweek commented 2 years ago

I tested my implementation for Java 9+ with VarHandles instead of Unsafe. In my test case (hashing multiple files with a total size of ~ 1 GB) I noticed a huge speed improvement

Is it possible to publish separate version for modern JDKs without dependency on Unsafe?

gzm55 commented 2 years ago

@qweek Impressively, may be all hash would be improved with VarHandle which is designed to replace operations in Unsafe.

I suggest, in a new pr, create a dedicated Access implementation with VarHandle, and detect a better access class at runtime, in case of portability problems on the mobile jdk. After passing all the ut, it is better to attach jmh profiling benchmark results on major hashs and LTS JDK.

plokhotnyuk commented 2 years ago

@qweek It can be published as a multi-release JAR in the same artifact that is still compatible with Java 8: https://in.relation.to/2017/02/13/building-multi-release-jars-with-maven/

qweek commented 2 years ago

@gzm55 I'll try to look at it, but it won't be any time soon. It may be easier for you to start by checking how much it speeds up your use cases.

@plokhotnyuk I thought about multi-release JAR, but it have some drawbacks:

With such a structure that is required to maintain backward compatibility, the code will turn into mess

develar commented 2 years ago

@qweek I see that in your fork there is no support for seed. It was difficult to implement?

gzm55 commented 2 years ago

I try to benchmark the Unsafe.getLong and VarHandle.get on a x64 server with LTS JDK 11 and 17. The Unsafe.getLong performs about 17% better than VarHandle on JDK 11, and 50% better on JDK 17. On another macOS with JDK11 I get nearly the same results.

@qweek, in your implementation and benchmark, is there other difference between two compared version?

# JMH version: 1.34
# VM version: JDK 17.0.2, Java HotSpot(TM) 64-Bit Server VM, 17.0.2+8-LTS-86
# VM invoker: /opt/jdk-17.0.2/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 3 iterations, 3 s each
# Measurement: 3 iterations, 3 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations

JDK 11 Result

Benchmark                   Mode  Cnt       Score       Error  Units
VarHandleJmh.getLongPlain   avgt    3  480385.742 ± 45707.927  ns/op
VarHandleJmh.getLongUnsafe  avgt    3  405600.434 ±   609.814  ns/op

JDK 17 Result

Benchmark                   Mode  Cnt       Score     Error  Units
VarHandleJmh.getLongPlain   avgt    3  165615.717 ± 387.519  ns/op
VarHandleJmh.getLongUnsafe  avgt    3   80043.996 ± 102.376  ns/op

Code

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Random;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import sun.misc.Unsafe;
import java.lang.reflect.Field;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

import static java.nio.ByteOrder.LITTLE_ENDIAN;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 3)
@Fork(1)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class VarHandleJmh {
        private final static int L = 1024 * 1024;
        byte data[] = new byte[L];

        private static final Unsafe UNSAFE;

        private static final long BYTE_BASE;

        private static final VarHandle LONG_HANDLE = MethodHandles.byteArrayViewVarHandle(long[].class, LITTLE_ENDIAN);

        static {
                Unsafe u = null;
                try {
                        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
                        theUnsafe.setAccessible(true);
                        u = (Unsafe) theUnsafe.get(null);
                } catch (final Exception e) {
                }

                UNSAFE = u;

                BYTE_BASE = UNSAFE.arrayBaseOffset(byte[].class);
        }

        @Setup
        public void init() {
                new Random().nextBytes(data);
        }

       @Benchmark
        public void getLongUnsafe(Blackhole bh) {
                for (long i = BYTE_BASE; i < BYTE_BASE + L - 8; i += 8) {
                        final long d = UNSAFE.getLong(data, i);
                        bh.consume(d);
                }
        }

        @Benchmark
        public void getLongPlain(Blackhole bh) {
                for (long i = 0; i < L - 8; i += 8) {
                        final long d = (long)LONG_HANDLE.get(data, (int)i);
                        bh.consume(d);
                }
        }

        public static void main(String[] args) throws Exception {
                org.openjdk.jmh.Main.main(args);
        }
}
gzm55 commented 2 years ago

I also benchmarked a dedicated VarHandleAccess vs UnsafeAccess, the latter one still performs about 5% better.

skazis commented 1 year ago

Just wanted to let you know that on Java 17 I stumbled upon a strange problem with hashing, MD5 in particular. It is hard to provide a repeatable scenario as this happened on PROD system without any clear circumstances. And one of the attempted solutions was to switch to this library and XXH3 algorithm since we care about speed and not crypto. The problem I mention is that many threads got stuck around VarHandle and CPU was overwhelmed, example stack trace (there is not a single hotspot line, but for sure they all are within method implCompress0):

"http-nio-8080-exec-19" #934 daemon prio=5 os_prio=0 cpu=395884.88ms elapsed=55340.79s tid=0x00007f6a44011d00 nid=0x20b850 runnable [0x00007f6a58a0a000] java.lang.Thread.State: RUNNABLE at jdk.internal.util.Preconditions.checkIndex(java.base@17.0.7/Preconditions.java:267) at java.lang.invoke.VarHandleByteArrayAsInts$ArrayHandle.index(java.base@17.0.7/VarHandleByteArrayAsInts.java:103) at java.lang.invoke.VarHandleByteArrayAsInts$ArrayHandle.get(java.base@17.0.7/VarHandleByteArrayAsInts.java:120) at java.lang.invoke.VarHandleGuards.guard_LI_I(java.base@17.0.7/VarHandleGuards.java:163) at sun.security.provider.MD5.implCompress0(java.base@17.0.7/MD5.java:182) at sun.security.provider.MD5.implCompress(java.base@17.0.7/MD5.java:146) at sun.security.provider.DigestBase.implCompressMultiBlock0(java.base@17.0.7/DigestBase.java:150) at sun.security.provider.DigestBase.implCompressMultiBlock(java.base@17.0.7/DigestBase.java:144) at sun.security.provider.DigestBase.engineUpdate(java.base@17.0.7/DigestBase.java:131) at java.security.MessageDigest$Delegate.engineUpdate(java.base@17.0.7/MessageDigest.java:658) at java.security.MessageDigest.update(java.base@17.0.7/MessageDigest.java:349)