eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

OpenJdk MemorySegmentGetUnsafe benchmark score #20270

Open p4654545 opened 1 month ago

p4654545 commented 1 month ago

Java -version output

openjdk 22.0.2 2024-07-16
IBM Semeru Runtime Open Edition 22.0.2.1 (build 22.0.2+9)
Eclipse OpenJ9 VM 22.0.2.1 (build openj9-0.46.1, JRE 22 Linux amd64-64-Bit Compressed References 20240716_33 (JIT enabled, AOT enabled)
OpenJ9   - 4760d5d320
OMR      - 840a9adba
JCL      - b77827589c5 based on jdk-22.0.2+9)

Summary of problem

The OpenJDK MemorySegmentGetUnsafe benchmark indicates a major regression. The following JMH result shows that "panama" is roughly 32 times slower than "unsafe" (39.954 vs. 1.253 ns/op).

Benchmark                      Mode  Cnt   Score   Error  Units
MemorySegmentGetUnsafe.panama  avgt   30  39.954 ± 5.622  ns/op
MemorySegmentGetUnsafe.unsafe  avgt   30   1.253 ± 0.033  ns/op

For HotSpot, the difference is approximately 25%.
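The slowdown factor can be recomputed directly from the two JMH scores. The snippet below is a minimal sketch; the class and method names are hypothetical, and it ignores the reported error margins. Under the assumption that the HotSpot percentage is measured relative to the "unsafe" score, a 25% difference would correspond to a factor of about 1.25.

```java
public class SlowdownSketch {
    /** Ratio of two average times given in the same unit (here ns/op). */
    public static double slowdown(double slowNsPerOp, double fastNsPerOp) {
        return slowNsPerOp / fastNsPerOp;
    }

    public static void main(String[] args) {
        // Scores taken from the JMH table above (panama vs. unsafe).
        double factor = slowdown(39.954, 1.253);
        System.out.printf("panama is about %.1f times slower than unsafe%n", factor);
    }
}
```

The same helper can be applied to any other pair of scores as long as both use the same unit.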

To run the benchmark, the files MemorySegmentGetUnsafe.java and Utils.java were copied from the OpenJDK repository.
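For readers without the OpenJDK sources at hand, the two access paths being compared can be sketched roughly as follows. This is not the benchmark source: the class name, helper names, and setup are hypothetical, there is no JMH harness, and it assumes JDK 22 or later, where the foreign memory API is final. MemorySegment.reinterpret is a restricted method, so the JVM may print a native-access warning unless --enable-native-access is set.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class AccessPathsSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // sun.misc.Unsafe has no public constructor; the usual workaround is reflection.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** "unsafe" path: read an int straight from a raw native address. */
    public static int readUnsafe(long addr) {
        return UNSAFE.getInt(addr);
    }

    /** "panama" path: wrap the address in a sized segment, then read through it. */
    public static int readPanama(long addr) {
        MemorySegment seg = MemorySegment.ofAddress(addr).reinterpret(Integer.BYTES);
        return seg.get(ValueLayout.JAVA_INT, 0L);
    }

    /** Writes a value to native memory and reads it back through both paths. */
    public static int[] readBoth(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment backing = arena.allocate(ValueLayout.JAVA_INT);
            backing.set(ValueLayout.JAVA_INT, 0L, value);
            long addr = backing.address();
            return new int[] { readUnsafe(addr), readPanama(addr) };
        }
    }

    public static void main(String[] args) {
        int[] result = readBoth(42);
        System.out.println("unsafe=" + result[0] + " panama=" + result[1]);
    }
}
```

Both paths read the same four bytes; the benchmark question is only how much the segment-based path costs on top of the raw read.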

I could not find any related open or closed issue or pull request. Are there any plans to improve the performance of MemorySegment.get()/set() on OpenJ9?

github-actions[bot] commented 1 month ago

Issue Number: 20270
Status: Open
Recommended Components: comp:gc, comp:vm, comp:test
Recommended Assignees: pshipton, dmitripivkine, keithc-ca

hzongaro commented 1 month ago

@nbhuiyan - FYI

ThanHenderson commented 1 month ago

I've reproduced this on my M2 Mac with a recent build of the head stream. Here are the results:

openjdk version "24-internal" 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.nathanhenderson.openj9-openjdk-jdk)
Eclipse OpenJ9 VM (build master-cf5af0829, JRE 24 Mac OS X aarch64-64-Bit 20240911_000000 (JIT enabled, AOT enabled)
OpenJ9   - cf5af0829
OMR      - 10fdf657a
JCL      - 61b445b6e99 based on jdk-24+14)

No additional options (panama ~154X slower):

Benchmark                      Mode  Cnt    Score   Error  Units
MemorySegmentGetUnsafe.panama  avgt   30  139.712 ± 3.262  ns/op
MemorySegmentGetUnsafe.unsafe  avgt   30    0.907 ± 0.008  ns/op

With -Xint (panama ~12X slower):

Benchmark                      Mode  Cnt     Score    Error  Units
MemorySegmentGetUnsafe.panama  avgt   30  1614.397 ± 17.776  ns/op
MemorySegmentGetUnsafe.unsafe  avgt   30   129.397 ±  2.214  ns/op

nbhuiyan commented 3 weeks ago

There was a JIT issue that prevented the VarHandle invocation in MemorySegmentGetUnsafe.panama from being inlined. It was fixed in #20329.

However, we still see this large perf gap with panama. Checking out the perf profile reveals that most of the time is spent on JNI helper methods related to ScopedMemory. The perf profile below was a recording from a high iteration count run at a point where most compilations should have already been completed.

# Overhead  Command          Shared Object       Symbol                                                                 >
# ........  ...............  ..................  .......................................................................>
#
    12.91%  com.test.Memory  libj9trc29.so       [.] traceV
    10.73%  com.test.Memory  [vdso]              [.] __vdso_clock_gettime
     7.01%  com.test.Memory  libj9jit29.so       [.] jitGetMapsFromPC
     6.87%  com.test.Memory  libjclse29.so       [.] JVM_GetCallerClass_Impl
     5.29%  com.test.Memory  libjclse29.so       [.] getCallerClassJEP176Iterator
     4.84%  com.test.Memory  libj9trc29.so       [.] doTracePoint
     4.82%  com.test.Memory  libj9jit29.so       [.] jitWalkStackFrames
     4.53%  com.test.Memory  libj9vm29.so        [.] walkStackFrames
     2.79%  com.test.Memory  libj9jit29.so       [.] getFirstInlinedCallSiteWithByteCodeInfo
     2.72%  com.test.Memory  libj9jit29.so       [.] getNextInlinedCallSite
     2.62%  com.test.Memory  libj9vm29.so        [.] walkFrame
     2.57%  com.test.Memory  [JIT] tid 108601    [.] jdk/internal/foreign/AbstractMemorySegmentImpl.reinterpret(J)Ljava/>
     2.47%  com.test.Memory  [JIT] tid 108601    [.] com/test/MemorySegmentGetUnsafe.panama()I_very-hot
     2.33%  com.test.Memory  libj9jit29.so       [.] getInlinedCallSiteArrayElement
     2.29%  com.test.Memory  libj9jit29.so       [.] jitGetExceptionTableFromPC
     2.11%  com.test.Memory  libj9vm29.so        [.] instanceOfOrCheckCast
     1.95%  com.test.Memory  libj9vm29.so        [.] internalEnterVMFromJNI
     1.40%  com.test.Memory  libj9trc29.so       [.] javaTrace
     1.37%  com.test.Memory  libj9vm29.so        [.] internalExitVMToJNI
     1.25%  com.test.Memory  libj9jit29.so       [.] hasMoreInlinedMethods
     1.05%  com.test.Memory  libj9jit29.so       [.] getCurrentByteCodeIndexAndIsSameReceiver
     0.96%  com.test.Memory  libj9jit29.so       [.] getInlinedMethod
     0.80%  com.test.Memory  libc-2.31.so        [.] clock_gettime@@GLIBC_2.17
     0.73%  com.test.Memory  libj9vm29.so        [.] j9jni_createLocalRef
     0.72%  com.test.Memory  libc-2.31.so        [.] __memcpy_avx_unaligned_erms
     0.66%  com.test.Memory  libj9jit29.so       [.] isUnloadedInlinedMethod
     0.62%  com.test.Memory  libj9jit29.so       [.] getJitInlineDepthFromCallSite
     0.60%  com.test.Memory  libj9jit29.so       [.] getJitInlinedCallInfo

The only entries in the profile above that are compiled methods are:

     2.57%  com.test.Memory  [JIT] tid 108601    [.] jdk/internal/foreign/AbstractMemorySegmentImpl.reinterpret(J)Ljava/>
     2.47%  com.test.Memory  [JIT] tid 108601    [.] com/test/MemorySegmentGetUnsafe.panama()I_very-hot
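Going by the rows listed, the two compiled methods account for about 5% of the samples (2.57% + 2.47% = 5.04%), while the rest is attributed to shared libraries. A per-object breakdown like this can be computed with a small helper. The sketch below is hypothetical tooling, not part of perf or OpenJ9, and it assumes the column layout shown in the report above, with columns separated by two or more spaces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerfReportSketch {
    /** Sums the overhead percentages of `perf report` rows per shared object. */
    public static Map<String, Double> overheadByObject(List<String> rows) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String row : rows) {
            String line = row.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and header comments
            }
            // Expected columns: overhead, command, shared object, symbol.
            String[] cols = line.split("\\s{2,}");
            if (cols.length < 4 || !cols[0].endsWith("%")) {
                continue; // not a data row in the assumed layout
            }
            double pct = Double.parseDouble(cols[0].substring(0, cols[0].length() - 1));
            // Group all JIT-compiled code under one key, whatever the thread id.
            String object = cols[2].startsWith("[JIT]") ? "[JIT]" : cols[2];
            totals.merge(object, pct, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // A few rows copied from the profile above.
        List<String> sample = List.of(
            "    12.91%  com.test.Memory  libj9trc29.so       [.] traceV",
            "     7.01%  com.test.Memory  libj9jit29.so       [.] jitGetMapsFromPC",
            "     4.84%  com.test.Memory  libj9trc29.so       [.] doTracePoint",
            "     2.57%  com.test.Memory  [JIT] tid 108601    [.] jdk/internal/foreign/AbstractMemorySegmentImpl.reinterpret(J)Ljava/>",
            "     2.47%  com.test.Memory  [JIT] tid 108601    [.] com/test/MemorySegmentGetUnsafe.panama()I_very-hot");
        overheadByObject(sample).forEach(
            (object, pct) -> System.out.printf("%-16s %6.2f%%%n", object, pct));
    }
}
```

The report above lists only the top entries, so totals computed this way cover the listed rows rather than the whole run.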

@babsingh FYI, since this is related to ScopedMemory. Have you observed something like this before?

babsingh commented 3 weeks ago

> since this is related to ScopedMemory. Have you observed something like this before?

This seems unrelated to the ScopedMemoryAccess method (closeScope0) implemented in OpenJ9.

I discussed this with @nbhuiyan on Slack. AbstractMemorySegmentImpl.reinterpret (compiled method) invokes JVM_GetCallerClass_Impl, which ultimately calls walkStackFrames. Although there is an optimization for GetCallerClass to avoid the stack walk, it doesn't activate in this case.
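To illustrate what a caller-sensitive lookup involves, here is a minimal sketch using the public StackWalker API. It is an analogue chosen for illustration only: as far as I understand, reinterpret uses the JDK-internal Reflection.getCallerClass(), which is what ends up in JVM_GetCallerClass_Impl, and the sketch says nothing about how OpenJ9 implements or optimizes that path. The class and method names are hypothetical.

```java
public class CallerSensitiveSketch {
    // RETAIN_CLASS_REFERENCE is required for getCallerClass() to work.
    private static final StackWalker WALKER =
        StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    /**
     * Stand-in for a caller-sensitive method: its result depends on who invoked it,
     * so the runtime has to inspect the stack (or apply an optimization that
     * already knows the caller) on each call.
     */
    public static Class<?> whoCalledMe() {
        return WALKER.getCallerClass();
    }

    /** Hypothetical client that invokes the caller-sensitive method. */
    public static class Client {
        public static Class<?> call() {
            return whoCalledMe();
        }
    }

    public static void main(String[] args) {
        System.out.println(Client.call().getSimpleName());
    }
}
```

If the caller cannot be determined cheaply, every invocation pays for a stack inspection, which matches the stack-walking symbols that dominate the profile.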
