Open p4654545 opened 1 month ago
Issue Number: 20270 Status: Open Recommended Components: comp:gc, comp:vm, comp:test Recommended Assignees: pshipton, dmitripivkine, keithc-ca
@nbhuiyan - FYI
I've reproduced this on my M2 Mac with a recent build of the head stream. Here are the results:
openjdk version "24-internal" 2025-03-18
OpenJDK Runtime Environment (build 24-internal-adhoc.nathanhenderson.openj9-openjdk-jdk)
Eclipse OpenJ9 VM (build master-cf5af0829, JRE 24 Mac OS X aarch64-64-Bit 20240911_000000 (JIT enabled, AOT enabled)
OpenJ9 - cf5af0829
OMR - 10fdf657a
JCL - 61b445b6e99 based on jdk-24+14)
No additional options (panama ~154X slower):
Benchmark Mode Cnt Score Error Units
MemorySegmentGetUnsafe.panama avgt 30 139.712 ± 3.262 ns/op
MemorySegmentGetUnsafe.unsafe avgt 30 0.907 ± 0.008 ns/op
With -Xint (panama ~12X slower):
Benchmark Mode Cnt Score Error Units
MemorySegmentGetUnsafe.panama avgt 30 1614.397 ± 17.776 ns/op
MemorySegmentGetUnsafe.unsafe avgt 30 129.397 ± 2.214 ns/op
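For reference, the slowdown factors quoted above are simply the ratio of the two average times. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration, and the scores are the ones from the two tables above.

```java
// Hypothetical helper, not part of the benchmark: derives the "~154X" and
// "~12X" figures from the JMH average times (ns/op) reported above.
final class SlowdownSketch {
    // Ratio of the panama score to the unsafe score.
    static double slowdown(double panamaNsPerOp, double unsafeNsPerOp) {
        return panamaNsPerOp / unsafeNsPerOp;
    }

    public static void main(String[] args) {
        // Default options: 139.712 ns/op vs 0.907 ns/op
        System.out.printf("default: ~%.0fX%n", slowdown(139.712, 0.907));
        // -Xint: 1614.397 ns/op vs 129.397 ns/op
        System.out.printf("-Xint:   ~%.0fX%n", slowdown(1614.397, 129.397));
    }
}
```

The relative gap is much larger with the JIT enabled because the unsafe path compiles down to under a nanosecond while the panama path does not.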
There was a JIT issue that was preventing the VarHandle invocation in MemorySegmentGetUnsafe.panama from getting inlined; it was fixed in #20329. However, we still see this large perf gap with panama. The perf profile reveals that most of the time is spent in JNI helper methods related to ScopedMemory. The profile below was recorded during a high-iteration-count run, at a point where most compilations should already have completed.
# Overhead Command Shared Object Symbol >
# ........ ............... .................. .......................................................................>
#
12.91% com.test.Memory libj9trc29.so [.] traceV
10.73% com.test.Memory [vdso] [.] __vdso_clock_gettime
7.01% com.test.Memory libj9jit29.so [.] jitGetMapsFromPC
6.87% com.test.Memory libjclse29.so [.] JVM_GetCallerClass_Impl
5.29% com.test.Memory libjclse29.so [.] getCallerClassJEP176Iterator
4.84% com.test.Memory libj9trc29.so [.] doTracePoint
4.82% com.test.Memory libj9jit29.so [.] jitWalkStackFrames
4.53% com.test.Memory libj9vm29.so [.] walkStackFrames
2.79% com.test.Memory libj9jit29.so [.] getFirstInlinedCallSiteWithByteCodeInfo
2.72% com.test.Memory libj9jit29.so [.] getNextInlinedCallSite
2.62% com.test.Memory libj9vm29.so [.] walkFrame
2.57% com.test.Memory [JIT] tid 108601 [.] jdk/internal/foreign/AbstractMemorySegmentImpl.reinterpret(J)Ljava/>
2.47% com.test.Memory [JIT] tid 108601 [.] com/test/MemorySegmentGetUnsafe.panama()I_very-hot
2.33% com.test.Memory libj9jit29.so [.] getInlinedCallSiteArrayElement
2.29% com.test.Memory libj9jit29.so [.] jitGetExceptionTableFromPC
2.11% com.test.Memory libj9vm29.so [.] instanceOfOrCheckCast
1.95% com.test.Memory libj9vm29.so [.] internalEnterVMFromJNI
1.40% com.test.Memory libj9trc29.so [.] javaTrace
1.37% com.test.Memory libj9vm29.so [.] internalExitVMToJNI
1.25% com.test.Memory libj9jit29.so [.] hasMoreInlinedMethods
1.05% com.test.Memory libj9jit29.so [.] getCurrentByteCodeIndexAndIsSameReceiver
0.96% com.test.Memory libj9jit29.so [.] getInlinedMethod
0.80% com.test.Memory libc-2.31.so [.] clock_gettime@@GLIBC_2.17
0.73% com.test.Memory libj9vm29.so [.] j9jni_createLocalRef
0.72% com.test.Memory libc-2.31.so [.] __memcpy_avx_unaligned_erms
0.66% com.test.Memory libj9jit29.so [.] isUnloadedInlinedMethod
0.62% com.test.Memory libj9jit29.so [.] getJitInlineDepthFromCallSite
0.60% com.test.Memory libj9jit29.so [.] getJitInlinedCallInfo
The only entries in the profile above that are compiled methods are:
2.57% com.test.Memory [JIT] tid 108601 [.] jdk/internal/foreign/AbstractMemorySegmentImpl.reinterpret(J)Ljava/>
2.47% com.test.Memory [JIT] tid 108601 [.] com/test/MemorySegmentGetUnsafe.panama()I_very-hot
@babsingh FYI, since this is related to ScopedMemory. Have you observed something like this before?
This seems unrelated to the ScopedMemoryAccess method (closeScope0) implemented in OpenJ9.
I discussed this with @nbhuiyan on Slack. AbstractMemorySegmentImpl.reinterpret (a compiled method) invokes JVM_GetCallerClass_Impl, which ultimately calls walkStackFrames. Although there is an optimization for GetCallerClass that avoids the stack walk, it doesn't activate in this case.
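To illustrate why a caller-sensitive method ends up walking the stack, here is a minimal sketch using the public StackWalker API. It is only a stand-in: the JDK-internal path discussed in this thread goes through JVM_GetCallerClass_Impl rather than StackWalker, and the class and method names below are hypothetical.

```java
// Sketch of a caller-sensitive method. The callee has no argument telling it
// who called it, so the runtime must inspect the stack to find out.
final class CallerLookupSketch {
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    // Hypothetical caller-sensitive method: returns the class that invoked it.
    static Class<?> whoCalledMe() {
        return WALKER.getCallerClass();
    }

    // Hypothetical client whose identity the callee wants to discover.
    static final class Client {
        static Class<?> call() {
            return whoCalledMe();
        }
    }

    public static void main(String[] args) {
        System.out.println(Client.call().getSimpleName());
    }
}
```

If a lookup like this runs on every invocation of a hot method and is not optimized away by the JIT, its cost dominates the actual memory access, which matches the shape of the profile above.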
Java -version output
openjdk 22.0.2 2024-07-16
IBM Semeru Runtime Open Edition 22.0.2.1 (build 22.0.2+9)
Eclipse OpenJ9 VM 22.0.2.1 (build openj9-0.46.1, JRE 22 Linux amd64-64-Bit Compressed References 20240716_33 (JIT enabled, AOT enabled)
OpenJ9 - 4760d5d320
OMR - 840a9adba
JCL - b77827589c5 based on jdk-22.0.2+9)
Summary of problem
The OpenJDK MemorySegmentGetUnsafe benchmark indicates a major regression. The following JMH result shows that "panama" is approximately 30 times slower than Unsafe.
Benchmark Mode Cnt Score Error Units
MemorySegmentGetUnsafe.panama avgt 30 39.954 ± 5.622 ns/op
MemorySegmentGetUnsafe.unsafe avgt 30 1.253 ± 0.033 ns/op
For HotSpot, the difference is approximately 25%.
To run the benchmark, the files MemorySegmentGetUnsafe.java and Utils.java were copied from the OpenJDK repository.
I could not find any related open or closed issue or pull request. Are there any plans to improve OpenJ9 MemorySegment.get()/set() performance?
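For anyone who wants to try this without the JMH harness, the access pattern in question looks roughly like the following. This is a minimal sketch, not the OpenJDK benchmark source: the class and method names are made up, and it assumes JDK 22 or later, where java.lang.foreign is a final API. The reinterpret call is a restricted method, so the JDK may print a native-access warning when it runs.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical stand-alone reproduction of the "panama"-style read path.
final class ReinterpretSketch {
    // Wraps a raw address in a zero-length segment, resizes it with the
    // restricted reinterpret call, then reads one int from offset 0.
    static int readViaReinterpret(long address) {
        return MemorySegment.ofAddress(address)
                .reinterpret(Integer.BYTES)
                .get(ValueLayout.JAVA_INT_UNALIGNED, 0);
    }

    // Writes a value into freshly allocated native memory and reads it back
    // through the raw-address path above.
    static int roundTrip(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
            segment.set(ValueLayout.JAVA_INT, 0, value);
            return readViaReinterpret(segment.address());
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```

Calling readViaReinterpret in a tight loop exercises the reinterpret call on every read, so it should be enough to check whether a given build still shows the overhead.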