eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

jdk19 SC_Softmx_UpDown_0 crash SC_Softmx_UpDown_0 #16741

Closed pshipton closed 1 year ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_extended.system_aarch64_linux_Nightly_testList_2/111 SC_Softmx_UpDown_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_extended.system_aarch64_linux_Nightly_testList_2/111/system_test_output.tar.gz

STF 02:52:24.735 - +------ Step 15 - Wait for processes to complete
STF 02:52:24.735 - | Wait for processes to meet expectations
STF 02:52:24.735 - |   Processes: [jvm3]
STF 02:52:24.735 - |
STF 02:52:24.735 - Monitoring processes: jvm3
STF 02:52:39.819 - Found dump at: /home/jenkins/workspace/Test_openjdk19_j9_extended.system_aarch64_linux_Nightly_testList_2/aqa-tests/TKG/output_16766020121362/SC_Softmx_UpDown_0/20230217-025147-SharedClassesWorkloadTest_Softmx_IncreaseDecrease/results/javacore.20230217.025237.318454.0002.txt
STF 02:52:39.821 - Found dump at: /home/jenkins/workspace/Test_openjdk19_j9_extended.system_aarch64_linux_Nightly_testList_2/aqa-tests/TKG/output_16766020121362/SC_Softmx_UpDown_0/20230217-025147-SharedClassesWorkloadTest_Softmx_IncreaseDecrease/results/core.20230217.025237.318454.0001.dmp
STF 02:52:59.291 - Found dump at: /home/jenkins/workspace/Test_openjdk19_j9_extended.system_aarch64_linux_Nightly_testList_2/aqa-tests/TKG/output_16766020121362/SC_Softmx_UpDown_0/20230217-025147-SharedClassesWorkloadTest_Softmx_IncreaseDecrease/results/Snap.20230217.025237.318454.0003.trc
STF 02:52:59.642 - **FAILED** Process jvm3 ended with exit code (255) and not the expected exit code/s (0,1)
STF 02:52:59.643 - Monitoring Report Summary:
STF 02:52:59.643 -   o Process jvm3 has crashed unexpectedly
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000FFFF894A1744 Handler2=0000FFFF89408AC0 InaccessibleAddress=F2DFFFC2F2BD38AA
R0=0000FFFF84027110 R1=F2DFFFC2F2BD38A2 R2=0000000000000010 R3=0000FFFF89607048
R4=0000FFFF841B22F0 R5=0000000000000020 R6=0000000000000026 R7=0000FFFEC985CA36
R8=0000FFFEEBEFFF60 R9=FFFFFFFFFFFFFFA8 R10=0000FFFEE9AB5110 R11=0000000000000000
R12=0000FFFF7E448BB0 R13=0000FFFF0791E598 R14=00000000000001D0 R15=0000FFFF83D91A90
R16=0000FFFF8964F2B0 R17=0000FFFF89AC3150 R18=0000FFFF075FB368 R19=0000FFFF84980D00
R20=0000FFFE74007F50 R21=0000FFFF8960706C R22=F2DFFFC2F2BD3882 R23=0000FFFF8964E000
R24=000000000040F900 R25=0000FFFF89652420 R26=0000000040000000 R27=0000FFFF84027110
R28=0000FFFF89510F90 R29=0000FFFEE9AB4DC0 R30=0000FFFF8957A71C R31=0000FFFEE9AB4DB0
PC=0000FFFF89518CB4 SP=0000FFFEE9AB4DB0 PSTATE=0000000060001000
V0 0000000000000005 (f: 5.000000, d: 2.470328e-323)
V1 0000ffff7e4491c0 (f: 2118423040.000000, d: 1.390660e-309)
V2 3ff5555555555555 (f: 1431655808.000000, d: 1.333333e+00)
V3 aaa8000000200000 (f: 2097152.000000, d: -3.348596e-103)
V4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V7 8020080280200802 (f: 2149582848.000000, d: -4.458850e-308)
V8 bb50000000000000 (f: 0.000000, d: -5.293956e-23)
V9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V16 4010040140100401 (f: 1074791424.000000, d: 4.003911e+00)
V17 fcfd000000ff0000 (f: 16711680.000000, d: -1.157587e+294)
V18 8020080280000000 (f: 2147483648.000000, d: -4.458850e-308)
V19 3f14ce199eaadbb7 (f: 2661997568.000000, d: 7.936508e-05)
V20 3ed0cfca86e65239 (f: 2263241216.000000, d: 4.008218e-06)
V21 3fadde2d52defd9a (f: 1390345600.000000, d: 5.833570e-02)
V22 3fa97b4b24760deb (f: 611716608.000000, d: 4.976878e-02)
V23 3f4344d8f2f26501 (f: 4075971840.000000, d: 5.880412e-04)
V24 3f3026f71a8d1068 (f: 445452384.000000, d: 2.464631e-04)
V25 c7e143c5e0000000 (f: 3758096384.000000, d: -1.835902e+38)
V26 c7e143c5e0000000 (f: 3758096384.000000, d: -1.835902e+38)
V27 c7e143c5e0000000 (f: 3758096384.000000, d: -1.835902e+38)
V28 c7e143c5e0000000 (f: 3758096384.000000, d: -1.835902e+38)
V29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
V30 0000000040400000 (f: 1077936128.000000, d: 5.325712e-315)
V31 4000000000000000 (f: 0.000000, d: 2.000000e+00)
Module=/home/jenkins/workspace/Test_openjdk19_j9_extended.system_aarch64_linux_Nightly_testList_2/openjdkbinary/j2sdk-image/lib/default/libj9vm29.so
Module_base_address=0000FFFF8946F000
Target=2_90_20230217_182 (Linux 5.4.0-135-generic)
CPU=aarch64 (8 logical CPUs) (0x1eff6d000 RAM)
----------- Stack Backtrace -----------
bytecodeLoopFull+0x7d24 (0x0000FFFF89518CB4 [libj9vm29.so+0xa9cb4])
c_cInterpreter+0x54 (0x0000FFFF8957A71C [libj9vm29.so+0x10b71c])
---------------------------------------
1XMCURTHDINFO  Current thread
3XMTHREADINFO      "load-6" J9VMThread:0x0000FFFF84980D00, omrthread_t:0x0000FFFF8497E2E0, java/lang/Thread:0x0000FFFF07A758A8, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x22, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x0000FFFF07664BA8)
3XMTHREADINFO1            (native thread ID:0x4DC17, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000020)
3XMTHREADINFO2            (native stack address range from:0x0000FFFEE9A77000, to:0x0000FFFEE9AB7000, size:0x40000)
3XMCPUTIME               CPU usage total: 2.484176600 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/invoke/LambdaForm$NFI/0x000000007800f130.invoke_LI_I(LambdaForm$NFI)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$DMH/0x000000008442a680.invokeStatic(LambdaForm$DMH(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/LambdaForm$NamedFunction.invokeWithArguments(LambdaForm.java:1174)
4XESTACKTRACE                at java/lang/invoke/LambdaForm.interpretName(LambdaForm.java:959)
4XESTACKTRACE                at java/lang/invoke/LambdaForm.interpretWithArguments(LambdaForm.java:936)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$LFI/0x000000007800d920.interpret_V(LambdaForm$LFI)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$MH/0x0000000088024cc0.invokeExact_MT(LambdaForm$MH)
4XESTACKTRACE                at net/adoptopenjdk/test/lang/invoke/StaticAsTypeTestExplicit.testVoid(StaticAsTypeTestExplicit.java:358)
4XESTACKTRACE                at net/adoptopenjdk/test/lang/invoke/StaticAsTypeTestExplicit.testAll(StaticAsTypeTestExplicit.java:52)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$DMH/0x0000000088021640.invokeStatic(LambdaForm$DMH)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$MH/0x000000008802bd60.invoke(LambdaForm$MH)
4XESTACKTRACE                at java/lang/invoke/LambdaForm$MH/0x00000000844364b0.invokeExact_MT(LambdaForm$MH(Compiled Code))
4XESTACKTRACE                at jdk/internal/reflect/DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:154(Compiled Code))
4XESTACKTRACE                at jdk/internal/reflect/DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104(Compiled Code))
4XESTACKTRACE                at java/lang/reflect/Method.invoke(Method.java:578(Compiled Code))
4XESTACKTRACE                at net/adoptopenjdk/loadTest/adaptors/ArbitraryJavaAdaptor.executeTest(ArbitraryJavaAdaptor.java:102)
4XESTACKTRACE                at net/adoptopenjdk/loadTest/LoadTestRunner$2.run(LoadTestRunner.java:182)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
4XESTACKTRACE                at java/lang/Thread.run(Thread.java:1573)
pshipton commented 1 year ago

@tajila fyi

tajila commented 1 year ago

The crash occurs in a J2I transition frame

tajila commented 1 year ago

@babsingh can you take a quick look, I suspect we will need the JIT team to look at this as well

babsingh commented 1 year ago

The invokeBasic INL is invoked from the JIT. _currentThread->tempSlot seems invalid. This leads to incorrect values for lambdaForm, memberName and ends with a crash while evaluating _sendMethod.

@nbhuiyan Can you please take a look?

fyi @0xdaryl @jdmpapin

Crash location

https://github.com/eclipse-openj9/openj9/blob/27a475d572bb8803858acb74f3eef028ea6ef061/runtime/vm/BytecodeInterpreter.hpp#L8956

jdmpview output

> stackslots summary
[Top frame, J2I Frame] LambdaForm$NFI/0x000000007800f130.invoke_LI_I, !j9method 0x0000FFFE7800F778, Bytecode index = 13:
   -- invokevirtual 6 java/lang/invoke/MethodHandle.invokeBasic(Ljava/lang/Object;I)I
[JIT Frame] LambdaForm$DMH/0x000000008442a680.invokeStatic, !j9method 0x0000FFFF8442AEC8, Bytecode index = 11:
   -- invokestatic 9 java/lang/invoke/MethodHandle.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
[Bytecode Frame] LambdaForm$NamedFunction.invokeWithArguments, !j9method 0x0000FFFF8438A110, Bytecode index = 21:
   -- invokevirtual 59 java/lang/invoke/MethodHandle.invokeBasic(Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;
[Bytecode Frame] LambdaForm.interpretName, !j9method 0x0000FFFF8420EF98, Bytecode index = 119:
   -- invokevirtual 208 java/lang/invoke/LambdaForm$NamedFunction.invokeWithArguments([Ljava/lang/Object;)Ljava/lang/Object;

> !j9vmthread 0xffff84980d00 | grep -i jitStackFrameFlags // invokeBasic from the JIT
    0x38: UDATA jitStackFrameFlags = 0x0000000040000000 (1073741824) // J9_SSF_JIT_NATIVE_TRANSITION_FRAME is set

> !j9vmthread 0xffff84980d00 | grep -i tempslot // (CORRUPT)
    0xf0: UDATA tempSlot = 0xFFFFFFFFFFFFFFA8 (-88)

> !j9object 0xf2dfffc2f2bd3882 // lambdaForm from gdb (CORRUPT)
Unable to read object clazz at 0xF2DFFFC2F2BD3882 (clazz = (null))

gdb output

(gdb) where
...
#11 0x0000ffff89408c34 in mainSynchSignalHandler (signal=11, sigInfo=0xfffee9ab3b50, contextInfo=0xfffee9ab3bd0)
    at /home/jenkins/workspace/Build_JDK19_aarch64_linux_Nightly/omr/port/unix/omrsignal.c:1066
#12 <signal handler called>
#13 0x0000ffff89518cb4 in VM_BytecodeInterpreterFull::invokeBasic (_pc=<synthetic pointer>: <optimized out>, _sp=<synthetic pointer>: <optimized out>, this=0xfffee9ab5110)
    at /home/jenkins/workspace/Build_JDK19_aarch64_linux_Nightly/openj9/runtime/vm/BytecodeInterpreter.hpp:8956
#14 VM_BytecodeInterpreterFull::run (vmThread=<optimized out>, this=0xfffee9ab5110) at /home/jenkins/workspace/Build_JDK19_aarch64_linux_Nightly/openj9/runtime/vm/BytecodeInterpreter.hpp:10727
#15 bytecodeLoopFull (currentThread=<optimized out>) at /home/jenkins/workspace/Build_JDK19_aarch64_linux_Nightly/openj9/runtime/vm/BytecodeInterpreter.inc:112
#16 0x0000ffff8957a71c in c_cInterpreter () at /home/jenkins/workspace/Build_JDK19_aarch64_linux_Nightly/build/linux-aarch64-server-release/vm/runtime/vm/arm64cinterp.s:110
...

(gdb) p _sendMethod
$1 = (J9Method *) 0xffff841b22f0

(gdb) p memberName
$2 = <optimized out>

(gdb) p lambdaForm
$3 = (j9object_t) 0xf2dfffc2f2bd3882

(gdb) p mhReceiver
$4 = <optimized out>

(gdb) p _literals
$5 = (J9Method *) 0xfffeec5c655c

(gdb) p _pc
$6 = (U_8 *&) <synthetic pointer>: <optimized out>
nbhuiyan commented 1 year ago

If the last compiled method in the stack is LambdaForm$DMH/0x000000008442a680.invokeStatic, then what the JIT-ed code is setting things up for is a linkToStatic call. Here are the bytecodes for invokeStatic:

> !bytecodes 0x0000FFFF8442AEC8
...
    0 aload0
    1 invokestatic 6 java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;
    4 astore3
    5 aload1
    6 aload2
    7 aload3
    8 checkcast 7 java/lang/invoke/MemberName
   11 invokestatic 9 java/lang/invoke/MethodHandle.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
   14 return1

The compiled code (with some manual annotations) for this method sets both the tempSlot and the floatTemp1 fields right before the call:

   0xfffeec546fcc:  mov x0, #0x3                    // #3
   0xfffeec546fd0:  str x0, [x19, #240]             // store to VMThread.tempSlot
   0xfffeec546fd4:  add x1, x19, #0x100             // x1 set to floatTemp1
   0xfffeec546fd8:  mov x0, #0xffffffffffffffff     // #-1
   0xfffeec546fdc:  str x0, [x1]                    // store to floatTemp1
   0xfffeec546fe0:  ldr x0, [x20, #56]
   0xfffeec546fe4:  ldr x1, [x20, #48]
   0xfffeec546fe8:  ldr x2, [x20, #40]
   0xfffeec546fec:  bl  0xfffeec547080              // Call Snippet (java/lang/invoke/MethodHandle.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;)

The JIT-ed code obtained from the core dump shown above correctly sets tempSlot to 3, as well as floatTemp1 to -1 since this linkToStatic call is not for handling an unresolved invokedynamic/invokehandle. Shouldn't that result in getting to the linkToStaticSpecial INL helper? This could have then resulted in entering the interpreted method LambdaForm$NFI/0x000000007800f130.invoke_LI_I, which contains the actual invokeBasic call that results in getting to the crash site in the invokeBasic INL helper, which is assuming that we reached there from the JIT. Somewhere between the JIT-ed body of invokeStatic and the invokeBasic INL helper, both the tempSlot and floatTemp1 fields are overwritten, which is not surprising given how overused these fields are for various reasons. The tempSlot field value cannot be relied on after the first use in an INL helper, so not sure how we ended up in this situation.

babsingh commented 1 year ago

@gacholio Can you provide your insights on the below solution and let us know if there is a simpler solution?

Problem

In the invokeBasic INL, we rely upon the J9_SSF_JIT_NATIVE_TRANSITION_FRAME flag in jitStackFrameFlags to determine if the INL is being directly called from the JIT. During this failure, there is an i2jTransition -> java/lang/Integer.intValue()I [JITed method] -> j2iTransition. This is followed by a call to the invokeBasic INL. Before the call to the invokeBasic INL, the j2iTransition sets the J9_SSF_JIT_NATIVE_TRANSITION_FRAME flag in jitStackFrameFlags since J9AccNative is set for MethodHandle.invokeBasic (modifiers). Even though invokeBasic is interpreted, the INL falsely believes that it is being directly called by the JIT since the J9_SSF_JIT_NATIVE_TRANSITION_FRAME flag is set during j2iTransition.

Solution

Add a new flag (J9_SSF_INL_CALL_FROM_JIT = 0x80000000). JIT will exclusively set this flag in jitStackFrameFlags whenever it directly calls the linkTo* and invokeBasic INLs: J9CodeGenerator.cpp#L772-L840. Prevent this flag to be overridden in j2iTransition. Update the linkTo* and invokeBasic INLs to take the fromJIT path only if the new flag is set.

jdmpview output

> !bytecodes 0x0000FFFE7800F778
  Name: invoke_LI_I
  Signature: (Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;
  Access Flags (20080008): default static
  Internal Attribute Flags: @FrameIteratorSkip
  Max Stack: 4
  Argument Count: 2
  Temp Count: 0

    0 aload0
    1 aload1
    2 iconst0
    3 aaload
    4 aload1
    5 iconst1
    6 aaload
    7 checkcast 3 java/lang/Integer
   10 invokevirtual 4 java/lang/Integer.intValue()I
   13 invokevirtual 6 java/lang/invoke/MethodHandle.invokeBasic(Ljava/lang/Object;I)I
   16 invokestatic 7 java/lang/Integer.valueOf(I)Ljava/lang/Integer;
   19 return1

> !j9method 0x0000FFFF841F1A40
J9Method at 0xffff841f1a40 {
  Fields for J9Method:
    0x0: U8* bytecodes = !j9x 0x0000FFFEC98C19F0 // "״N"
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000FFFF841F1F20 (flags = 0x0)
    0x10: void* methodRunAddress = !j9x 0x0000000000000018
    0x18: volatile void* extra = !j9x 0x0000FFFEEBFF11F8
}
Signature: java/lang/Integer.intValue()I !bytecodes 0x0000FFFF841F1A40
ROM Method: !j9rommethod 0x0000FFFEC98C19DC
Next Method: !j9method 0x0000FFFF841F1A60

> !j9rommethod 0x0000FFFEC98C19DC   | grep modifier
    0x8: U32 modifiers = 0x20050201 (537199105)

> !j9method 0x0000FFFF841B22F0
J9Method at 0xffff841b22f0 {
  Fields for J9Method:
    0x0: U8* bytecodes = !j9x 0x0000FFFEC985B7E4 // "       "
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000FFFF841B15C0 (flags = 0x0)
    0x10: void* methodRunAddress = !j9x 0x0000000000000086
    0x18: volatile void* extra = !j9x 0x0000000000000001
}
Signature: java/lang/invoke/MethodHandle.invokeBasic([Ljava/lang/Object;)Ljava/lang/Object; !bytecodes 0x0000FFFF841B22F0
ROM Method: !j9rommethod 0x0000FFFEC985B7D0
Next Method: !j9method 0x0000FFFF841B2310

> !j9rommethod 0x0000FFFEC985B7D0 | grep modifiers
    0x8: U32 modifiers = 0x20030190 (537067920) // J9AccNative is set for MethodHandle.invokeBasic
DanHeidinga commented 1 year ago

Doesn't look 19 specific Moving to 0.38

gacholio commented 1 year ago

I don't understand the explanation of why this doesn't work like any other INL. jitStackFrameFlags is managed entirely by the interpreter and I think it should stay that way.

babsingh commented 1 year ago

I don't understand the explanation of why this doesn't work like any other INL. jitStackFrameFlags is managed entirely by the interpreter and I think it should stay that way.

This INL has two code paths: one for direct calls from the JIT and other for interpreted mode. The below sequence causes the invokeBasic INL to falsely believe that it is directly invoked by the JIT when it should be interpreted.

Running interpreted code
i2jTransition
java/lang/Integer.intValue()I [JITed method]
j2iTransition // Sets J9_SSF_JIT_NATIVE_TRANSITION_FRAME
invokevirtual MethodHandle.invokeBasic // Falsely assumes a direct call from the JIT even though it is interpreted
gacholio commented 1 year ago

OK, I see what I missed in the reviews for the INLs. When the native transition flag is checked, it must also be cleared. I suggest adding a helper to fetch and clear the bit since the pattern is repeated many times. jitStackFrameFlags can be set to 0 rather than masking out the bit.

babsingh commented 1 year ago

We do clear jitStackFrameFlags (set it to 0) at the end of the invokeBasic and linkTo* INLs before calling j2iTransition. ++ @fengxue-IS to review my initial analysis https://github.com/eclipse-openj9/openj9/issues/16741#issuecomment-1447116352 in case the issue was misinterpreted.

gacholio commented 1 year ago

I assume you've left out some frames from your stack trace (I suppose they're not missing at the time). The j2i transition runs the INL which invokes a java method which then does the invokebasic?

babsingh commented 1 year ago

Yes. In the second frame (JIT'ed), the linkToStatic INL is invoked. The linkToStatic INL calls the method in the top frame and sets the J2I frame. The top frame is interpreted until the JIT'ed Integer.intValue method is invoked, which requires the I2J and J2I transitions. A J2I transition has happened just before the invokeBasic INL and after the JIT'ed Integer.intValue.

Below are top 4 frames in the Java stack:

> stackslots summary
[Top frame, J2I Frame] LambdaForm$NFI/0x000000007800f130.invoke_LI_I, !j9method 0x0000FFFE7800F778, Bytecode index = 13:
   -- invokevirtual 6 java/lang/invoke/MethodHandle.invokeBasic(Ljava/lang/Object;I)I
[JIT Frame] LambdaForm$DMH/0x000000008442a680.invokeStatic, !j9method 0x0000FFFF8442AEC8, Bytecode index = 11:
   -- invokestatic 9 java/lang/invoke/MethodHandle.linkToStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/invoke/MemberName;)Ljava/lang/Object;
[Bytecode Frame] LambdaForm$NamedFunction.invokeWithArguments, !j9method 0x0000FFFF8438A110, Bytecode index = 21:
   -- invokevirtual 59 java/lang/invoke/MethodHandle.invokeBasic(Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;
[Bytecode Frame] LambdaForm.interpretName, !j9method 0x0000FFFF8420EF98, Bytecode index = 119:
   -- invokevirtual 208 java/lang/invoke/LambdaForm$NamedFunction.invokeWithArguments([Ljava/lang/Object;)Ljava/lang/Object;

In the top J2I frame, bytecodes for LambdaForm$NFI/0x000000007800f130.invoke_LI_I are

> !bytecodes 0x0000FFFE7800F778
  Name: invoke_LI_I
  Signature: (Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;
  Access Flags (20080008): default static
  Internal Attribute Flags: @FrameIteratorSkip
  Max Stack: 4
  Argument Count: 2
  Temp Count: 0

    0 aload0
    1 aload1
    2 iconst0
    3 aaload
    4 aload1
    5 iconst1
    6 aaload
    7 checkcast 3 java/lang/Integer
   10 invokevirtual 4 java/lang/Integer.intValue()I // JIT'ed method
   13 invokevirtual 6 java/lang/invoke/MethodHandle.invokeBasic(Ljava/lang/Object;I)I
   16 invokestatic 7 java/lang/Integer.valueOf(I)Ljava/lang/Integer;
   19 return1
gacholio commented 1 year ago

I think it would be helpful to repost the sequence of events linearly like before.

babsingh commented 1 year ago

Simplified linear sequence of key events:

invokestatic MethodHandle.linkToStatic // from the JIT
j2iTransition // from the linkToStatic INL

Enter LambdaForm$NFI/0x000000007800f130.invoke_LI_I and execute bytecodes in interpreted code

i2jTransition
java/lang/Integer.intValue()I [JITed method]
j2iTransition // Sets J9_SSF_JIT_NATIVE_TRANSITION_FRAME because invokeBasic has J9AccNative set

invokevirtual MethodHandle.invokeBasic // Falsely assumes a direct call from the JIT even though it is interpreted
babsingh commented 1 year ago

@gacholio Does the above linear sequence highlight the problem mentioned in https://github.com/eclipse-openj9/openj9/issues/16741#issuecomment-1447116352?

gacholio commented 1 year ago

Some has been omitted (j2iTransition to get to the INL etc). I'll update tomorrow with what I suspect is the sequence and you can correct me.

0xdaryl commented 1 year ago

Hi... This issue was discussed in the context of the 0.38 release today. My understanding of the next steps here is to agree to some solution based on Babneet's proposal. From a JIT perspective, we'll implement what's necessary based on that design.

It is not clear how frequent this problem occurs given the information above, but the absence of other build failures commented here it sounds like it may be infrequent. Perhaps this issue should move out to 0.40.

pshipton commented 1 year ago

I don't see more than one failure reported.

gacholio commented 1 year ago

Is this the sequence of events?

invokestatic MethodHandle.linkToStatic // from the JIT
j2iTransition  // Sets J9_SSF_JIT_NATIVE_TRANSITION_FRAME because linkToStatic has J9AccNative set
linkToStatic INL // detects and clears J9_SSF_JIT_NATIVE_TRANSITION_FRAME
j2iTransition // from the linkToStatic INL
Enter LambdaForm$NFI/0x000000007800f130.invoke_LI_I and execute bytecodes in interpreted code
i2jTransition
java/lang/Integer.intValue()I [JITed method] // performs invokeBasic
j2iTransition // Sets J9_SSF_JIT_NATIVE_TRANSITION_FRAME because invokeBasic has J9AccNative set
invokeBasic INL // detects and clears J9_SSF_JIT_NATIVE_TRANSITION_FRAME
j2iTransition // from invokeBasic INL
Enter ??? and execute bytecodes in interpreted code
invokevirtual MethodHandle.invokeBasic // The bit should have been cleared by invokeBasic INL
gacholio commented 1 year ago

The error case (NPE) leaves the native transition bit set when it builds the resolve frame, which might lead to problems later on.

babsingh commented 1 year ago

re https://github.com/eclipse-openj9/openj9/issues/16741#issuecomment-1500451282:

gacholio commented 1 year ago

j2iTransition // Sets J9_SSF_JIT_NATIVE_TRANSITION_FRAME because MH.invokeBasic has J9AccNative set --- missing --- invokevirtual MethodHandle.invokeBasic // Falsely assumes a direct call from the JIT even though it is interpreted

The interesting part is still missing here - the j2iTransition to the invokeBasic INL that clears the bit.

babsingh commented 1 year ago

The interesting part is still missing here - the j2iTransition to the invokeBasic INL that clears the bit.

I don't believe there will be an j2iTransition for the invokeBasic INL since the parent method (LambdaForm$NFI/0x000000007800f130.invoke_LI_I) is interpreted. In this method, only Integer.intValue is JIT'ed.

gacholio commented 1 year ago

That makes no sense - there clearly is a j2iTransition for the INL:

j2iTransition // Sets J9_SSF_JIT_NATIVE_TRANSITION_FRAME because MH.invokeBasic has J9AccNative set

and that invokeBasic must invoke some interpreted method before:

invokevirtual MethodHandle.invokeBasic // Falsely assumes a direct call from the JIT even though it is interpreted
babsingh commented 1 year ago

@nbhuiyan @0xdaryl I had a discussion with GAC where it was pointed that J9_SSF_JIT_NATIVE_TRANSITION_FRAME is being correctly set for the invokeBasic INL in the j2iTransition. Currently, the JIT correctly sets tempSlot and floatTemp1 only if the invokeBasic or linkTo* INLs are invoked from a JIT'ed method. The JIT will need to cover all cases where these INLs are invoked after the j2iTransition:

nbhuiyan commented 1 year ago

J9_SSF_JIT_NATIVE_TRANSITION_FRAME is being correctly set for the invokeBasic INL in the j2iTransition

Yes, but the problem is not with this flag being set, but with the INL functions using this alone to determine whether their caller was JIT-ed code or interpreted. Currently, when INLs use this flag, it only answers whether there was a j2iTransition prior to entering the INL function, which can happen whether INLs are called from the JIT, or we just returned from a JIT-ed method call. This is why we discussed making use of a different available flag that is only ever exclusively set by the JIT-ed code prior to performing a call into the INLs to signal that we arrived there as a result of a call from JIT-ed code.

Can the JIT correctly set the tempSlot and floatTemp1 if the INLs are invoked after a j2iTransition but not from a JIT'ed method?

I don't see how that is possible.

Can the JIT store a special value in the tempSlot and floatTemp1 which will tell the INLs to use the interpreted path after a j2iTransition?

Not possible as well, unless you are suggesting that JIT-ed code must always set tempSlot and floatTemp1 at every single possible j2i transition points for every method that is ever JIT-ed. Furthermore, tempSlot and floatTemp1 are used for many other purposes, so we might also see many other issues show up as a result. A JIT compilation should not have to worry about the fact that the caller of the method being compiled would call an INL function right after that.

gacholio commented 1 year ago

All calls to INLs from jitted code must go through j2iTransition - the INLs are inlined functions that can not be called directly.

nbhuiyan commented 1 year ago

I realize now that my wording suggested that the INLs are callable directly from the JIT, but yes, I meant the JIT-code was calling the INLs via j2iTransition. The problem is around whether the INL's caller (whether or not via j2iTransition) can correctly be determined by the J9_SSF_JIT_NATIVE_TRANSITION_FRAME flag that is set by j2itransition whether JIT called the INL via j2iTransition or we just returned from calling a JIT-ed method (in this case, Interger.intValue) from an interpreted method (invoke_LI_I).

gacholio commented 1 year ago

We appear to be using this issue to discuss two completely unrelated things (jitStackFrameFlags usage and the calling convention for the polymorphic signature INLs called from the JIT).

gacholio commented 1 year ago

For jitStackFrameFlags, I think there's a fundamental misunderstanding about how it's used. This field should be 0 at all times except when j2iTransition is invoking a native. j2iTransition sets the field, and any stack frame built by the invoked native detects and clears it (the bit is moved into the specialFrameFlags field in the built stack frame).

The field is zeroed before transitioning from the interpreter to compiled code. I see the JIT is zeroing the field when doing a direct JNI callout, which I believe is unnecessary. If the compiled code has to zero the field, the VM has done something wrong.

gacholio commented 1 year ago

For the polymorphic INLs, the stack looks like:

arg
...
arg
MH

If the INL is called from the interpreter, we peek back at the bytecode which invoked the INL and get the arg count from the constant pool of the caller.

If the INL is called from compiled code, the VM must somehow be told the argument count. I can see two possible solutions for this:

1) Fill in a field in the thread as suggested. The JIT could match the name of the called method textually to handle the unresolved call case.

2) Again, detecting the INL by name, the JIT could duplicate or move the MH so the stack looked like:

MH
arg
...
arg

or

MH
arg
...
arg
MH

which allows the VM to find the MH to get the arg count. I can imagine this might cause problems with virtual thunks, so setting the slot might be the better solution.

gacholio commented 1 year ago

Looks like only invokeBasic has this issue - the other have the MemberName as the last arg.

I suggest changing:

        bool fromJIT = J9_ARE_ANY_BITS_SET(_currentThread->jitStackFrameFlags, J9_SSF_JIT_NATIVE_TRANSITION_FRAME);

to:

        bool fromJIT = J9_ARE_ANY_BITS_SET(jitStackFrameFlags(REGISTER_ARGS, 0), J9_SSF_JIT_NATIVE_TRANSITION_FRAME);

or:

        bool fromJIT = (0 != jitStackFrameFlags(REGISTER_ARGS, 0));

and removing the explicit zeroing from the new INLs.

babsingh commented 1 year ago

Created https://github.com/eclipse-openj9/openj9/pull/17244 to address the jitStackFrameFlags concern, which is mentioned in https://github.com/eclipse-openj9/openj9/issues/16741#issuecomment-1518097463.

gacholio commented 1 year ago

For the arg count issue, I don't think solution 2 is much good - it would require some special casing in the stack walker for the unresolved case.

nbhuiyan commented 1 year ago

Based on discussions about this problem over a call, it was pointed out to us by @gacholio that we do not go through j2itransition function upon returning from a compiled method into the interpreted method. Our initial investigations led us to incorrectly assume that it was because of going through j2itransition that the jitStackFrameFlags was incorrectly being set to indicate to the INL function that the call was from compiled code.

Therefore, I will be continuing to work with @babsingh to create a reliably reproducible test failure, and also identify where and how the jitStackFrameFlags get modified.

I believe we can move this issue out of 0.40, given that it is still a work in progress and the problem is rarely seen.

gacholio commented 1 year ago

This may be closeable now that #17732 has been merged.

babsingh commented 1 year ago

Closing since https://github.com/eclipse-openj9/openj9/pull/17732 is merged. It can be reopened if the failure reoccurs. See https://github.com/eclipse-openj9/openj9/pull/17732#issuecomment-1630037418 for the grinder details that was used to verify the fix.