eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

[FFI/Jtreg] Invalid JIT return address detected in upcall test suites #16387

Closed ChengJin01 closed 1 year ago

ChengJin01 commented 1 year ago

The failing tests were detected in Grinders at https://openj9-jenkins.osuosl.org/job/Grinder/1562/consoleText (zLinux) with the dumps at https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder/1562/openjdk_test_output.tar.gz

[2022-11-29T23:36:26.118Z] test TestUpcallAsync.testUpcallsAsync(969, "f1_V_IFS_PI", VOID, [INT, FLOAT, STRUCT], [POINTER, INT]): success
[2022-11-29T23:36:26.118Z] STDERR:
[2022-11-29T23:36:26.118Z] 
[2022-11-29T23:36:26.118Z] 
[2022-11-29T23:36:26.118Z] *** Invalid JIT return address 00000000000BCF00 in 00000000001F1200
[2022-11-29T23:36:26.118Z] 
[2022-11-29T23:36:26.118Z] 23:35:39.251 0x1f0f00    j9vm.249    *   ** ASSERTION FAILED ** ''
at /home/jenkins/workspace/Build_JDK19_s390x_linux_Nightly/openj9/runtime/vm/swalk.c:1632: ((0 ))

and macOS/X86_64 at https://openj9-jenkins.osuosl.org/job/Grinder/1561/consoleText with the dumps at https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder/1561/openjdk_test_output.tar.gz

[2022-11-29T23:36:49.136Z] test TestUpcallAsync.testUpcallsAsync(11611, "f19_S_SPS_PD", NON_VOID, [STRUCT, POINTER, STRUCT], [POINTER, DOUBLE]): success
[2022-11-29T23:36:49.136Z] STDERR:
[2022-11-29T23:36:49.136Z] 
[2022-11-29T23:36:49.136Z] 
[2022-11-29T23:36:49.136Z] *** Invalid JIT return address 0000000012010E00 in 00000000121C2400
[2022-11-29T23:36:49.136Z] 
[2022-11-29T23:36:49.136Z] 23:35:55.635 0x121c2100    j9vm.249    *   ** ASSERTION FAILED ** 
at /Users/jenkins/workspace/Build_JDK19_x86-64_mac_Nightly/openj9/runtime/vm/swalk.c:1632: ((0 ))

FYI: @tajila, @pshipton, @0xdaryl, @zl-wang

ChengJin01 commented 1 year ago

I noticed a similar issue with an invalid JIT return address at https://github.com/eclipse-openj9/openj9/issues/16249, which is specific to Project Loom, but I am not sure whether the two share the same root cause and should be addressed together.

ChengJin01 commented 1 year ago

@tajila, is there any special stack walk operation related to upcalls that we should perform from the VM perspective? Otherwise, the JIT might need to take this over if it is unaware of the case intended for upcalls.

tajila commented 1 year ago

The failure w.r.t. Loom occurs because the JIT didn't scan the continuation stacks before freeing the code cache. If there are no virtual threads in this test, then it's not a factor.
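
To make the Loom scenario concrete, here is a minimal Python sketch of the idea, not OpenJ9 code. It assumes a toy model in which each compiled body is an address range and each stack is a list of return addresses; the function name `reclaimable_bodies` and all addresses are hypothetical.

```python
def reclaimable_bodies(compiled_bodies, thread_stacks, continuation_stacks=()):
    """Return the names of compiled bodies that no scanned stack returns into.

    compiled_bodies: dict mapping name -> (start, end), end exclusive.
    thread_stacks / continuation_stacks: iterables of return-address lists.
    """
    live = set()
    # Every stack that can resume execution must be scanned, including
    # stacks of unmounted continuations (virtual threads).
    for stack in list(thread_stacks) + list(continuation_stacks):
        for pc in stack:
            for name, (start, end) in compiled_bodies.items():
                if start <= pc < end:
                    live.add(name)
    return set(compiled_bodies) - live


# Hypothetical data: body "B" is referenced only from a continuation stack.
bodies = {"A": (0x1000, 0x1100), "B": (0x2000, 0x2100)}
mounted = [[0x1010]]
unmounted = [[0x2040]]

# Skipping the continuation stacks wrongly reports "B" as reclaimable.
print(reclaimable_bodies(bodies, mounted))
# Scanning them as well keeps "B" alive.
print(reclaimable_bodies(bodies, mounted, unmounted))
```

In this model, freeing "B" after the first call would leave a stale return address on the continuation stack, which a later stack walk would report as invalid.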

tajila commented 1 year ago

There might be a similar situation where the target method handle in the thunk points to a method that was freed by the JIT, since I don't think the JIT scans those.

@0xdaryl is that scenario possible?

0xdaryl commented 1 year ago

I don't believe we cache anything other than a J9Method for MethodHandles, and those should be properly updated once a method is recompiled.

@jdmpapin : could you investigate this problem from a JIT perspective please?

ChengJin01 commented 1 year ago

The same issue was spotted on Windows with DDR in https://github.com/eclipse-openj9/openj9/issues/16408, using the dumps at https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder/1582/openjdk_test_output.tar.gz.

> !threads
        !stack 0x8bb901da00     !j9vmthread 0x8bb901da00        !j9thread 0x8bb82e13f0  tid 0x10f0 (4336) // (main)
...
   -----> !stack 0x8bd6abbb00     !j9vmthread 0x8bd6abbb00        !j9thread 0x8bd69f1498  tid 0x1358 (4952) // (MainThread)

> !stack 0x8bd6abbb00
<8bd6abbb00>    !j9method 0x0000008BC25D9330   java/lang/invoke/MethodType.basicType()Ljava/lang/invoke/MethodType;
Dec 08, 2022 6:55:20 PM com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptDataException thrown walking stack. walkThread = 0x0000008BD6ABBB00
com.ibm.j9ddr.AddressedCorruptDataException: Invalid JIT return address <--------------
        at com.ibm.j9ddr.vm29.j9.stackwalker.JITStackWalker$JITStackWalker_29_V0.jitWalkStackFrames(JITStackWalker.java:287)
        at com.ibm.j9ddr.vm29.j9.stackwalker.JITStackWalker.jitWalkStackFrames(JITStackWalker.java:101)
        at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:486)
        at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker.walkStackFrames(StackWalker.java:103)
        at com.ibm.j9ddr.vm29.tools.ddrinteractive.commands.StackWalkCommand.run(StackWalkCommand.java:139)

zl-wang commented 1 year ago

Do these failures have anything to do with FFI upcalls? If so, is the return-to-upcall-thunk address visible to the JIT stack walk in any way? I presumed that address is not recorded anywhere on the Java stack, but you should double-check. Just FYI ...

ChengJin01 commented 1 year ago

> Do these failures have anything to do with FFI upcalls? If so, is the return-to-upcall-thunk address visible to the JIT stack walk in any way? I presumed that address is not recorded anywhere on the Java stack, but you should double-check. Just FYI ...

Yes, all these failures occurred in the upcall test suites, but the DDR output and dumps didn't offer enough information to confirm anything beyond the corrupted stack traces.

In addition, we just fixed an upcall-related bug in the dispatcher at https://github.com/eclipse-openj9/openj9/pull/16427 to resolve the problem of GC detecting an invalid object on the Java stack during the stack walk. So this is likely why the invalid JIT return address was caught in the JIT stack walk: before the fix, the upcall MH could be corrupted on the Java stack prior to the call-in.
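
As a rough illustration of why a corrupted stack slot surfaces as this error, here is a minimal Python sketch of a walker that validates each return address against known compiled-code ranges. It is a simplified model and not the logic in swalk.c; the names `CompiledBody`, `CodeCacheIndex`, and `walk_jit_frames` are hypothetical, and treating the second value in the message as a walk-state identifier is an assumption.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledBody:
    name: str
    start: int
    end: int  # exclusive


class InvalidJITReturnAddress(Exception):
    pass


class CodeCacheIndex:
    """Sorted index of compiled bodies, searched by program counter."""

    def __init__(self, bodies):
        self._bodies = sorted(bodies, key=lambda b: b.start)
        self._starts = [b.start for b in self._bodies]

    def lookup(self, pc):
        # Find the last body whose start is <= pc, then check its end.
        i = bisect_right(self._starts, pc) - 1
        if i >= 0:
            body = self._bodies[i]
            if body.start <= pc < body.end:
                return body
        return None


def walk_jit_frames(index, return_addresses, walk_state_id):
    """Map each return address to a compiled body, or fail on the first miss."""
    names = []
    for pc in return_addresses:
        body = index.lookup(pc)
        if body is None:
            # Mirrors the shape of the message seen in the logs above.
            raise InvalidJITReturnAddress(
                f"*** Invalid JIT return address {pc:016X} in {walk_state_id:016X}"
            )
        names.append(body.name)
    return names
```

In this model, a slot overwritten with a value such as 0xBCF00 matches no compiled body, so the walk stops with the same kind of message the tests produced.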

We will verify the failing test suites with Grinders to see whether the problem still exists after the merged fix. If it is gone, we can safely close the issue as resolved; otherwise, they might need further investigation from the JIT perspective.

ChengJin01 commented 1 year ago

Closing this issue, as the problem didn't show up in the Grinders (x100 on zLinux) with the latest nightly build at https://openj9-jenkins.osuosl.org/job/Grinder/1660/ or on other platforms. We assume the problem was fixed by https://github.com/eclipse-openj9/openj9/pull/16427.