eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

JDK21 serviceability_jvmti_j9_0_FAILED serviceability/jvmti/vthread/BreakpointInYieldTest/BreakpointInYieldTest.java Segmentation error vmState=0x0002000f #18088

Closed. JasonFengJ9 closed this issue 11 months ago.

JasonFengJ9 commented 1 year ago

Failure link

From an internal build (osxrt1):

03:27:31  openjdk version "21-internal" 2023-09-19
03:27:31  OpenJDK Runtime Environment (build 21-internal-adhoc.jenkins.BuildJDK21x86-64macPersonal)
03:27:31  Eclipse OpenJ9 VM (build master-7599bde8a13, JRE 21 Mac OS X amd64-64-Bit Compressed References 20230906_50 (JIT enabled, AOT enabled)
03:27:31  OpenJ9   - 7599bde8a13
03:27:31  OMR      - 873ac5d377a
03:27:31  JCL      - 154f45ddce4 based on jdk-21+35)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

03:29:53  variation: Mode150
03:29:53  JVM_OPTIONS:  -XX:+UseCompressedOops 

03:32:59  TEST: serviceability/jvmti/vthread/BreakpointInYieldTest/BreakpointInYieldTest.java

03:32:59  STDERR:
03:32:59  Unhandled exception
03:32:59  Type=Segmentation error vmState=0x0002000f
03:32:59  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000000
03:32:59  Handler1=0000000007C37E60 Handler2=0000000007F42910
03:32:59  RDI=000000003949B6F4 RSI=00007000108A8300 RAX=0000000000000018 RBX=0000000000000018
03:32:59  RCX=FFFF802E6284C5C4 RDX=FFFF802E9BCE7CB8 R8=00007000108A82FC R9=00007000108A82F8
03:32:59  R10=0000000000000018 R11=000000003949B70C R12=00007000108A8290 R13=0000000000000007
03:32:59  R14=00007000108A8360 R15=0000000000000000
03:32:59  RIP=0000000007E17EA2 GS=0000 FS=0000 RSP=00007000108A8218
03:32:59  RFlags=0000000000010282 CS=002B RBP=00007000108A8240 ERR=D9E2700000000000
03:32:59  TRAPNO=000000000000000D CPU=7000000000000000 FAULTVADDR=00007FD1D9E27000
03:32:59  XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM1 0000000000000468 (f: 1128.000000, d: 5.573060e-321)
03:32:59  XMM2 69767265732f6b72 (f: 1932487552.000000, d: 1.073873e+200)
03:32:59  XMM3 2f69746d766a2f79 (f: 1986670464.000000, d: 2.683495e-80)
03:32:59  XMM4 696f706b61657242 (f: 1634038400.000000, d: 7.520345e+199)
03:32:59  XMM5 2f64616572687476 (f: 1919448192.000000, d: 2.148548e-80)
03:32:59  XMM6 6c6569596e49746e (f: 1850307712.000000, d: 1.441632e+214)
03:32:59  XMM7 746e696f706b6165 (f: 1886085504.000000, d: 6.967698e+252)
03:32:59  XMM8 616d2f642e747365 (f: 779383680.000000, d: 2.051584e+161)
03:32:59  XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
03:32:59  Module=/Users/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_x86-64_mac_Personal/openjdkbinary/j2sdk-image/lib/default/libj9vm29.dylib
03:32:59  Module_base_address=0000000007C00000 Symbol=mapLocalSet
03:32:59  Symbol_address=0000000007E17E10
03:32:59  Target=2_90_20230906_50 (Mac OS X 10.15.7)
03:32:59  CPU=amd64 (8 logical CPUs) (0x300000000 RAM)
03:32:59  ----------- Stack Backtrace -----------
03:32:59  mapLocalSet+0x93 (0x0000000007E17EA3 [libj9vm29.dylib+0x217ea3])
03:32:59  j9localmap_LocalBitsForPC+0x5fb (0x0000000007E17C7B [libj9vm29.dylib+0x217c7b])
03:32:59  walkBytecodeFrameSlots+0x178 (0x0000000007C7AD98 [libj9vm29.dylib+0x7ad98])
03:32:59  walkStackFrames+0x1136 (0x0000000007C7A6F6 [libj9vm29.dylib+0x7a6f6])
03:32:59  walkContinuationStackFrames+0x19d (0x0000000007C9049D [libj9vm29.dylib+0x9049d])
03:32:59  _ZN28GC_VMThreadStackSlotIterator21scanContinuationSlotsEP10J9VMThreadP8J9ObjectPvPFvP8J9JavaVMPS3_S4_P16J9StackWalkStatePKvEbb+0xbf (0x0000000009BBB26F [libj9gc29.dylib+0xe726f])
03:32:59  _ZN20MM_ScavengerDelegate27scanContinuationNativeSlotsEP22MM_EnvironmentStandardP8J9Object21MM_ScavengeScanReasonb+0xc9 (0x0000000009B96C29 [libj9gc29.dylib+0xc2c29])
03:32:59  _ZN20MM_ScavengerDelegate16getObjectScannerEP22MM_EnvironmentStandardP8J9ObjectPvm21MM_ScavengeScanReasonPb+0x2ed (0x0000000009B96F4D [libj9gc29.dylib+0xc2f4d])
03:32:59  _ZN12MM_Scavenger26incrementalScanCacheBySlotEP22MM_EnvironmentStandardP24MM_CopyScanCacheStandard+0x5d6 (0x0000000009B64976 [libj9gc29.dylib+0x90976])
03:32:59  _ZN12MM_Scavenger12completeScanEP22MM_EnvironmentStandard+0x1a6 (0x0000000009B65396 [libj9gc29.dylib+0x91396])
03:32:59  _ZN12MM_Scavenger24workThreadGarbageCollectEP22MM_EnvironmentStandard+0x292 (0x0000000009B65762 [libj9gc29.dylib+0x91762])
03:32:59  _ZN21MM_ParallelDispatcher16workerEntryPointEP18MM_EnvironmentBase+0x77 (0x0000000009B09607 [libj9gc29.dylib+0x35607])
03:32:59  _Z23dispatcher_thread_proc2P14OMRPortLibraryPv+0xf6 (0x0000000009B094F6 [libj9gc29.dylib+0x354f6])
03:32:59  omrsig_protect+0x392 (0x0000000007F41402 [libj9prt29.dylib+0x21402])
03:32:59  dispatcher_thread_proc+0x42 (0x0000000009B09582 [libj9gc29.dylib+0x35582])
03:32:59  thread_wrapper+0x13a (0x0000000006BB06BA [libj9thr29.dylib+0xa6ba])
03:32:59  _pthread_start+0x94 (0x00007FFF6C384109 [libsystem_pthread.dylib+0x6109])
03:32:59  ---------------------------------------
03:32:59  JVMDUMP039I Processing dump event "gpf", detail "" at 2023/09/07 03:31:44 - please wait.

03:33:09  serviceability_jvmti_j9_0_FAILED
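The GC frames in the backtrace above are Itanium-ABI mangled C++ names. A minimal sketch to make them readable uses `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang toolchains, not MSVC). The helper name `demangleFrame` is hypothetical, and the rule that a symbol ends at the first `+0x` is an assumption based on the frame format in this log.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle the symbol of a backtrace frame such as
// "_ZN...Eb+0xc9 (0x... [libj9gc29.dylib+0xc2c29])".
std::string demangleFrame(const std::string &frame) {
    // Keep only the symbol: everything before the "+0x<offset>" suffix.
    const std::string symbol = frame.substr(0, frame.find("+0x"));
    if (symbol.rfind("_Z", 0) != 0) {
        return symbol; // plain C symbols such as mapLocalSet need no demangling
    }
    int status = 0;
    char *readable = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return symbol; // not a valid mangled name: return it unchanged
    }
    std::string result(readable);
    std::free(readable); // __cxa_demangle returns malloc'd memory
    return result;
}
```

For the `scanContinuationNativeSlots` frame this yields `MM_ScavengerDelegate::scanContinuationNativeSlots(MM_EnvironmentStandard*, J9Object*, MM_ScavengeScanReason, bool)`.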

50x serviceability_jvmti_j9_0 internal Grinder: reproduced 7/50

This seems similar to

FYI @babsingh

JasonFengJ9 commented 1 year ago

JDK21 aarch64_mac, milestone 0 (macaarch64rt8)

[2023-09-22T18:52:07.426Z] variation: Mode650
[2023-09-22T18:52:07.426Z] JVM_OPTIONS:  -XX:-UseCompressedOops 

[2023-09-22T18:53:37.732Z] TEST: serviceability/jvmti/vthread/BreakpointInYieldTest/BreakpointInYieldTest.java

[2023-09-22T18:53:37.733Z] STDERR:
[2023-09-22T18:53:37.733Z] Unhandled exception
[2023-09-22T18:53:37.733Z] Type=Segmentation error vmState=0x0002000f
[2023-09-22T18:53:37.733Z] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
[2023-09-22T18:53:37.733Z] Handler1=0000000104B41094 Handler2=00000001049F51D8 InaccessibleAddress=FFFE8006AAA6A180
[2023-09-22T18:53:37.733Z] x0=FFFFA0014F9EF438 x1=000000016C2AD0A0 x2=FFFFA0014F9EF438 x3=0000000000000000
[2023-09-22T18:53:37.733Z] x4=000000016C2AD09C x5=000000016C2AD098 x6=000000016C2AD094 x7=0000000104BF7748
[2023-09-22T18:53:37.733Z] x8=000000016C2AD100 x9=00000001501177C8 x10=00000001501177E0 x11=0000000104C679CC
[2023-09-22T18:53:37.733Z] x12=0000000000000001 x13=0000000104C682A3 x14=0000000104C67BCC x15=0000000000000007
[2023-09-22T18:53:37.733Z] x16=000000016C2AD100 x17=FFFFA0029FB06C00 x18=0000000000000000 x19=00000000FFFFFFFF
[2023-09-22T18:53:37.733Z] x20=000000016C2AD95C x21=000000010491DEE0 x22=0000000000000000 x23=00000001501177B4
[2023-09-22T18:53:37.733Z] x24=FFFFA0014F9EF438 x25=0000000104BF76A4 x26=000000016C2AD0A0 x27=0000000000000000
[2023-09-22T18:53:37.733Z] x28=0000000000000003 x29(FP)=000000016C2AD900 x30(LR)=0000000104BF72C0 x31(SP)=000000016C2ACFD0
[2023-09-22T18:53:37.733Z] PC=0000000104BF74AC SP=000000016C2ACFD0
[2023-09-22T18:53:37.733Z] v0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v1 00000002801c8778 (f: 2149353216.000000, d: 5.305913e-314)
[2023-09-22T18:53:37.733Z] v2 000001be0000013e (f: 318.000000, d: 9.464101e-312)
[2023-09-22T18:53:37.733Z] v3 000001be000001be (f: 446.000000, d: 9.464101e-312)
[2023-09-22T18:53:37.733Z] v4 00000000000001be (f: 446.000000, d: 2.203533e-321)
[2023-09-22T18:53:37.733Z] v5 0000013e0000013e (f: 318.000000, d: 6.747947e-312)
[2023-09-22T18:53:37.733Z] v6 000000000000013e (f: 318.000000, d: 1.571129e-321)
[2023-09-22T18:53:37.733Z] v7 0000013e00000000 (f: 0.000000, d: 6.747947e-312)
[2023-09-22T18:53:37.733Z] v8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v16 bfd0000000000000 (f: 0.000000, d: -2.500000e-01)
[2023-09-22T18:53:37.733Z] v17 3fd57028ca0c5555 (f: 3389805824.000000, d: 3.349707e-01)
[2023-09-22T18:53:37.733Z] v18 bf7aea0b0c8a2ba6 (f: 210381728.000000, d: -6.570857e-03)
[2023-09-22T18:53:37.733Z] v19 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
[2023-09-22T18:53:37.733Z] v20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] v31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-09-22T18:53:37.733Z] Module=/Users/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_aarch64_mac/openjdkbinary/j2sdk-image/Contents/Home/lib/default/libj9vm29.dylib
[2023-09-22T18:53:37.733Z] Module_base_address=0000000104B1C000 Symbol=mapLocalSet
[2023-09-22T18:53:37.733Z] Symbol_address=0000000104BF7420
[2023-09-22T18:53:37.733Z] Target=2_90_20230919_43 (Mac OS X 13.0)
[2023-09-22T18:53:37.733Z] CPU=aarch64 (8 logical CPUs) (0x400000000 RAM)
[2023-09-22T18:53:37.733Z] ----------- Stack Backtrace -----------
[2023-09-22T18:53:37.733Z] ---------------------------------------
[2023-09-22T18:53:37.733Z] JVMDUMP039I Processing dump event "gpf", detail "" at 2023/09/22 14:53:11 - please wait.

[2023-09-22T18:53:56.143Z] serviceability_jvmti_j9_1_FAILED
tajila commented 1 year ago

@gacholio can you please take a look at this test_output

gacholio commented 1 year ago

The tar file appears to be corrupt:

j9build@736bb006f300:DOCKER-IMAGE $ tar xzf openjdk_test_output.tar.gz 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
j9build@736bb006f300:DOCKER-IMAGE $ gunzip openjdk_test_output.tar.gz 
j9build@736bb006f300:DOCKER-IMAGE $ ls
openjdk_test_output.tar
j9build@736bb006f300:DOCKER-IMAGE $ tar xvf openjdk_test_output.tar 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
gacholio commented 1 year ago

It looks like the HTTP download corrupts it; fetching with curl now.
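When `tar` rejects a download, checking the file's magic bytes quickly shows whether it is a real archive or something else (an HTML error page, a truncated transfer). A minimal sketch, assuming the gzip member magic 0x1f 0x8b (RFC 1952) and the POSIX ustar magic `ustar` at offset 257 of the first 512-byte tar header; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// gzip members start with ID1 = 0x1f, ID2 = 0x8b (RFC 1952).
bool looksLikeGzip(const std::vector<unsigned char> &bytes) {
    return bytes.size() >= 2 && bytes[0] == 0x1f && bytes[1] == 0x8b;
}

// A POSIX ustar header block is 512 bytes with the magic "ustar" at offset 257.
// GNU tar writes "ustar  \0", which still matches the first five bytes.
bool looksLikeUstar(const std::vector<unsigned char> &bytes) {
    const std::size_t magicOffset = 257;
    return bytes.size() >= 512 &&
           std::memcmp(bytes.data() + magicOffset, "ustar", 5) == 0;
}
```

Apply `looksLikeGzip` to the `.tar.gz` and `looksLikeUstar` to the file produced by `gunzip`; in the session above, `gunzip` succeeded, so the second check is the one that would fail.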

gacholio commented 1 year ago

A link to a closely matching JDK11 xa64 build would be helpful (for DDR). The SDK which produced the cores may also help, though I've never been able to run OpenJ9 on my Mac (millions of permission errors).

tajila commented 1 year ago

I think this is a better link https://na.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Test_openjdk21_j9_extended.openjdk_aarch64_mac/4/openjdk_test_output.tar.gz

Link to aarch builds https://na-public.artifactory.swg-devops.com/ui/native/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/build-scripts/jobs/jdk21/jdk21-mac-aarch64-openj9/43/

Link to the equivalent xa64 https://na-public.artifactory.swg-devops.com/ui/native/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/build-scripts/jobs/jdk21/jdk21-linux-x64-openj9/43/

gacholio commented 1 year ago

I think this was actually an x86 mac build.

tajila commented 1 year ago

I think this was actually an x86 mac build.

Are you talking about the original failure, or the one Jason posted 2 weeks ago?

gacholio commented 1 year ago

I was looking at the start of the PR description. If the cores I have are from a Mac, the SDK won't help me anyway. I'll see if I can figure anything out from DDR. I also notice that no native stack traces appear in the javacore, which is unhelpful.

tajila commented 1 year ago

Here is the link for the core in the original failure: https://na-public.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/hyc-runtimes-jenkins.swg-devops.com/Test_openjdk21_j9_extended.openjdk_x86-64_mac_Personal/27/openjdk_test_output.tar.gz

Here is the link to the mac x86 build: https://na-public.artifactory.swg-devops.com/ui/native/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Build_JDK21_x86-64_mac_Personal/50/

I think the original failure was found in a personal build; @JasonFengJ9 can confirm.

But using DDR on a newer JDK on xa64 should work.

gacholio commented 1 year ago

It's a crash in GC, so the native stacks probably aren't going to be very informative anyway.
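The `vmState=0x0002000f` in the crash header points the same way. A minimal sketch of decoding it, assuming OpenJ9's convention that the upper 16 bits of vmState hold the major component and that major 0x0002 denotes the GC (J9VMSTATE_GC = 0x00020000); the minor value 0x000f is not interpreted here, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: vmState = (major << 16) | minor, with major 0x0002 meaning GC.
constexpr std::uint32_t kVmStateMajorMask = 0xFFFF0000u;
constexpr std::uint32_t kVmStateGC = 0x00020000u; // assumed value of J9VMSTATE_GC

std::string vmStateComponent(std::uint32_t vmState) {
    return (vmState & kVmStateMajorMask) == kVmStateGC ? "GC" : "other";
}

std::uint32_t vmStateMinor(std::uint32_t vmState) {
    return vmState & ~kVmStateMajorMask;
}
```

This agrees with the backtrace, where the faulting thread is a GC dispatcher worker running the scavenger.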

JasonFengJ9 commented 1 year ago

I think the original failure was found in a personal build @JasonFengJ9 can confirm.

@tajila Yes, the failure was from a personal build, before JDK21 nightly/weekly builds were running regularly. All code was current with the main branches.

gacholio commented 1 year ago

The mac x86 link above doesn't work (it's not even a link to a file!). Even signed in, it just presents me with thousands of unrelated artifacts.

pshipton commented 1 year ago

You need to open it twice; it doesn't work the first time.

gacholio commented 1 year ago

That worked, but what I really want is the linux x86-64 build. This looks like the right place to look:

https://na-public.artifactory.swg-devops.com/ui/native/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Build_JDK11_x86-64_linux_Nightly/

but it hasn't been updated in years. Where should I be looking for the latest builds?

pshipton commented 1 year ago

The latest nightly build is https://na-public.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/build-scripts/jobs/jdk11u/jdk11u-linux-x64-openj9/877/ibm-semeru-open-jdk_x64_linux_JDK11U_2023-10-10-22-01.tar.gz and you can truncate that URL to find older nightly builds. However, they may not all be head stream builds.

pshipton commented 1 year ago

Or jdk21 https://na-public.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/9/ibm-semeru-open-jdk_x64_linux_JDK21U_2023-10-10-23-31.tar.gz

gacholio commented 1 year ago

Thanks, the mac x86 build is also working for me; now trying to download the original core (it keeps failing halfway through).

gacholio commented 1 year ago

I have the original failure. Not surprisingly, the stack being walked appears to be in an unmounted continuation.

gacholio commented 1 year ago

The DDR stack extensions don't appear to work on these stacks:

> !stackslots 0x00007000108A8CA0
Oct 11, 2023 2:33:25 P.M. com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
SEVERE: CDE thrown extracting initial stack walk state. walkThread = 0x00007000108A8CA0
com.ibm.j9ddr.NullPointerDereference: Memory Fault reading 0x00000000 : 
    at com.ibm.j9ddr.vm29.pointer.AbstractPointer.getLongAtOffset(AbstractPointer.java:456)
    at com.ibm.j9ddr.vm29.pointer.generated.J9ThreadPointer.tid(Unknown Source)
    at com.ibm.j9ddr.vm29.pointer.helper.J9ThreadHelper.getOSThread(J9ThreadHelper.java:60)
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:171)
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker.walkStackFrames(StackWalker.java:99)

Why DDR needs to know the OS thread ID is a mystery to me, but it's clear we don't fill that in for the stack-allocated threads used to drive the stack walker for unmounted continuations.
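The failure mode can be illustrated with a minimal sketch: the trace shows DDR reading `tid` through a thread pointer that is null for the stack-allocated walk thread. The structure and field names below are hypothetical stand-ins, not the real J9VMThread/J9Thread layout; a defensive reader reports "no OS thread" instead of dereferencing null.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an OS thread record.
struct OSThreadRecord {
    std::uintptr_t tid;
};

// Hypothetical stand-in for a VM thread used to drive a stack walk.
struct WalkThread {
    OSThreadRecord *osThread = nullptr; // never filled in for a stack-allocated walk thread
};

// Dereferencing a null osThread here corresponds to the DDR NullPointerDereference.
std::optional<std::uintptr_t> osThreadId(const WalkThread &thread) {
    if (thread.osThread == nullptr) {
        return std::nullopt;
    }
    return thread.osThread->tid;
}
```

With this shape, a walk over an unmounted continuation could proceed without an OS thread ID rather than abort while extracting the initial walk state.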

gacholio commented 1 year ago

As the continuation is unmounted, the only way to find the failing stack is to examine every continuation using the new extension.

@babsingh Is there a command to list all of the continuations? I don't see anything obvious in the DDR help.

babsingh commented 1 year ago

@babsingh Is there a command to list all of the continuations? I don't see anything obvious in the DDR help.

It will show in j9help.

> !j9help | grep vthread
vthreads                                       Lists virtual threads

> !j9help | grep contin
continuationstack         <continuation>       Walks the Java stack for <continuation>
continuationstackslots    <continuation>       Walks the Java stack (including objects) for <continuation>

Example:
     !vthreads
 Example output:
     !continuationstack 0x00007fe78c0f9600 !j9vmcontinuation 0x00007fe78c0f9600 !j9object 0x0000000706401588 (Continuation) !j9object 0x0000000706400FB0 (VThread) - name1
     !continuationstack 0x00007fe78c23aa80 !j9vmcontinuation 0x00007fe78c23aa80 !j9object 0x0000000706424F90 (Continuation) !j9object 0x0000000706424EF0 (VThread) - name2
     !continuationstack 0x00007fe78c244ac0 !j9vmcontinuation 0x00007fe78c244ac0 !j9object 0x00000007064250D8 (Continuation) !j9object 0x0000000706425038 (VThread) - name3
gacholio commented 1 year ago

!vthreads provides no output at all in this core:

> !vthreads
> 
babsingh commented 1 year ago

These commands are only available in jdmpview for JDK21+. @fengxue-IS, FYI, in case there is an actual bug.

gacholio commented 1 year ago
openjdk version "21-internal" 2023-09-19
OpenJDK Runtime Environment (build 21-internal-adhoc.jenkins.BuildJDK21x86-64macPersonal)
Eclipse OpenJ9 VM (build master-7599bde8a13, JRE 21 Mac OS X amd64-64-Bit Compressed References 20230906_50 (JIT enabled, AOT enabled)
OpenJ9   - 7599bde8a13
OMR      - 873ac5d377a
JCL      - 154f45ddce4 based on jdk-21+35)

This is my JDK, and the vthreads command is clearly there, but does nothing.

gacholio commented 1 year ago

The latest nightly build produces the same empty output.

babsingh commented 1 year ago

This is my JDK, and the vthreads command is clearly there, but does nothing.

vthreads iterates through the global linked list of continuation object lists, which is maintained by the GC. If vthreads doesn't show anything, then the list is probably empty and the GC has collected the continuation objects. In https://github.com/eclipse-openj9/openj9/issues/18088#issue-1886046290, the native stack shows that the GC is scanning the objects and encounters a dead continuation. @gacholio Can you confirm whether this statement is true, i.e. is the unmounted continuation mentioned in https://github.com/eclipse-openj9/openj9/issues/18088#issuecomment-1758308997 dead? DDR command: !isobjectalive <address>. ++GC team @LinHu2016 @amicic @dmitripivkine for more insights.

class MM_GCExtensions : public MM_GCExtensionsBase {
private:
    ...
    MM_ContinuationObjectList* continuationObjectLists; /**< The global linked list of continuation object lists. */

https://github.com/eclipse-openj9/openj9/blob/676b9a455892f883f7ecd78f53dfae50c5660646/debugtools/DDR_VM/src/com/ibm/j9ddr/vm29/tools/ddrinteractive/VirtualThreadsCommand.java#L108-L126

gacholio commented 1 year ago

I say the continuation is unmounted because we're clearly in the fake thread/ELS case. I have no idea which continuation it is.

If the GC is scanning a dead continuation, then this should be looked at by the GC team.

gacholio commented 1 year ago

The crash is in the local mapper. The method being mapped is:

> !bytecodes 0x13725790
  Name: yield
  Signature: (Ljdk/internal/vm/ContinuationScope;)Z
  Access Flags (40009): public static 
  Internal Attribute Flags:
  Max Stack: 2
  Argument Count: 1
  Temp Count: 2

    0 breakpoint 
    1 dconst0 
    2 nop 
    3 invokeinterface2 
    5 invokeinterface 42 jdk/internal/access/JavaLangAccess.currentCarrierThread()Ljava/lang/Thread;
    8 astore1 
    9 getstatic 14 jdk/internal/vm/Continuation.JLA Ljdk/internal/access/JavaLangAccess;
   12 aload1 
   13 invokeinterface2 
   15 invokeinterface 43 jdk/internal/access/JavaLangAccess.getContinuation(Ljava/lang/Thread;)Ljdk/internal/vm/Continuation;
   18 astore2 
   19 aload2 
   20 invokespecial 47 jdk/internal/vm/Continuation.yield0()Z
   23 returnZ 

If for some reason the mapper is being presented with the breakpointed method, that would explain the crash. This should not be the case: https://github.com/eclipse-openj9/openj9/blob/676b9a455892f883f7ecd78f53dfae50c5660646/runtime/vm/swalk.c#L873-L881
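The listing itself shows why a breakpointed copy confuses a linear bytecode walk. Offset 9 is `getstatic 14 jdk/internal/vm/Continuation.JLA`, so offset 0 was presumably the same 3-byte `getstatic 14`. If the index 14 (0x000E) is stored low byte first, as the listing implies, a decoder that treats `breakpoint` as a 1-byte instruction reads the leftover bytes 0x0E 0x00 as `dconst0` and `nop`, which is exactly what appears at offsets 1 and 2. A minimal sketch, assuming JVM-specification opcode values (OpenJ9's internal breakpoint value may differ) and a hypothetical length table covering only these opcodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Opcode subset, JVM specification values (assumption for J9 ROM bytecodes).
constexpr std::uint8_t kNop = 0x00;
constexpr std::uint8_t kDconst0 = 0x0e;
constexpr std::uint8_t kGetstatic = 0xb2;
constexpr std::uint8_t kBreakpoint = 0xca;

// Instruction length for the opcodes used in this sketch only.
std::size_t instructionLength(std::uint8_t opcode) {
    switch (opcode) {
    case kGetstatic: return 3; // opcode + 2-byte constant-pool index
    case kNop:
    case kDconst0:
    case kBreakpoint: return 1;
    default: return 1; // sketch only: treat anything else as 1 byte
    }
}

// Linear walk returning the offset of every instruction the decoder sees.
std::vector<std::size_t> instructionOffsets(const std::vector<std::uint8_t> &code) {
    std::vector<std::size_t> offsets;
    for (std::size_t pc = 0; pc < code.size(); pc += instructionLength(code[pc])) {
        offsets.push_back(pc);
    }
    return offsets;
}
```

The original bytes decode as one instruction at offset 0, while the breakpointed bytes decode as three. A mapper simulating the breakpointed bytes would presumably also push a double for `dconst0`, so its view of the operand stack no longer matches the real method.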

gacholio commented 1 year ago

@LinHu2016 Please have a look at this to see why we're scanning a supposedly dead continuation. The core I'm looking at is:

https://na-public.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/hyc-runtimes-jenkins.swg-devops.com/Test_openjdk21_j9_extended.openjdk_x86-64_mac_Personal/27/openjdk_test_output.tar.gz

gacholio commented 1 year ago

@fengxue-IS The other interesting question is which ROM method is being passed to the local mapper. It should be the original (i.e. unbreakpointed) method, not the breakpointed copy.

dmitripivkine commented 1 year ago

@gacholio There are two cores in this file from BreakpointInYieldTest failures. Which one should I look at? Would you please provide the address of the continuation object you suspect is dead and should not be scanned?

dmitripivkine commented 1 year ago

For core.20230907.033504.84295.0001.dmp: there are three continuation objects in the list !mm_continuationobjectlist 0x00007FBCCA622E50: !j9object 0x35dcb4e0, !j9object 0x35dcb460, !j9object 0x35dbc478. All of these objects have been forwarded, so they are alive.

For core.20230907.033144.83304.0001.dmp: there are three continuation objects in the list !mm_continuationobjectlist 0x00007FD1D6C1D050: !j9object 0xffe2a768, !j9object 0xffe2a710, !j9object 0xffe095f0. All of these objects have been forwarded, so they are alive.

If we are talking about one of these objects, I can check how it was discovered alive.

The !vthreads command shows nothing because, in both cases, there is only one non-empty sublist and it is being processed at the moment. I think the DDR code can be improved to show results not only for _head but also for the temporary head _priorHead.
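A minimal sketch of the suggested DDR improvement: while the GC processes a sublist, its entries hang off the temporary `_priorHead` and `_head` is empty, so a walker that reads only `_head` reports nothing. The structs below are a hypothetical simplification of MM_ContinuationObjectList; only the `_head`/`_priorHead` roles come from the comment above, and the real linking through the hidden `continuationLink` field is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical continuation object: only the link to the next entry.
struct ContinuationObj {
    ContinuationObj *next = nullptr;
};

// Hypothetical simplification of one GC continuation sublist.
struct ContinuationSublist {
    ContinuationObj *head = nullptr;      // current list
    ContinuationObj *priorHead = nullptr; // temporary head while the GC processes the list
    ContinuationSublist *nextList = nullptr;
};

static void appendChain(ContinuationObj *obj, std::vector<ContinuationObj *> &out) {
    for (; obj != nullptr; obj = obj->next) {
        out.push_back(obj);
    }
}

// Walk every sublist; optionally include entries reachable from priorHead.
std::vector<ContinuationObj *> listContinuations(ContinuationSublist *lists, bool includePrior) {
    std::vector<ContinuationObj *> found;
    for (ContinuationSublist *list = lists; list != nullptr; list = list->nextList) {
        appendChain(list->head, found);
        if (includePrior) {
            appendChain(list->priorHead, found);
        }
    }
    return found;
}
```

With three entries parked on the temporary head, as in both cores, only the walk that includes `priorHead` reports them.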

LinHu2016 commented 1 year ago

A "dead" continuation object is one that has been unmounted for the last time: its vmRef (J9VMContinuation *) should be null and state & 0x2 (J9_GC_CONTINUATION_STATE_FINISHED) should be non-zero. In that case the GC cannot scan it (because the J9VMContinuation is null). @dmitripivkine Could you please check the vmRef and state fields of those continuation objects? There is a case where a GC happens during the last unmount but before the Java stack is swapped; in this special case the continuation should still be treated as mounted, and we do not scan mounted continuations during the heap scan.
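The criterion above can be written as a small predicate. A minimal sketch: the value 0x2 for J9_GC_CONTINUATION_STATE_FINISHED comes from the comment, while `heapScanShouldWalkStack` and its `treatedAsMounted` flag are hypothetical stand-ins for the special case of a GC during the last unmount, before the Java stack is swapped.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kContinuationStateFinished = 0x2; // J9_GC_CONTINUATION_STATE_FINISHED

// "Dead": unmounted for the last time, so there is no J9VMContinuation to scan.
bool isDeadContinuation(std::uint64_t vmRef, std::uint64_t state) {
    return vmRef == 0 && (state & kContinuationStateFinished) != 0;
}

// Hypothetical: should the heap scan walk this continuation's stack?
// Mounted continuations (including one caught mid-way through its last
// unmount) are skipped, and dead ones have nothing to walk.
bool heapScanShouldWalkStack(std::uint64_t vmRef, std::uint64_t state, bool treatedAsMounted) {
    return !treatedAsMounted && !isDeadContinuation(vmRef, state);
}
```

For example, `isDeadContinuation(0x00007FBCCB054D00, 0x1)` is false: that object still has a vmRef and its finished bit is clear.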

dmitripivkine commented 1 year ago

for first core:

> !j9object 0x35dcb4e0
!J9Object 0x0000000035DCB4E0 {
    struct J9Class* clazz = !j9class 0x7FBCCE7B8F00 // java/lang/VirtualThread$VThreadContinuation
    Object flags = 0x00000010;
    J lockword = 0x0000000000000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x00007FBCCB054D00 (offset = 8) (jdk/internal/vm/Continuation)
    Ljava/lang/Thread; vthread = !fj9object 0x35d5d580 (offset = 32) (jdk/internal/vm/Continuation)
    [Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0x35d29988 (offset = 48) (jdk/internal/vm/Continuation)
    Ljava/lang/Runnable; runnable = !fj9object 0x35d5d6d8 (offset = 56) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 64) (jdk/internal/vm/Continuation)
    J state = 0x0000000000000001 (offset = 16) (jdk/internal/vm/Continuation)
    Z isAccessible = 0x00000001 (offset = 72) (jdk/internal/vm/Continuation)
    J continuationLink = 0x0000000035D5D460 (offset = 24) (jdk/internal/vm/Continuation) <hidden>
}

> !j9object 0x0000000035dcb460
!J9Object 0x0000000035DCB460 {
    struct J9Class* clazz = !j9class 0x7FBCCE7B8F00 // java/lang/VirtualThread$VThreadContinuation
    Object flags = 0x00000010;
    J lockword = 0x0000000000000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x00007FBCCA659560 (offset = 8) (jdk/internal/vm/Continuation)
    Ljava/lang/Thread; vthread = !fj9object 0x35d5d360 (offset = 32) (jdk/internal/vm/Continuation)
    [Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0x35d29988 (offset = 48) (jdk/internal/vm/Continuation)
    Ljava/lang/Runnable; runnable = !fj9object 0x35d5d4b8 (offset = 56) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 64) (jdk/internal/vm/Continuation)
    J state = 0x0000000000000001 (offset = 16) (jdk/internal/vm/Continuation)
    Z isAccessible = 0x00000001 (offset = 72) (jdk/internal/vm/Continuation)
    J continuationLink = 0x0000000035D517C8 (offset = 24) (jdk/internal/vm/Continuation) <hidden>
}

> !j9object 0x0000000035dbc478
!J9Object 0x0000000035DBC478 {
    struct J9Class* clazz = !j9class 0x7FBCCE7B8F00 // java/lang/VirtualThread$VThreadContinuation
    Object flags = 0x00000010;
    J lockword = 0x0000000000000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x00007FBCCB054E40 (offset = 8) (jdk/internal/vm/Continuation)
    Ljava/lang/Thread; vthread = !fj9object 0x35dbbc60 (offset = 32) (jdk/internal/vm/Continuation)
    [Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 40) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0x35db2b90 (offset = 48) (jdk/internal/vm/Continuation)
    Ljava/lang/Runnable; runnable = !fj9object 0x35dbd190 (offset = 56) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 64) (jdk/internal/vm/Continuation)
    J state = 0x00007FBCCB849B01 (offset = 16) (jdk/internal/vm/Continuation)
    Z isAccessible = 0x00000000 (offset = 72) (jdk/internal/vm/Continuation)
    J continuationLink = 0x0000000000000000 (offset = 24) (jdk/internal/vm/Continuation) <hidden>
}

for second core:

> !j9object 0x00000000ffe2a768
!J9Object 0x00000000FFE2A768 {
    struct J9Class* clazz = !j9class 0x138D9900 // java/lang/VirtualThread$VThreadContinuation
    Object flags = 0x00000010;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x00007FD1D6C50A10 (offset = 4) (jdk/internal/vm/Continuation)
    Ljava/lang/Thread; vthread = !fj9object 0xfffe5148 (offset = 20) (jdk/internal/vm/Continuation)
    [Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 24) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xfffc14d0 (offset = 28) (jdk/internal/vm/Continuation)
    Ljava/lang/Runnable; runnable = !fj9object 0xfffe5228 (offset = 32) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 36) (jdk/internal/vm/Continuation)
    J state = 0x0000000000000001 (offset = 12) (jdk/internal/vm/Continuation)
    Z isAccessible = 0x00000001 (offset = 40) (jdk/internal/vm/Continuation)
    I continuationLink = 0xFFFE50A0 (offset = 44) (jdk/internal/vm/Continuation) <hidden>
}
> !j9object 0x00000000ffe2a710
!J9Object 0x00000000FFE2A710 {
    struct J9Class* clazz = !j9class 0x138D9900 // java/lang/VirtualThread$VThreadContinuation
    Object flags = 0x00000010;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x00007FD1D5E1F260 (offset = 4) (jdk/internal/vm/Continuation)
    Ljava/lang/Thread; vthread = !fj9object 0xfffe4ff8 (offset = 20) (jdk/internal/vm/Continuation)
    [Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 24) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xfffc14d0 (offset = 28) (jdk/internal/vm/Continuation)
    Ljava/lang/Runnable; runnable = !fj9object 0xfffe50d8 (offset = 32) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 36) (jdk/internal/vm/Continuation)
    J state = 0x0000000000000001 (offset = 12) (jdk/internal/vm/Continuation)
    Z isAccessible = 0x00000001 (offset = 40) (jdk/internal/vm/Continuation)
    I continuationLink = 0xFFFDCDE0 (offset = 44) (jdk/internal/vm/Continuation) <hidden>
}
> !j9object 0x00000000ffe095f0
!J9Object 0x00000000FFE095F0 {
    struct J9Class* clazz = !j9class 0x138D9900 // java/lang/VirtualThread$VThreadContinuation
    Object flags = 0x00000010;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x00007FD1D5D35160 (offset = 4) (jdk/internal/vm/Continuation)
    Ljava/lang/Thread; vthread = !fj9object 0xffe08fc8 (offset = 20) (jdk/internal/vm/Continuation)
    [Ljava/lang/Object; scopedValueCache = !fj9object 0x0 (offset = 24) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/ContinuationScope; scope = !fj9object 0xffe0d8a8 (offset = 28) (jdk/internal/vm/Continuation)
    Ljava/lang/Runnable; runnable = !fj9object 0xffe2b4e8 (offset = 32) (jdk/internal/vm/Continuation)
    Ljdk/internal/vm/Continuation; parent = !fj9object 0x0 (offset = 36) (jdk/internal/vm/Continuation)
    J state = 0x00000000138E6101 (offset = 12) (jdk/internal/vm/Continuation)
    Z isAccessible = 0x00000000 (offset = 40) (jdk/internal/vm/Continuation)
    I continuationLink = 0x00000000 (offset = 44) (jdk/internal/vm/Continuation) <hidden>
}

It looks like vmRef is not NULL for any of them, and state & 0x2 is 0 for all of them.

LinHu2016 commented 1 year ago

For both cores, two of the continuations are unmounted and one is mounted. None of them is in a concurrent-scan state, the pending-mount state, or the finished state (all of the continuations have started).
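The dumped `state` values can be decoded to check this reading. A minimal sketch, assuming (inferred from these dumps, not from the source) that 0x1 means started, that the low byte holds all state flags, and that the remaining bits hold the carrier thread pointer while mounted; only FINISHED = 0x2 is stated in the thread. Under these assumptions each core has exactly one object with a carrier (0x00007FBCCB849B01 and 0x00000000138E6101) and two without.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kStarted = 0x1;   // assumption
constexpr std::uint64_t kFinished = 0x2;  // J9_GC_CONTINUATION_STATE_FINISHED
constexpr std::uint64_t kFlagMask = 0xFF; // assumption: flags live in the low byte
constexpr std::uint64_t kCarrierMask = ~kFlagMask;

struct DecodedState {
    bool started;
    bool finished;
    bool otherFlags;       // any other low-byte bit (e.g. pending mount, concurrent scan)
    std::uint64_t carrier; // non-zero while mounted on a carrier thread
};

DecodedState decodeState(std::uint64_t state) {
    DecodedState d{};
    d.started = (state & kStarted) != 0;
    d.finished = (state & kFinished) != 0;
    d.otherFlags = (state & kFlagMask & ~(kStarted | kFinished)) != 0;
    d.carrier = state & kCarrierMask;
    return d;
}
```

All six values have a low byte of 0x01, i.e. started with no other flags, which matches the observation that none of the continuations is finished, pending mount, or under concurrent scan.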

JasonFengJ9 commented 12 months ago

JDK21 x86-64_mac (macx64rt4)

java version "21.0.1-beta" 2023-10-17
IBM Semeru Runtime Certified Edition 21.0.1+12-202310281508 (build 21.0.1-beta+12-202310281508)
Eclipse OpenJ9 VM 21.0.1+12-202310281508 (build master-7498dc04c, JRE 21 Mac OS X amd64-64-Bit Compressed References 20231028_24 (JIT enabled, AOT enabled)
OpenJ9   - 7498dc04c
OMR      - 386a7080f
JCL      - c06eaf638 based on jdk-21.0.1+12)

[2023-10-28T17:21:03.433Z] variation: Mode150
[2023-10-28T17:21:03.433Z] JVM_OPTIONS:  -XX:+UseCompressedOops 

[2023-10-28T17:23:07.015Z] TEST: serviceability/jvmti/vthread/BreakpointInYieldTest/BreakpointInYieldTest.java

[2023-10-28T17:23:07.016Z] STDERR:
[2023-10-28T17:23:07.016Z] Unhandled exception
[2023-10-28T17:23:07.016Z] Type=Segmentation error vmState=0x0002000f
[2023-10-28T17:23:07.016Z] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000000
[2023-10-28T17:23:07.016Z] Handler1=000000000D75EF30 Handler2=000000000D4AE970
[2023-10-28T17:23:07.016Z] RDI=0000000037D567D8 RSI=0000700002A372E0 RAX=0000000000000018 RBX=0000000000000018
[2023-10-28T17:23:07.016Z] RCX=FFFFA000352D6448 RDX=FFFFA0006D02CC20 R8=0000700002A372DC R9=0000700002A372D8
[2023-10-28T17:23:07.016Z] R10=0000000000000018 R11=0000000037D567F0 R12=0000700002A37270 R13=0000000000000007
[2023-10-28T17:23:07.016Z] R14=0000700002A37340 R15=0000000000000000
[2023-10-28T17:23:07.016Z] RIP=000000000D93EEA2 GS=0000 FS=0000 RSP=0000700002A371F8
[2023-10-28T17:23:07.016Z] RFlags=0000000000010296 CS=002B RBP=0000700002A37220 ERR=A002B00000000000
[2023-10-28T17:23:07.016Z] TRAPNO=000000000000000D CPU=B000000000000000 FAULTVADDR=00007FCCA002B000
[2023-10-28T17:23:07.016Z] XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM1 00000000000000c6 (f: 198.000000, d: 9.782500e-322)
[2023-10-28T17:23:07.016Z] XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM6 3fe4f87a3f5026e9 (f: 1062217472.000000, d: 6.553317e-01)
[2023-10-28T17:23:07.016Z] XMM7 402a56ef8ec924cc (f: 2395546880.000000, d: 1.316980e+01)
[2023-10-28T17:23:07.016Z] XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-10-28T17:23:07.016Z] Module=/Users/jenkins/workspace/Test_openjdk21_j9_extended.openjdk_x86-64_mac/openjdkbinary/j2sdk-image/Contents/Home/lib/default/libj9vm29.dylib
[2023-10-28T17:23:07.016Z] Module_base_address=000000000D727000 Symbol=mapLocalSet
[2023-10-28T17:23:07.016Z] Symbol_address=000000000D93EE10
[2023-10-28T17:23:07.016Z] Target=2_90_20231028_24 (Mac OS X 13.2.1)
[2023-10-28T17:23:07.016Z] CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
[2023-10-28T17:23:07.016Z] ----------- Stack Backtrace -----------
[2023-10-28T17:23:07.016Z] mapLocalSet+0x93 (0x000000000D93EEA3 [libj9vm29.dylib+0x217ea3])
[2023-10-28T17:23:07.016Z] j9localmap_LocalBitsForPC+0x5fb (0x000000000D93EC7B [libj9vm29.dylib+0x217c7b])
[2023-10-28T17:23:07.016Z] walkBytecodeFrameSlots+0x178 (0x000000000D7A1F58 [libj9vm29.dylib+0x7af58])
[2023-10-28T17:23:07.016Z] walkStackFrames+0x1136 (0x000000000D7A18B6 [libj9vm29.dylib+0x7a8b6])
[2023-10-28T17:23:07.016Z] walkContinuationStackFrames+0x1b1 (0x000000000D7B7971 [libj9vm29.dylib+0x90971])
[2023-10-28T17:23:07.016Z] _ZN28GC_VMThreadStackSlotIterator21scanContinuationSlotsEP10J9VMThreadP8J9ObjectPvPFvP8J9JavaVMPS3_S4_P16J9StackWalkStatePKvEbb+0x171 (0x000000000E0CEFA1 [libj9gc29.dylib+0xe5fa1])
[2023-10-28T17:23:07.016Z] _ZN20MM_ScavengerDelegate27scanContinuationNativeSlotsEP22MM_EnvironmentStandardP8J9Object21MM_ScavengeScanReasonb+0xc9 (0x000000000E0AB009 [libj9gc29.dylib+0xc2009])
[2023-10-28T17:23:07.016Z] _ZN20MM_ScavengerDelegate16getObjectScannerEP22MM_EnvironmentStandardP8J9ObjectPvm21MM_ScavengeScanReasonPb+0x2ed (0x000000000E0AB32D [libj9gc29.dylib+0xc232d])
[2023-10-28T17:23:07.016Z] _ZN12MM_Scavenger26incrementalScanCacheBySlotEP22MM_EnvironmentStandardP24MM_CopyScanCacheStandard+0x5d6 (0x000000000E079096 [libj9gc29.dylib+0x90096])
[2023-10-28T17:23:07.016Z] _ZN12MM_Scavenger12completeScanEP22MM_EnvironmentStandard+0x1a6 (0x000000000E079AB6 [libj9gc29.dylib+0x90ab6])
[2023-10-28T17:23:07.016Z] _ZN12MM_Scavenger24workThreadGarbageCollectEP22MM_EnvironmentStandard+0x292 (0x000000000E079E82 [libj9gc29.dylib+0x90e82])
[2023-10-28T17:23:07.016Z] _ZN21MM_ParallelDispatcher16workerEntryPointEP18MM_EnvironmentBase+0x77 (0x000000000E01E747 [libj9gc29.dylib+0x35747])
[2023-10-28T17:23:07.016Z] _Z23dispatcher_thread_proc2P14OMRPortLibraryPv+0xf6 (0x000000000E01E636 [libj9gc29.dylib+0x35636])
[2023-10-28T17:23:07.016Z] omrsig_protect+0x392 (0x000000000D4AD462 [libj9prt29.dylib+0x20462])
[2023-10-28T17:23:07.016Z] dispatcher_thread_proc+0x42 (0x000000000E01E6C2 [libj9gc29.dylib+0x356c2])
[2023-10-28T17:23:07.016Z] thread_wrapper+0x13a (0x000000000D41061A [libj9thr29.dylib+0xa61a])
[2023-10-28T17:23:07.016Z] _pthread_start+0x7d (0x00007FF815E1F259 [libsystem_pthread.dylib+0x6259])
[2023-10-28T17:23:07.016Z] ---------------------------------------
[2023-10-28T17:23:07.016Z] JVMDUMP039I Processing dump event "gpf", detail "" at 2023/10/28 13:22:02 - please wait.

[2023-10-28T17:23:23.949Z] serviceability_jvmti_j9_0_FAILED

tajila commented 12 months ago

At this point we suspect that jvmtiHelpers::fixBytecodesInAllStacks is the cause, as it doesn't fix up continuation stacks.

@JasonFengJ9 How reproducible is this issue? I'd like to verify that updating the method above actually solves it.

tajila commented 12 months ago

It's also odd that we only see it on Mac.

gacholio commented 12 months ago

It's also odd that we only see it on Mac.

This may have something to do with how freed memory is re-used.

gacholio commented 12 months ago

@babsingh @fengxue-IS Given that we have exclusive VM access, how do we go about finding all continuations?

babsingh commented 12 months ago

Given that we have exclusive VM access, how do we go about finding all continuations?

Through j9gc_flush_nonAllocationCaches_for_walk and j9mm_iterate_all_continuation_objects.

https://github.com/eclipse-openj9/openj9/blob/20cb61e29490ba48096a8be0b79c2e42753d24be/runtime/jvmti/jvmtiThread.c#L1410-L1413
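To make the suspected bug concrete, here is a deliberately simplified C model. It is not OpenJ9 code: `StackModel`, `VMModel`, `fixStack`, and `fixBytecodesInAllStacksModel` are hypothetical names invented for illustration, and the real fix would operate on J9 stack walks and the continuation iteration APIs mentioned in the thread. The model only shows why a fixup pass that visits thread stacks alone leaves unmounted continuation stacks pointing at stale bytecodes, which a later GC stack walk (such as the `walkContinuationStackFrames` frame in the backtrace) could then trip over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not OpenJ9's real data structures. */
typedef struct StackModel {
    bool hasStaleBreakpointBytecodes; /* frames still reference outdated bytecodes */
} StackModel;

typedef struct VMModel {
    StackModel *threadStacks;       /* stacks currently owned by VM threads */
    size_t threadCount;
    StackModel *continuationStacks; /* stacks of unmounted continuations */
    size_t continuationCount;
} VMModel;

/* Repair one stack; returns true if anything was changed. */
static bool fixStack(StackModel *stack)
{
    if (!stack->hasStaleBreakpointBytecodes) {
        return false;
    }
    stack->hasStaleBreakpointBytecodes = false;
    return true;
}

/* The caller is assumed to hold exclusive VM access, so no continuation can be
 * mounted or unmounted during the walk. Returns the number of stacks repaired.
 * includeContinuations == false models the suspected buggy behaviour. */
static size_t fixBytecodesInAllStacksModel(VMModel *vm, bool includeContinuations)
{
    size_t fixed = 0;
    assert(vm != NULL);
    for (size_t i = 0; i < vm->threadCount; i++) {
        fixed += fixStack(&vm->threadStacks[i]) ? 1 : 0;
    }
    if (includeContinuations) {
        /* In OpenJ9, finding these stacks would go through
         * j9gc_flush_nonAllocationCaches_for_walk and
         * j9mm_iterate_all_continuation_objects. */
        for (size_t i = 0; i < vm->continuationCount; i++) {
            fixed += fixStack(&vm->continuationStacks[i]) ? 1 : 0;
        }
    }
    return fixed;
}
```

With one stale thread stack and one stale unmounted continuation, a thread-only pass repairs the thread stack but leaves the continuation stale; only a pass that also visits continuations clears it. The model does not cover the JIT breakpoint hooks, which would need their own handling.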

gacholio commented 12 months ago

Thanks, I'll prototype a fix for this.

gacholio commented 12 months ago

jitCodeBreakpointAdded and jitCodeBreakpointRemoved also need to be updated.