eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 722 forks source link

jdk19 OpenJDK crash JVM_VirtualThreadUnmountBegin #16259

Closed pshipton closed 1 year ago

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/1490/ - running sanity.openjdk on AIX The JVM was build on AIX 7.2, running on AIX 7.1, but possibly the failure is legitimate. Related to https://github.com/eclipse-openj9/openj9/issues/15274

JVM is https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK19_ppc64_aix_Personal/32/OpenJ9-JDK19-ppc64_aix-20221102-195626.tar.gz

jdk_lang_1 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder/1490/openjdk_test_output.tar.gz

12:00:21  Unhandled exception
12:00:21  Type=Segmentation error vmState=0x00040000
12:00:21  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000033
12:00:21  Handler1=09001000A113B5A8 Handler2=09001000A106F4C8
12:00:21  R0=0000000000000000 R1=0000010024183E30 R2=09001000A0F0B8C0 R3=00000000000000F0
12:00:21  R4=0000010024187A78 R5=0000000000000008 R6=090000000054A314 R7=0000000000000000
12:00:21  R8=00000000138B00B5 R9=00000000138B00B5 R10=0000000000000000 R11=0000000000000000
12:00:21  R12=090000000E2B0CC0 R13=000001002418F800 R14=0000010023A844A0 R15=00000100241844F0
12:00:21  R16=0000000000000001 R17=09001000A0F0C918 R18=0000000000000001 R19=0A00020008FF664E
12:00:21  R20=0000000000000000 R21=0000000000000000 R22=0000000000000001 R23=0A0000001FF75248
12:00:21  R24=09001000A0EEC380 R25=09001000A0F0BDB8 R26=0000000000000000 R27=09001000A0F0BE40
12:00:21  R28=00000100101304B0 R29=0000000000000000 R30=0000000000000000 R31=0000010023257200
12:00:21  IAR=090000000DE3C5D8 LR=090000000DE3C208 MSR=A00000000200D032 CTR=090000000054A314
12:00:21  CR=2200084B20000000 FPSCR=8200000000000000 XER=2000000082000000
12:00:21  FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR1 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
12:00:21  FPR2 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
12:00:21  FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR5 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
12:00:21  FPR6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR7 412e848000000000 (f: 0.000000, d: 1.000000e+06)
12:00:21  FPR8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR9 4530000000000000 (f: 0.000000, d: 1.934281e+25)
12:00:21  FPR10 412e848000000000 (f: 0.000000, d: 1.000000e+06)
12:00:21  FPR11 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
12:00:21  FPR12 4530000000000000 (f: 0.000000, d: 1.934281e+25)
12:00:21  FPR13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:00:21  Module=/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/lib/default/libjvm.so
12:00:21  Module_base_address=090000000DDE3000
12:00:21  Target=2_90_20221102_32 (AIX 7.1)
12:00:21  CPU=ppc64 (16 logical CPUs) (0x780000000 RAM)
12:00:21  ----------- Stack Backtrace -----------
12:00:21  JVM_VirtualThreadUnmountBegin+0x5c (0x090000000DDD3A00 [libjvm.so+0x1ca00])
12:00:21  JVM_VirtualThreadUnmountBegin+0x5c (0x090000000DDA7A00 [libjvm.so+0x1ca00])
12:00:21  (0x090000000DF3063C [libj9vm29.so+0xe063c])
12:00:21  ffi_call+0x110 (0x090000000DF2FF74 [libj9vm29.so+0xdff74])
12:00:21  (0x090000000E0C7950 [libj9vm29.so+0x277950])
12:00:21  (0x090000000DEDCFE8 [libj9vm29.so+0x8cfe8])
12:00:21  runJavaThread+0x28c (0x090000000DEC4BD0 [libj9vm29.so+0x74bd0])
12:00:21  _ZL23javaProtectedThreadProcP13J9PortLibraryPv+0x11c (0x090000000DE53000 [libj9vm29.so+0x3000])
12:00:21  omrsig_protect+0x4fc (0x090000000E2440E0 [libj9prt29.so+0x5f0e0])
12:00:21  javaThreadProc+0x70 (0x090000000DE52E74 [libj9vm29.so+0x2e74])
12:00:21  thread_wrapper+0x14c (0x090000000E2AD590 [libj9thr29.so+0x5590])
12:00:21  _pthread_body+0xf0 (0x090000000051FE14 [libpthreads.a+0x3e14])
12:00:21  ---------------------------------------

@tajila fyi

tajila commented 2 years ago

@babsingh ^^

tajila commented 2 years ago

10x grinder https://openj9-jenkins.osuosl.org/job/Grinder/1494/

All of these failures occurred. java/lang/StackWalker/AcrossThreads.java.AcrossThreads java/lang/StackWalker/AcrossThreads.java.AcrossThreads java/lang/Thread/NullStackTrace.java.NullStackTrace java/lang/Thread/NullStackTrace.java.NullStackTrace java/lang/Thread/NullStackTrace.java.NullStackTrace java/lang/Thread/NullStackTrace.java.NullStackTrace java/lang/Thread/NullStackTrace.java.NullStackTrace java/lang/Thread/virtual/stress/GetStackTraceALot.java#id0.GetStackTraceALot_id0 java/lang/Thread/virtual/stress/PinALot.java#id0.PinALot_id0

pshipton commented 2 years ago

Some in the regular builds.

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/41 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/41/openjdk_test_output.tar.gz

22:36:50  Type=Segmentation error vmState=0x00040000
22:36:50  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
22:36:50  Handler1=00003FFF834CFCE0 Handler2=00003FFF833F8A80
22:36:50  R0=00003FFF83AC7B04 R1=00003FFF28E1B0B0 R2=00003FFF83B17D00 R3=00000000005DB600
22:36:50  R4=00003FFF7C0B50B8 R5=0000000000000098 R6=00003FFF83E10388 R7=0000000000000000
22:36:50  R8=FFFFFFFFFFFFFFFF R9=00003FFF7C02FC90 R10=0000000000000000 R11=0000000000000000
22:36:50  R12=0000000000002200 R13=00003FFF28E26900 R14=00000000004033E0 R15=00000000005DB600
22:36:50  R16=00003FFF612F0038 R17=FFFFFFFFFFFFFFFF R18=0000000000000000 R19=0000000000000010
22:36:50  R20=00003FFF83BADEF0 R21=0000000000000003 R22=00000000005DB600 R23=00003FFF28E1B5B0
22:36:50  R24=0000000000000001 R25=00003FFF83B13C38 R26=00003FFF83B13C30 R27=0000000000000000
22:36:50  R28=00003FFF83B130E0 R29=0000000000000000 R30=00000000005DB600 R31=00003FFF7C02FC90
22:36:50  NIP=00003FFF83AC7E60 MSR=800000010280F033 ORIG_GPR3=00000000000081C8 CTR=00003FFF83DF15B0
22:36:50  LINK=00003FFF83AC7B04 XER=0000000000000000 CCR=0000000040222822 SOFTE=0000000000000001
22:36:50  TRAP=0000000000000300 DAR=0000000000000098 dsisr=0000000042000000 RESULT=0000000000000000
22:36:50  FPR0 0000000000000003 (f: 3.000000, d: 1.482197e-323)
22:36:50  FPR1 405365ea60000000 (f: 1610612736.000000, d: 7.759243e+01)
22:36:50  FPR2 41e0000000000000 (f: 0.000000, d: 2.147484e+09)
22:36:50  FPR3 3fee666660000000 (f: 1610612736.000000, d: 9.500000e-01)
22:36:50  FPR4 401b9c7ea0000000 (f: 2684354560.000000, d: 6.902827e+00)
22:36:50  FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR6 0000000000000077 (f: 119.000000, d: 5.879381e-322)
22:36:50  FPR7 00000000000020d0 (f: 8400.000000, d: 4.150151e-320)
22:36:50  FPR8 00000000000000b2 (f: 178.000000, d: 8.794368e-322)
22:36:50  FPR9 000000000000006b (f: 107.000000, d: 5.286502e-322)
22:36:50  FPR10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR11 00000000000156ec (f: 87788.000000, d: 4.337303e-319)
22:36:50  FPR12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR13 bfb7440000000000 (f: 0.000000, d: -9.088135e-02)
22:36:50  FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:36:50  Module=/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/openjdkbinary/j2sdk-image/lib/default/libjvm.so
22:36:50  Module_base_address=00003FFF83AA0000 Symbol=JVM_VirtualThreadUnmountBegin
22:36:50  Symbol_address=00003FFF83AC7A90
22:36:50  Target=2_90_20221103_108 (Linux 4.4.0-173-generic)
22:36:50  CPU=ppc64le (16 logical CPUs) (0x1fe090000 RAM)
22:36:50  ----------- Stack Backtrace -----------
22:36:50  JVM_VirtualThreadUnmountBegin+0x3d0 (0x00003FFF83AC7E60 [libjvm.so+0x27e60])
22:36:50  JVM_VirtualThreadUnmountBegin+0x74 (0x00003FFF83AC7B04 [libjvm.so+0x27b04])
22:36:50  JVM_VirtualThreadUnmountBegin+0x30 (0x00003FFF83BADF20 [libjvm.so+0xdf20])
22:36:50   (0x00003FFF83689478 [libj9vm29.so+0x1f9478])
22:36:50  ffi_call_int+0xd4 (0x00003FFF83689144 [libj9vm29.so+0x1f9144])
22:36:50  bytecodeLoopCompressed+0x116cc (0x00003FFF835562AC [libj9vm29.so+0xc62ac])
22:36:50   (0x00003FFF835F5EA0 [libj9vm29.so+0x165ea0])
22:36:50  runJavaThread+0x250 (0x00003FFF834A6DA0 [libj9vm29.so+0x16da0])
22:36:50  javaProtectedThreadProc+0x138 (0x00003FFF83543518 [libj9vm29.so+0xb3518])
22:36:50  omrsig_protect+0x3f4 (0x00003FFF833F9F54 [libj9prt29.so+0x39f54])
22:36:50  javaThreadProc+0x60 (0x00003FFF8353EC80 [libj9vm29.so+0xaec80])
22:36:50  thread_wrapper+0x190 (0x00003FFF8338CBC0 [libj9thr29.so+0xcbc0])
22:36:50  start_thread+0xf0 (0x00003FFF83DE8040 [libpthread.so.0+0x8040])
22:36:50  clone+0x98 (0x00003FFF83D03BB0 [libc.so.6+0x123bb0])
22:36:50  ---------------------------------------

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/41/ jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/41/openjdk_test_output.tar.gz

23:33:55  Type=Segmentation error vmState=0x00040000
23:33:55  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
23:33:55  Handler1=0000000008E37C60 Handler2=0000000009136B10 InaccessibleAddress=0000000000000098
23:33:55  RDI=0000000000000020 RSI=00007FB2AE814C20 RAX=0000000000000000 RBX=00000000115CF600
23:33:55  RCX=0000000000000098 RDX=0000000000000098 R8=0000000000000001 R9=0000000007599838
23:33:55  R10=0000000000000000 R11=00000000074F3EF0 R12=0000000000000000 R13=0000000011234128
23:33:55  R14=0000000000000000 R15=00007FB2AE814C20
23:33:55  RIP=000000000752A2AF GS=0000 FS=0000 RSP=0000700008C34720
23:33:55  RFlags=0000000000010202 CS=002B RBP=0000700008C34760 ERR=0000009800000006
23:33:55  TRAPNO=000000060000000E CPU=0098000000060000 FAULTVADDR=0000000000000098
23:33:55  XMM0 0000000011234120 (f: 287523104.000000, d: 1.420553e-315)
23:33:55  XMM1 00000000367cbcaa (f: 914144448.000000, d: 4.516474e-315)
23:33:55  XMM2 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
23:33:55  XMM3 4330000000000000 (f: 0.000000, d: 4.503600e+15)
23:33:55  XMM4 412e848000000000 (f: 0.000000, d: 1.000000e+06)
23:33:55  XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:33:55  Module=/Users/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/openjdkbinary/j2sdk-image/lib/default/libjvm.dylib
23:33:55  Module_base_address=0000000007511000 Symbol=JVM_VirtualThreadUnmountBegin
23:33:55  Symbol_address=000000000752A100
23:33:55  Target=2_90_20221103_108 (Mac OS X 10.15.7)
23:33:55  CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
23:33:55  ----------- Stack Backtrace -----------
23:33:55  JVM_VirtualThreadUnmountBegin+0x1b0 (0x000000000752A2B0 [libjvm.dylib+0x192b0])
23:33:55  ffi_call_unix64+0x52 (0x00000000090703C2 [libj9vm29.dylib+0x2703c2])
23:33:55  ffi_call_int+0x304 (0x000000000906FA74 [libj9vm29.dylib+0x26fa74])
23:33:55  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x1edda (0x0000000008EADCAA [libj9vm29.dylib+0xadcaa])
23:33:55  bytecodeLoopCompressed+0xb4 (0x0000000008E8EEC4 [libj9vm29.dylib+0x8eec4])
23:33:55  ---------------------------------------
pshipton commented 2 years ago

See also https://github.com/eclipse-openj9/openj9/issues/16257, crashed in java/lang/Thread/virtual/stress/GetStackTraceALot.java#id0 The grinder there failed 4/100, only the 2 on ub18s390xrt-1-10 show crashes.

There is an assertion in the other 2 failures.

13:33:03  Corruption in Evacuate at 000003FF95877600: calculated object size 4395727264376 larger than available 1149440, Forwarded Header at 000003FFA277E6E0
13:33:03  17:32:56.205 0x3ff0c002200    j9mm.141    *   ** ASSERTION FAILED ** at ../../../../../../omr/gc/base/standard/Scavenger.cpp:1657: ((false))
babsingh commented 2 years ago

re https://github.com/eclipse-openj9/openj9/issues/16259#issuecomment-1302634214:

https://openj9-jenkins.osuosl.org/job/Grinder/1494/

The test framework does not correctly report failures and stdout/stderr. Some of the reported test failures are actually a PASS.

In the raw output, I see 5 segfaults with the same native stack for java/lang/Thread/NullStackTrace.java:

16:50:30  Module=/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/lib/default/libj9thr29.so
16:50:30  Module_base_address=090000000E226000
16:50:30  Target=2_90_20221025_103 (AIX 7.1)
16:50:30  CPU=ppc64 (16 logical CPUs) (0x780000000 RAM)
16:50:30  ----------- Stack Backtrace -----------
16:50:30  omrthread_monitor_enter+0x1a4 (0x090000000E22DC08 [libj9thr29.so+0x7c08])
16:50:30  haltThreadForInspection+0x128 (0x090000000DEBE14C [libj9vm29.so+0x3014c])
16:50:30  getStackTraceForThread+0x1c0 (0x090000001C1C6F44 [libjclse29.so+0x41f44])
16:50:30  Java_java_lang_Thread_getStackTraceImpl+0x1ec (0x090000001C1F00B0 [libjclse29.so+0x6b0b0])

Last change to the above code was https://github.com/eclipse-openj9/openj9/pull/16158. @fengxue-IS Can you please take a look at the NullStackTrace failure? I will investigate the JVM_VirtualThreadUnmountBegin failure.

pshipton commented 2 years ago

FYI https://github.com/adoptium/aqa-tests/issues/4115

llxia commented 2 years ago

The test framework does not correctly report failures and stdout/stderr. Some of the reported test failures are actually a PASS.

Could you point to me which case you see test failure is actually a pass? The console output looks correct to me. And I see 18 tests, 12 ok, 6 not ok, 0 skipped, 0 ToDo, 0 Bail Out! in TAP https://openj9-jenkins.osuosl.org/job/Grinder/1494/tapResults/

babsingh commented 2 years ago

It's the test report view: https://openj9-jenkins.osuosl.org/job/Grinder/1494/testReport/. Example: PinALot.java#id0.PinALot_id0 passed but shows up as failed.

babsingh commented 2 years ago

I was able to reproduce the JVM_VirtualThreadUnmountBegin crash locally on my xlinux machine.

Core files from my local crash: https://ibm.box.com/shared/static/nw4csup3s35v32cpdbaelzqvhdelrwf6.zip

The failure is very intermittent: 1/50 or 1/100 runs.

Issue

During the last unmount, we are removing a virtual thread object from the liveVirtualThreadList. The previous and next virtual thread objects in the list are NULL. Guess: GC has cleaned the previous and next virtual thread objects, and updated the object references to NULL in the list. @LinHu2016 @amicic @dmitripivkine Does the GC remove the virtual thread objects from liveVirtualThreadList when freeing them?

Related Source Code

https://github.com/eclipse-openj9/openj9/blob/db6aad2d49a3fa8e4e7f12c8452941089ee9b8db/runtime/j9vm/javanextvmi.cpp#L390-L398

Info from GDB

#11 0x00007fb4649c23da in mainSynchSignalHandler (signal=11, sigInfo=0x7fb3d7df7870, contextInfo=0x7fb3d7df7740) at /root/openj9-openjdk-jdk/omr/port/unix/omrsignal.c:1066
#12 <signal handler called>
#13 0x00007fb465a01a83 in JVM_VirtualThreadUnmountBegin (env=0x738d00, thread=<optimized out>, lastUnmount=<optimized out>) at /root/openj9-openjdk-jdk/openj9/runtime/j9vm/javanextvmi.cpp:396
#14 0x00007fb4655857aa in ffi_call_unix64 () at /root/openj9-openjdk-jdk/openj9/runtime/libffi/x86/unix64.S:105
#15 0x00007fb465583a8c in ffi_call_int (cif=0x7fb3d7df8710, fn=0x7fb465c34740 <JVM_VirtualThreadUnmountBegin>, rvalue=<optimized out>, avalue=0x7fb3d7df87b0, closure=<optimized out>)
    at /root/openj9-openjdk-jdk/openj9/runtime/libffi/x86/ffi64.c:672
#16 0x00007fb465478361 in VM_BytecodeInterpreterCompressed::cJNICallout (isStatic=<optimized out>, function=<optimized out>, returnStorage=<optimized out>, returnType=<optimized out>,
    javaArgs=<optimized out>, receiverAddress=0x3922b8, _pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /root/openj9-openjdk-jdk/openj9/runtime/vm/BytecodeInterpreter.hpp:2502
#17 VM_BytecodeInterpreterCompressed::callCFunction (returnType=<optimized out>, isStatic=<optimized out>, bp=<optimized out>, javaArgs=<optimized out>, receiverAddress=<optimized out>,
    jniMethodStartAddress=<optimized out>, _pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /root/openj9-openjdk-jdk/openj9/runtime/vm/BytecodeInterpreter.hpp:2320
#18 VM_BytecodeInterpreterCompressed::runJNINative (_pc=<optimized out>, _sp=<optimized out>, this=<optimized out>) at /root/openj9-openjdk-jdk/openj9/runtime/vm/BytecodeInterpreter.hpp:2210
#19 VM_BytecodeInterpreterCompressed::run (this=0x7fb3d7df88f0, vmThread=0x0) at /root/openj9-openjdk-jdk/openj9/runtime/vm/BytecodeInterpreter.hpp:10760
#20 0x00007fb46546618d in bytecodeLoopCompressed (currentThread=<optimized out>) at /root/openj9-openjdk-jdk/openj9/runtime/vm/BytecodeInterpreter.inc:112
#21 0x00007fb465511ff2 in c_cInterpreter () at /root/openj9-openjdk-jdk/build/linux-x86_64-server-release/vm/runtime/vm/xcinterp.s:158
#22 0x00007fb4653eb47a in runJavaThread (currentThread=0x7fb465436027 <javaLookupMethodImpl+935>, currentThread@entry=0x738d00) at /root/openj9-openjdk-jdk/openj9/runtime/vm/callin.cpp:682
#23 0x00007fb465464e5d in javaProtectedThreadProc (portLibrary=portLibrary@entry=0x7fb465c27380 <j9portLibrary>, entryarg=entryarg@entry=0x738d00)
    at /root/openj9-openjdk-jdk/openj9/runtime/vm/vmthread.cpp:2093
#24 0x00007fb4649c2f13 in omrsig_protect (portLibrary=0x7fb465c27380 <j9portLibrary>, fn=0x7fb465464da0 <javaProtectedThreadProc(J9PortLibrary*, void*)>, fn_arg=0x738d00,
    handler=0x7fb46540df10 <structuredSignalHandler>, handler_arg=0x738d00, flags=506, result=0x7fb3d7df8dc8) at /root/openj9-openjdk-jdk/omr/port/unix/omrsignal.c:425
#25 0x00007fb465460eaa in javaThreadProc (entryarg=0x7fb460018bf0) at /root/openj9-openjdk-jdk/openj9/runtime/vm/vmthread.cpp:372
#26 0x00007fb4651c2302 in thread_wrapper (arg=0x7fb3d8004970) at /root/openj9-openjdk-jdk/omr/thread/common/omrthread.c:1724
#27 0x00007fb465e446db in start_thread (arg=0x7fb3d7dfb700) at pthread_create.c:463
#28 0x00007fb46659e61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) f 13
#13 0x00007fb465a01a83 in JVM_VirtualThreadUnmountBegin (env=0x738d00, thread=<optimized out>, lastUnmount=<optimized out>) at /root/openj9-openjdk-jdk/openj9/runtime/j9vm/javanextvmi.cpp:396
396             J9OBJECT_OBJECT_STORE(currentThread, threadPrev, vm->virtualThreadLinkNextOffset, threadNext);

(gdb) p threadPrev
$2 = (j9object_t) 0x0

(gdb) p threadNext
$3 = (j9object_t) 0x0
babsingh commented 2 years ago

Grinders for the JVM_VirtualThreadUnmountBegin crash:

  1. -Xgcpolicy:nogc: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/29286/consoleFull (0 failures)
  2. With the default GC policy: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/29287/consoleFull (5 failures)
fengxue-IS commented 2 years ago

Quick look at the failing build which uses SDK https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK19_ppc64_aix_Nightly/103/OpenJ9-JDK19-ppc64_aix-20221025-234653.tar.gz, this didn't include the changeset that is suppose to address the synchronization issue related to thread stacktrace https://github.com/ibmruntimes/openj9-openjdk-jdk19/pull/47

babsingh commented 2 years ago

re https://github.com/eclipse-openj9/openj9/issues/16259#issuecomment-1304140700:

@fengxue-IS can you rerun a grinder to confirm that the java/lang/Thread/NullStackTrace.java failure is resolved with https://github.com/ibmruntimes/openj9-openjdk-jdk19/pull/47?

fengxue-IS commented 2 years ago

launched grinder https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/29294/, will update once its complete

babsingh commented 2 years ago

re https://github.com/eclipse-openj9/openj9/issues/16259#issuecomment-1303988550:

The grinders for the JVM_VirtualThreadUnmountBegin crash have completed. The failure only happens with the GC enabled. Transferring to the GC team.

Issue

During the last unmount, we are removing a virtual thread object from the liveVirtualThreadList. The previous and next virtual thread objects in the list are NULL. Guess: GC has cleaned the previous and next virtual thread objects, and updated the object references to NULL in the list. Does the GC remove the virtual thread objects from liveVirtualThreadList when collecting/freeing them?

tajila commented 2 years ago

@LinHu2016 please take a look

fengxue-IS commented 2 years ago

launched grinder https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/29294/, will update once its complete

Grinder completed, 20/20 passed using latest nightly build. Confirm java/lang/Thread/NullStackTrace.java fixed in current head branch

dmitripivkine commented 2 years ago

@tajila @babsingh Is something here left to be investigated by GC Team (due grinder with java/lang/Thread/NullStackTrace.java fix is passed) ?

babsingh commented 2 years ago

due grinder with java/lang/Thread/NullStackTrace.java fix is passed) ?

There are two issues. @dmitripivkine The one impacted by the GC is java/lang/Thread/virtual/ParkWithFixedThreadPool.java and it is described in

It still needs to be investigated.

LinHu2016 commented 2 years ago

Hi @babsingh, linkNext and linkPrevious are both hidden fields for VirtualThread Object, not sure if GC scan them as the references, but there is no explicit clean up for those fields, and does GC monitor liveVirtualThreadList as root(I don't think GC manage liveVirtualThreadList at all)? if the VirtualThread Object is not traceable from root, it might be collected by GC, collected heap memory can be overwritten later or cleaned up.

LinHu2016 commented 2 years ago

@fengxue-IS mentioned that liveVirtualThreadList is created as "global reference", so it should be a root for GC, any VT Object in the list should be scaned by GC and keep them alive, but if there is a race condition between trace the list for scan and add new one to the list, GC might read wrong nextLink or prevLink, then cause some live VT Objects were missed by GC... please double check if rest fields of VT Object are zero or garbage, just want to confirm if the Object has been collected by GC.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_s390x_linux_Nightly/45 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_s390x_linux_Nightly/45/openjdk_test_output.tar.gz

22:01:26  Type=Segmentation error vmState=0x00000000
22:01:26  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=f1b68270 Signal_Code=00000001
22:01:26  Handler1=000003FFB8FC44B8 Handler2=000003FFB8EB1CB8 InaccessibleAddress=0000000000000000
22:01:26  gpr0=0000000000000001 gpr1=000003FFB401EE20 gpr2=0000000000000000 gpr3=0000000000000098
22:01:26  gpr4=FFFFFFFFFFFFFFFF gpr5=0000000000000000 gpr6=000003FFB95CC298 gpr7=0000000000000000
22:01:26  gpr8=0000000000000000 gpr9=000003FFB95CEEF8 gpr10=000000000057E300 gpr11=000003FFB401EE20
22:01:26  gpr12=000003FF92F72D00 gpr13=000003FFB95CC300 gpr14=000003FFB95A2B8A gpr15=000003FFB80FE3F0
22:01:26  psw=000003FFB95A2E16 mask=0705200180000000 fpc=0008fe00 bea=000003FFB95A2C82
22:01:26  fpr0 000003ffad764448 (f: 2910209024.000000, d: 2.172240e-311)
22:01:26  fpr1 4881b2a000000000 (f: 0.000000, d: 1.927115e+41)
22:01:26  fpr2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:01:26  fpr3 397d6e1100000000 (f: 0.000000, d: 9.068803e-32)
22:01:26  fpr4 4881b2a040000000 (f: 1073741824.000000, d: 1.927115e+41)
22:01:26  fpr5 3e92492500000000 (f: 0.000000, d: 2.724785e-07)
22:01:26  fpr6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:01:26  fpr7 3e3a332600000000 (f: 0.000000, d: 6.100116e-09)
22:01:26  fpr8 000003ffb90226f8 (f: 3103926016.000000, d: 2.172335e-311)
22:01:26  fpr9 0000000000525028 (f: 5394472.000000, d: 2.665223e-317)
22:01:26  fpr10 000003ffb401ee20 (f: 3020025344.000000, d: 2.172294e-311)
22:01:26  fpr11 0000000000c2e820 (f: 12773408.000000, d: 6.310902e-317)
22:01:26  fpr12 0005ececbff3eda1 (f: 3220434432.000000, d: 8.240405e-309)
22:01:26  fpr13 000003ff0407b258 (f: 67613272.000000, d: 2.170835e-311)
22:01:26  fpr14 0000000000525030 (f: 5394480.000000, d: 2.665227e-317)
22:01:26  fpr15 000003ff040ae6f8 (f: 67823352.000000, d: 2.170835e-311)
22:01:26  Module=/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_s390x_linux_Nightly/openjdkbinary/j2sdk-image/lib/default/libjvm.so
22:01:26  Module_base_address=000003FFB9580000 Symbol=JVM_VirtualThreadUnmountBegin
22:01:26  Symbol_address=000003FFB95A2AF8
22:01:26  Target=2_90_20221107_110 (Linux 3.10.0-1160.76.1.el7.s390x)
22:01:26  CPU=s390x (4 logical CPUs) (0x1ec1b1000 RAM)
22:01:26  ----------- Stack Backtrace -----------
22:01:26  JVM_VirtualThreadUnmountBegin+0x31e (0x000003FFB95A2E16 [libjvm.so+0x22e16])
22:01:26   (0x000003FF92F72CA4 [<unknown>+0x0])
22:01:26  ---------------------------------------
babsingh commented 2 years ago

re https://github.com/eclipse-openj9/openj9/issues/16259#issuecomment-1306324636:

please double check if rest fields of VT Object are zero or garbage, just want to confirm if the Object has been collected by GC.

Core files from my local crash: https://ibm.box.com/shared/static/nw4csup3s35v32cpdbaelzqvhdelrwf6.zip

Not all fields are zero for the VirtualThread that invokes JVM_VirtualThreadUnmountBegin with a NULL linkNext and linkPrevious. Some fields are set but they are not garbage. Note: jdmpview should be built with #16285 to view the core file; otherwise, all hidden fields won't be seen.

> !fj9object 0x00000000FFD91800
!J9Object 0x00000000FFD91800 {
    struct J9Class* clazz = !j9class 0x95800 // java/lang/VirtualThread
    Object flags = 0x00000012;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    J eetop = 0x0000000000000000 (offset = 4) (java/lang/Thread)
    J tid = 0x000000000000001C (offset = 12) (java/lang/Thread)
    Ljava/lang/String; name = !fj9object 0xffd91c30 (offset = 36) (java/lang/Thread)
    Z interrupted = 0x00000000 (offset = 92) (java/lang/Thread)
    Ljava/lang/ClassLoader; contextClassLoader = !fj9object 0xd0084518 (offset = 40) (java/lang/Thread)
    Ljava/security/AccessControlContext; inheritedAccessControlContext = !fj9object 0xffd91c48 (offset = 44) (java/lang/Thread)
    Ljava/lang/Thread$FieldHolder; holder = !fj9object 0x0 (offset = 48) (java/lang/Thread)
    Ljava/lang/ThreadLocal$ThreadLocalMap; threadLocals = !fj9object 0x0 (offset = 52) (java/lang/Thread)
    Ljava/lang/ThreadLocal$ThreadLocalMap; inheritableThreadLocals = !fj9object 0xffd91c78 (offset = 56) (java/lang/Thread)
    Ljava/lang/Object; extentLocalBindings = !fj9object 0x0 (offset = 60) (java/lang/Thread)
    Ljava/lang/Object; interruptLock = !fj9object 0xffd91c90 (offset = 64) (java/lang/Thread)
    Ljava/lang/Object; parkBlocker = !fj9object 0x0 (offset = 68) (java/lang/Thread)
    Lsun/nio/ch/Interruptible; nioBlocker = !fj9object 0x0 (offset = 72) (java/lang/Thread)
    Ljdk/internal/vm/Continuation; cont = !fj9object 0x0 (offset = 76) (java/lang/Thread)
    Ljava/lang/Thread$UncaughtExceptionHandler; uncaughtExceptionHandler = !fj9object 0x0 (offset = 80) (java/lang/Thread)
    J threadLocalRandomSeed = 0x0000000000000000 (offset = 20) (java/lang/Thread)
    I threadLocalRandomProbe = 0x00000000 (offset = 96) (java/lang/Thread)
    I threadLocalRandomSecondarySeed = 0x00000000 (offset = 100) (java/lang/Thread)
    Ljdk/internal/vm/ThreadContainer; container = !fj9object 0xffd85ea8 (offset = 84) (java/lang/Thread)
    Ljdk/internal/vm/StackableScope; headStackableScopes = !fj9object 0x0 (offset = 88) (java/lang/Thread)
    Z started = 0x00000000 (offset = 104) (java/lang/Thread)
    Z stopCalled = 0x00000000 (offset = 108) (java/lang/Thread)
    J tls = 0x0000000000000000 (offset = 28) (java/lang/Thread) <hidden>
    Ljava/util/concurrent/Executor; scheduler = !fj9object 0xffd85eb8 (offset = 124) (java/lang/VirtualThread)
    Ljdk/internal/vm/Continuation; cont = !fj9object 0xffd91ca0 (offset = 128) (java/lang/VirtualThread)
    Ljava/lang/Runnable; runContinuation = !fj9object 0xffd918e0 (offset = 132) (java/lang/VirtualThread)
    I state = 0x00000003 (offset = 112) (java/lang/VirtualThread)
    Z parkPermit = 0x00000000 (offset = 152) (java/lang/VirtualThread)
    Ljava/lang/Thread; carrierThread = !fj9object 0xffdf0fc0 (offset = 136) (java/lang/VirtualThread)
    Ljava/util/concurrent/CountDownLatch; termination = !fj9object 0xffde6960 (offset = 140) (java/lang/VirtualThread)
    Ljava/lang/VirtualThread; linkNext = !fj9object 0x0 (offset = 148) (java/lang/VirtualThread) <hidden>
    Ljava/lang/VirtualThread; linkPrevious = !fj9object 0x0 (offset = 144) (java/lang/VirtualThread) <hidden>
    J inspectorCount = 0xFFFFFFFFFFFFFFFF (offset = 116) (java/lang/VirtualThread) <hidden>
    I isSuspendedByJVMTI = 0x00000000 (offset = 156) (java/lang/VirtualThread) <hidden>

If I start from the root of liveVirtualThreadList and iterate through the list, I receive the following error:

> !fj9object 0xfffa6f30
Unable to read object clazz at 0x00000000FFFA6F30 (clazz = 0x00000000FFD91800)

This indicates that the circular linked list is broken, and I don't see 0xFFD91800 in the list which is the VirtualThread object with a NULL linkNext and linkPrevious.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/44 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/44/openjdk_test_output.tar.gz

21:54:22  JVM_VirtualThreadUnmountBegin+0x1b0 (0x0000000005EF82B0 [libjvm.dylib+0x192b0])
21:54:22  ffi_call_unix64+0x52 (0x00000000064B93C2 [libj9vm29.dylib+0x2703c2])
21:54:22  ffi_call_int+0x304 (0x00000000064B8A74 [libj9vm29.dylib+0x26fa74])
21:54:22  _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x1edda (0x00000000062F674A [libj9vm29.dylib+0xad74a])
21:54:22  bytecodeLoopCompressed+0xb4 (0x00000000062D7964 [libj9vm29.dylib+0x8e964])
dmitripivkine commented 2 years ago

... linkNext and linkPrevious are both hidden fields for VirtualThread Object, not sure if GC scan them as the references...

just for the record: GC scans (so updates if object has been moved) these two fields as defined in j9class-> instanceDescription = 0x0000007F00FFFC01, 0x7 here maps slots offsets:

1 140 termination
1 144 linkPrevious
1 148 linkNext
0 152
pshipton commented 2 years ago

Related to this issue?

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/48 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/48/openjdk_test_output.tar.gz

The test timed out. The console output doesn't show this thread, but the javacore created by the timeout handler shows

3XMTHREADINFO      "pool-1-thread-1" J9VMThread:0x0000000000725100, omrthread_t:0x000000179CF416C8, java/lang/Thread:0x00000000FFF52DE0, state:R, prio=0
3XMJAVALTHREAD            (java/lang/Thread getId:0x1C, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000E0045D60)
3XMTHREADINFO1            (native thread ID:0x12FC, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x000000a1)
3XMCPUTIME               CPU usage total: 0.0 secs, user: 0.0 secs, system: 0.0 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/VirtualThread.notifyJvmtiUnmountBegin(Native Method)
4XESTACKTRACE                at java/lang/VirtualThread.run(VirtualThread.java:310)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation$$Lambda$91/0x000000009d110f80.run(Bytecode PC:8)
4XESTACKTRACE                at jdk/internal/vm/Continuation.execute(Continuation.java:171)
pshipton commented 2 years ago

Another timeout similar to the previous comment https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/46/ jdk_lang_1 java/lang/Thread/virtual/HoldsLock.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/46/openjdk_test_output.tar.gz

23:13:03  VirtualThread[#25]/waiting@Thread-4 is waiting on java.lang.Object@b79f319d
23:13:03        java.lang.Object@b79f319d

3XMTHREADINFO      "Thread-4" J9VMThread:0x00007FC8B822AE00, omrthread_t:0x00007FC8B4029C68, java/lang/Thread:0x00007FC951A24EE0, state:R, prio=0
3XMJAVALTHREAD            (java/lang/Thread getId:0x19, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00007FC931B09B80)
3XMTHREADINFO1            (native thread ID:0x1DDE, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x000000a1)
3XMTHREADINFO2            (native stack address range from:0x00007FC910B09000, to:0x00007FC910B89000, size:0x80000)
3XMCPUTIME               CPU usage total: 0.008490149 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/VirtualThread.notifyJvmtiUnmountBegin(Native Method)
4XESTACKTRACE                at java/lang/VirtualThread.run(VirtualThread.java:310)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation$$Lambda$91/0x00000000b8239748.run(Bytecode PC:8)
4XESTACKTRACE                at jdk/internal/vm/Continuation.execute(Continuation.java:171)
babsingh commented 2 years ago

Observations

The two timeouts above and the original JVM_VirtualThreadUnmountBegin crash have the following in common:

Issue

The above indicators suggest that JVM_VirtualThreadMountEnd was not invoked after JVM_VirtualThreadMountBegin by the Java code in VirtualThread.java (OpenJ9 uses the RI's VirtualThread implementation). This behaviour breaks our assumptions listed below. @fengxue-IS Is this scenario possible?

Solution

We will have to update the wait logic in the JVM_VirtualThread* functions, and also add extra safety checks while removing items from the liveVirtualThreadList.

Our current assumptions

Current equence of events

JVM_VirtualThreadMountBegin
  - Wait if inspectorCount != 0
  - Set inspectorCount = -1

JVM_VirtualThreadMountEnd
  - Add VirtualThread to the liveVirtualThreadList
  - Set inspectorCount = 0

JVM_VirtualThreadUnmountBegin
  - Wait if inspectorCount != 0
  - Set inspectorCount = -1
  - Remove VirtualThread from the liveVirtualThreadList

JVM_VirtualThreadUnmountEnd
  - Set inspectorCount = 0
fengxue-IS commented 1 year ago

FYI @babsingh

The above indicators suggest that JVM_VirtualThreadMountEnd was not invoked after JVM_VirtualThreadMountBegin by the Java code in VirtualThread.java (OpenJ9 uses the RI's VirtualThread implementation). This behaviour breaks our assumptions listed below. @fengxue-IS Is this scenario possible?

Yes, if the call to cont.run() in VirtualThread.runContinuation() failed, then the code will be redirected to afterYield which calls notifyJvmtiUnmountEnd

Our current assumptions

  • JVM_VirtualThreadMountBegin will always be followed by JVM_VirtualThreadMountEnd; and
  • JVM_VirtualThreadUnmountBegin will always be followed by JVM_VirtualThreadUnmountEnd.

The correct assumption should be

Given how first mount / last unmount have special condition, I don't think that should affect list modification (after moving adding to list operation to first mount end)

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/48 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

I don't find any diagnostic files.

23:06:39  Timeout information:
23:06:39  Running jcmd on process 17917
23:06:39  Timeout refired 300 times
23:06:39  Error getting data from 17917: Exception connecting to 17917

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/48 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

I don't find any diagnostic files.

23:49:27  Timeout information:
23:49:27  Running jcmd on process 40146
23:49:27  Error getting data from 40146: Exception connecting to 40146
babsingh commented 1 year ago

re https://github.com/eclipse-openj9/openj9/issues/16259#issuecomment-1314473836: New assumptions.

@fengxue-IS What can cause the below scenario? This is the cause of most hangs.

VirtualThread.inspectorCount is 0xFFFFFFFFFFFFFFFF (== -1) before entering JVM_VirtualThreadUnmountBegin.

fengxue-IS commented 1 year ago

re #16259 (comment): New assumptions.

@fengxue-IS What can cause the below scenario? This is the cause of most hangs.

VirtualThread.inspectorCount is 0xFFFFFFFFFFFFFFFF (== -1) before entering JVM_VirtualThreadUnmountBegin.

if there is a special case where 2 carrier tried to mount the same vthread, only one will will be successful, while the other failed during mount, but didn't reach the unmountend yet.

Something like:

Carrier 1:
    mountbegin v1
    mountend v1
carrier 2:
    mountbegin v1
        cont.run() fails
carrier 1: (finishes)
    unmountbegin v1

Though I don't see how this can occur since VirtualThread.runContinuation have a CAS on the thread state to prevent this.

The other possibility is there is an incorrect decrement(modification) on the inspector count by native code

babsingh commented 1 year ago

The other possibility is there is an incorrect decrement(modification) on the inspector count by native code

The above native code scenario does not seem possible since all inspector count decrements are wrapped with the below assert:

Assert_JVMTI_true(vthreadInspectorCount > 0);
pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_s390x_linux_Nightly/51 jdk_lang_1 java/lang/Thread/virtual/stress/GetStackTraceALot.java#id0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_s390x_linux_Nightly/51/openjdk_test_output.tar.gz Timeout. Diagnostics produced core and javacore, which include the following thread.

3XMTHREADINFO      "worker-1" J9VMThread:0x000003FF804FF500, omrthread_t:0x000003FF804F3CA8, java/lang/Thread:0x000003FF7E56B798, state:R, prio=0
3XMJAVALTHREAD            (java/lang/Thread getId:0x18, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x000003FF5E865ED8)
3XMTHREADINFO1            (native thread ID:0x176B99, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000001)
3XMTHREADINFO2            (native stack address range from:0x000003FF7FF40000, to:0x000003FF7FF80000, size:0x40000)
3XMCPUTIME               CPU usage total: 0.001868982 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/VirtualThread.notifyJvmtiUnmountBegin(Native Method)
4XESTACKTRACE                at java/lang/VirtualThread.yieldContinuation(VirtualThread.java:372)
4XESTACKTRACE                at java/lang/VirtualThread.parkNanos(VirtualThread.java:537)
4XESTACKTRACE                at java/lang/Access.parkVirtualThread(Access.java:502)
4XESTACKTRACE                at jdk/internal/misc/VirtualThreads.park(VirtualThreads.java:67)
4XESTACKTRACE                at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:408)
4XESTACKTRACE                at GetStackTraceALot.lambda$main$0(GetStackTraceALot.java:93)
4XESTACKTRACE                at GetStackTraceALot$$Lambda$17/0x0000000085461ca0.run(Bytecode PC:8)
4XESTACKTRACE                at java/lang/VirtualThread.run(VirtualThread.java:292)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation$$Lambda$23/0x0000000085321a08.run(Bytecode PC:8)
4XESTACKTRACE                at jdk/internal/vm/Continuation.execute(Continuation.java:171)
JasonFengJ9 commented 1 year ago

JDK19 internal build(ub20x64rt1-2)

[2022-11-18T00:53:31.203Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
[2022-11-18T00:53:31.203Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops 

openjdk version "19.0.1-beta" 2022-10-18
IBM Semeru Runtime Open Edition 19.0.1+10-202211180036 (build 19.0.1-beta+10-202211180036)
Eclipse OpenJ9 VM 19.0.1+10-202211180036 (build master-fddfd3f96, JRE 19 Linux amd64-64-Bit Compressed References 20221118_77 (JIT enabled, AOT enabled)
OpenJ9   - fddfd3f96
OMR      - fe4c3b9b5
JCL      - 110e1e7808 based on jdk-19.0.1+10)

[2022-11-18T01:31:45.005Z] TEST: java/lang/Thread/virtual/stress/GetStackTraceALot.java#id0

[2022-11-18T01:31:45.006Z] STDERR:
[2022-11-18T01:31:45.006Z] JVMDUMP034I User requested System dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/aqa-tests/TKG/output_16687328092528/jdk_lang_0/work/scratch/core.20221117.173134.3595522.0001.dmp' through com.ibm.jvm.Dump.systemDumpToFile
[2022-11-18T01:31:45.006Z] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/aqa-tests/TKG/output_16687328092528/jdk_lang_0/work/scratch/core.20221117.173134.3595522.0001.dmp
[2022-11-18T01:31:45.006Z] JVMDUMP034I User requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/aqa-tests/TKG/output_16687328092528/jdk_lang_0/work/scratch/javacore.20221117.173137.3595522.0002.txt' through com.ibm.jvm.Dump.javaDumpToFile
[2022-11-18T01:31:45.006Z] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/aqa-tests/TKG/output_16687328092528/jdk_lang_0/work/scratch/javacore.20221117.173137.3595522.0002.txt
[2022-11-18T01:31:45.006Z] Corruption in Evacuate at 00000000FFFB4840: calculated object size 47336334700160 larger than available 309184, Forwarded Header at 00007FF93DB81960
[2022-11-18T01:31:45.006Z] 01:31:38.821 0x1d7f00    j9mm.141    *   ** ASSERTION FAILED ** at ../../../../../../omr/gc/base/standard/Scavenger.cpp:1657: ((false))
[2022-11-18T01:31:45.006Z] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/11/17 17:31:38 - please wait.

[2022-11-18T01:31:45.006Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 967574ms).
[2022-11-18T01:31:45.006Z] --------------------------------------------------
[2022-11-18T01:37:21.877Z] Test results: passed: 846; error: 1
[2022-11-18T01:37:46.882Z] Report written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/aqa-tests/TKG/output_16687328092528/jdk_lang_0/report/html/report.html
[2022-11-18T01:37:46.882Z] Results written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_0/aqa-tests/TKG/output_16687328092528/jdk_lang_0/work
[2022-11-18T01:37:46.882Z] Error: Some tests failed or other problems occurred.
[2022-11-18T01:37:46.882Z] 
[2022-11-18T01:37:46.882Z] jdk_lang_0_FAILED
dmitripivkine commented 1 year ago

I am not sure is it entire issue is duplicate of https://github.com/eclipse-openj9/openj9/issues/16249 but last failure is definitely:

2022-11-18T01:31:45.006Z] Corruption in Evacuate at 00000000FFFB4840: calculated object size 47336334700160 larger than available 309184, Forwarded Header at 00007FF93DB81960

0xFFFB4840 is stale pointer mid-object !j9object 0xFFFB4838:

0xFFFB4830 :  00000000 00000000 00047900 0000000c [ .........y...... ]
0xFFFB4840 :  fffb4848 00000000 0067006d 002d006f [ HH......m.g.o.-. ] <---
0xFFFB4850 :  004d0043 00000000 0004c900 fffb4870 [ C.M.........pH.. ]
0xFFFB4860 :  00000001 9ba2a870 00000000 00000000 [ ....p........... ]

Looking for root of liveVirtualThreadList:

> !findvm
!j9javavm 0x00007FF960030200
> !j9javavm 0x00007FF960030200 | grep liveVirtualThreadList
    0x4f60: struct J9Object** liveVirtualThreadList = !j9x 0x00007FF9600A5A88
    0x4f68: struct J9ThreadMonitor* liveVirtualThreadListMutex = !j9threadmonitor 0x00007FF96007CAD8
> !j9x 0x00007FF9600A5A88
0x7FF9600A5A88 :  00000000ffd8add0 00000000e00dea80 [ ................ ]
0x7FF9600A5A98 :  00000000e00dfae8 00000000e00e0c08 [ ................ ]
> !j9object 0xffd8add0
!J9Object 0x00000000FFD8ADD0 {
    struct J9Class* clazz = !j9class 0x92500 // java/lang/VirtualThread
    Object flags = 0x0000002A;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    J eetop = 0x0000000000000000 (offset = 4) (java/lang/Thread)
    J tid = 0x0000000000000000 (offset = 12) (java/lang/Thread)
    Ljava/lang/String; name = !fj9object 0x0 (offset = 36) (java/lang/Thread)
    Z interrupted = 0x00000000 (offset = 92) (java/lang/Thread)
    Ljava/lang/ClassLoader; contextClassLoader = !fj9object 0x0 (offset = 40) (java/lang/Thread)
    Ljava/security/AccessControlContext; inheritedAccessControlContext = !fj9object 0x0 (offset = 44) (java/lang/Thread)
    Ljava/lang/Thread$FieldHolder; holder = !fj9object 0x0 (offset = 48) (java/lang/Thread)
    Ljava/lang/ThreadLocal$ThreadLocalMap; threadLocals = !fj9object 0x0 (offset = 52) (java/lang/Thread)
    Ljava/lang/ThreadLocal$ThreadLocalMap; inheritableThreadLocals = !fj9object 0x0 (offset = 56) (java/lang/Thread)
    Ljava/lang/Object; extentLocalBindings = !fj9object 0x0 (offset = 60) (java/lang/Thread)
    Ljava/lang/Object; interruptLock = !fj9object 0x0 (offset = 64) (java/lang/Thread)
    Ljava/lang/Object; parkBlocker = !fj9object 0x0 (offset = 68) (java/lang/Thread)
    Lsun/nio/ch/Interruptible; nioBlocker = !fj9object 0x0 (offset = 72) (java/lang/Thread)
    Ljdk/internal/vm/Continuation; cont = !fj9object 0x0 (offset = 76) (java/lang/Thread)
    Ljava/lang/Thread$UncaughtExceptionHandler; uncaughtExceptionHandler = !fj9object 0x0 (offset = 80) (java/lang/Thread)
    J threadLocalRandomSeed = 0x0000000000000000 (offset = 20) (java/lang/Thread)
    I threadLocalRandomProbe = 0x00000000 (offset = 96) (java/lang/Thread)
    I threadLocalRandomSecondarySeed = 0x00000000 (offset = 100) (java/lang/Thread)
    Ljdk/internal/vm/ThreadContainer; container = !fj9object 0x0 (offset = 84) (java/lang/Thread)
    Ljdk/internal/vm/StackableScope; headStackableScopes = !fj9object 0x0 (offset = 88) (java/lang/Thread)
    Z started = 0x00000000 (offset = 104) (java/lang/Thread)
    Z stopCalled = 0x00000000 (offset = 108) (java/lang/Thread)
    J tls = 0x0000000000000000 (offset = 28) (java/lang/Thread) <hidden>
    Ljava/util/concurrent/Executor; scheduler = !fj9object 0x0 (offset = 124) (java/lang/VirtualThread)
    Ljdk/internal/vm/Continuation; cont = !fj9object 0x0 (offset = 128) (java/lang/VirtualThread)
    Ljava/lang/Runnable; runContinuation = !fj9object 0x0 (offset = 132) (java/lang/VirtualThread)
    I state = 0x00000000 (offset = 112) (java/lang/VirtualThread)
    Z parkPermit = 0x00000000 (offset = 152) (java/lang/VirtualThread)
    Ljava/lang/Thread; carrierThread = !fj9object 0x0 (offset = 136) (java/lang/VirtualThread)
    Ljava/util/concurrent/CountDownLatch; termination = !fj9object 0x0 (offset = 140) (java/lang/VirtualThread)
    Ljava/lang/VirtualThread; linkNext = !fj9object 0xfffb4840 (offset = 148) (java/lang/VirtualThread) <hidden> <-----
    Ljava/lang/VirtualThread; linkPrevious = !fj9object 0xfffb4840 (offset = 144) (java/lang/VirtualThread) <hidden> <-----
    J inspectorCount = 0x0000000000000000 (offset = 116) (java/lang/VirtualThread) <hidden>
    I isSuspendedByJVMTI = 0x00000000 (offset = 156) (java/lang/VirtualThread) <hidden>
}

There is list of one element, linkNext and linkPrevious are bogus

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/52 jdk_lang_0 java/lang/Thread/virtual/ParkWithFixedThreadPool.java

Timeout. https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/52/openjdk_test_output.tar.gz

tajila commented 1 year ago

@babsingh can this be closed?

@pshipton I believe the timeout is similar to https://github.com/eclipse-openj9/openj9/issues/16198

babsingh commented 1 year ago

@babsingh can this be closed?

Yes. But I will keep it open for a few more days to confirm that the failure is no longer seen. Fix: https://github.com/eclipse-openj9/openj9/pull/16325. I will close it early next week.

pshipton commented 1 year ago

I've been putting the timeouts here because of the following thread calling notifyJvmtiUnmountBegin, but I can start putting them in https://github.com/eclipse-openj9/openj9/issues/16198 or open a new issue.

3XMTHREADINFO      "pool-1-thread-1" J9VMThread:0x0000000030539300, omrthread_t:0x000001002365C960, java/lang/Thread:0x00000000FFDA8BB8, state:R, prio=0
3XMJAVALTHREAD            (java/lang/Thread getId:0x24, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000E0054EF0)
3XMTHREADINFO1            (native thread ID:0x765024D, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x000000a1)
3XMTHREADINFO2            (native stack address range from:0x0000010023670000, to:0x00000100236FE888, size:0x8E888)
3XMCPUTIME               CPU usage total: 0.001914000 secs, user: 0.001821000 secs, system: 0.000093000 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/VirtualThread.notifyJvmtiUnmountBegin(Native Method)
4XESTACKTRACE                at java/lang/VirtualThread.yieldContinuation(VirtualThread.java:372)
4XESTACKTRACE                at java/lang/VirtualThread.park(VirtualThread.java:504)
4XESTACKTRACE                at java/lang/Access.parkVirtualThread(Access.java:494)
4XESTACKTRACE                at jdk/internal/misc/VirtualThreads.park(VirtualThreads.java:54)
4XESTACKTRACE                at java/util/concurrent/locks/LockSupport.park(LockSupport.java:369)
4XESTACKTRACE                at ParkWithFixedThreadPool$1.run(ParkWithFixedThreadPool.java:59)
4XESTACKTRACE                at java/lang/VirtualThread.run(VirtualThread.java:292)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
4XESTACKTRACE                at java/lang/VirtualThread$VThreadContinuation$$Lambda$88/0x000000002361fb58.run(Bytecode PC:8)
4XESTACKTRACE                at jdk/internal/vm/Continuation.execute(Continuation.java:171)
pshipton commented 1 year ago

Created a new issue for the timeouts, https://github.com/eclipse-openj9/openj9/issues/16340

babsingh commented 1 year ago

As per https://github.com/eclipse-openj9/openj9/issues/16259#issuecomment-1320371045, closing this issue since the failure has not been seen again.