eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

IBM Semeru 18.0.1 crashes in libj9gc29.so #15017

Open quijote opened 2 years ago

quijote commented 2 years ago

Java -version output

openjdk version "18.0.1" 2022-04-19
IBM Semeru Runtime Open Edition 18.0.1.0 (build 18.0.1+10)
Eclipse OpenJ9 VM 18.0.1.0 (build openj9-0.32.0, JRE 18 Linux amd64-64-Bit Compressed References 20220422_62 (JIT enabled, AOT enabled)
OpenJ9   - 9a84ec34e
OMR      - ab24b6666
JCL      - b577b89345e based on jdk-18.0.1+10)

Summary of problem

We observed crashes in some of our test applications when executed with IBM Semeru 18.0.1. Those tests are running together with our JVMTI agent.

I observed the following message twice so far:

Unhandled exception

Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001

Handler1=00007F4FB0AF3290 Handler2=00007F4FB0851150 InaccessibleAddress=0000000000000000
RDI=00007F4FAC05FB10 RSI=0000000000000007 RAX=0000000000000000 RBX=00000000F00D1E90
RCX=0000000000000000 RDX=0000000099669966 R8=00007F4F2FFFE5F0 R9=00007F4F2FFFE5F0
R10=0000000000000006 R11=0000000000000000 R12=00007F4F2FFFE200 R13=00007F4F2FFFE030
R14=00007F4FAA051340 R15=00007F4E74002C48
RIP=00007F4FA9D23BBD GS=0000 FS=0000 RSP=00007F4F2FFFE030
EFlags=0000000000010246 CS=0033 RBP=00007F4F2FFFE440 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000001000 CR2=0000000000000000
xmm0 0000003000000020 (f: 32.000000, d: 1.018558e-312)
xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3 000000004153c21e (f: 1096008192.000000, d: 5.415000e-315)
xmm4 000000003eccccd5 (f: 1053609152.000000, d: 5.205521e-315)
xmm5 0000000039347648 (f: 959739456.000000, d: 4.741743e-315)
xmm6 000000004141ecbc (f: 1094839424.000000, d: 5.409226e-315)
xmm7 000000003d884dc2 (f: 1032342976.000000, d: 5.100452e-315)
xmm8 242d643a00080e00 (f: 527872.000000, d: 2.021873e-134)
xmm9 2a3d032500180900 (f: 1575168.000000, d: 3.162448e-105)
xmm10 2eec225f00110700 (f: 1115904.000000, d: 1.158582e-82)
xmm11 3ada3f4600190500 (f: 1639680.000000, d: 3.392366e-25)
xmm12 3b3b0af000100700 (f: 1050368.000000, d: 2.236922e-23)
xmm13 491e6e48001a0500 (f: 1705216.000000, d: 1.696573e+44)
xmm14 4dff21d5000c0500 (f: 787712.000000, d: 5.245758e+67)
xmm15 4ff688cb000e0d00 (f: 920832.000000, d: 1.630812e+77)
Module=/home/jenkins/agent/oa-master/agent/test/java/build/tmp/caches/jdk-ibm-semeru_64-18.0.1.10.0/lib/default/libj9gc29.so
Module_base_address=00007F4FA9B8A000
Target=2_90_20220422_62 (Linux 5.11.0-36-generic)
CPU=amd64 (40 logical CPUs) (0x3ef38ec000 RAM)
----------- Stack Backtrace -----------
_Z27concurrentStackSlotIteratorP8J9JavaVMPP8J9ObjectPvP16J9StackWalkStatePKv+0x39d (0x00007F4FA9D23BBD [libj9gc29.so+0x199bbd])
jitWalkFrame+0x16b (0x00007F4FAA9A4D2B [libj9jit29.so+0x951d2b])
jitWalkStackFrames+0xddc (0x00007F4FAA9A611C [libj9jit29.so+0x95311c])
walkStackFrames+0xb3 (0x00007F4FB0B33483 [libj9vm29.so+0x80483])
_ZN28GC_VMThreadStackSlotIterator9scanSlotsEP10J9VMThreadS1_PvPFvP8J9JavaVMPP8J9ObjectS2_P16J9StackWalkStatePKvEbb+0x76 (0x00007F4FA9BCC8A6 [libj9gc29.so+0x428a6])
_ZN28MM_ConcurrentMarkingDelegate15scanThreadRootsEP18MM_EnvironmentBase+0x1ac (0x00007F4FA9D241BC [libj9gc29.so+0x19a1bc])
_ZN15MM_ConcurrentGC10scanThreadEP18MM_EnvironmentBase+0x61 (0x00007F4FA9CE6301 [libj9gc29.so+0x15c301])
memorySubSpaceAsyncCallbackHandler+0x6d (0x00007F4FA9CB325D [libj9gc29.so+0x12925d])
dispatchAsyncEvents+0xc7 (0x00007F4FB0AF4887 [libj9vm29.so+0x41887])
javaCheckAsyncMessages+0xa8 (0x00007F4FB0ACADE8 [libj9vm29.so+0x17de8])
old_slow_jitCheckAsyncMessages+0x85 (0x00007F4FAA991475 [libj9jit29.so+0x93e475])
 (0x00007F4FAA9A89F6 [libj9jit29.so+0x9559f6])

The small diagnostic files for those 2 crashes are attached. If necessary I may be able to provide core dumps as well.

A third crash did not create any diagnostic files and only printed the following:

Thread 000000000038AF00 structures scan: slot 00007F73941A8CF8 has bad value FFFFFFFFFFFFFFFF, iterator state 2
java: ../../../../../../openj9/runtime/gc_glue_java/ScavengerRootScanner.hpp:123: virtual void MM_ScavengerRootScanner

Not sure if this is related or a separate issue.

Diagnostic files

diagnostic_files.zip

dmitripivkine commented 2 years ago

@quijote Would you please provide system core(s)?

All crashes occurred because of bad O-slot in java thread stack frames. For first two crashes we know there are JIT frames.

I can triage this problem and provide more details about it. Adding JIT component label. @0xdaryl FYI

quijote commented 2 years ago

I sent you the core dumps by email now (@ca.ibm.com). Let me know if this did not work.

dmitripivkine commented 2 years ago

I have received two cores uploaded to my /team/Dmitri/15017/ folder. Both of the cores have exactly the same problematic pattern - O-slot in JIT Frame for com/dynatrace/agent/casp/thirdparty/ComponentClassifier.isIgnoredClass(Lcom/dynatrace/agent/casp/thirdparty/wrapper/ClassWrapper;)Z points inside (the same) object in Tenure. It is possible (speculation) wrong offset is a result of disregard of expanded header size for Indexable object (and in this case an intention was to point to next object).

For core.20220505.093317.70573.0001.dmp:

Failing Thread: !j9vmthread 0x396c00
Failing Thread ID: 0x11403 (70659)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F4FB0AF3290 Handler2=00007F4FB0851150 InaccessibleAddress=0000000000000000
RDI=00007F4FAC05FB10 RSI=0000000000000007 RAX=0000000000000000 RBX=00000000F00D1E90
RCX=0000000000000000 RDX=0000000099669966 R8=00007F4F2FFFE5F0 R9=00007F4F2FFFE5F0
R10=0000000000000006 R11=0000000000000000 R12=00007F4F2FFFE200 R13=00007F4F2FFFE030
R14=00007F4FAA051340 R15=00007F4E74002C48
RIP=00007F4FA9D23BBD GS=0000 FS=0000 RSP=00007F4F2FFFE030
EFlags=0000000000010246 CS=0033 RBP=00007F4F2FFFE440 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000001000 CR2=0000000000000000
Module=/home/jenkins/agent/oa-master/agent/test/java/build/tmp/caches/jdk-ibm-semeru_64-18.0.1.10.0/lib/default/libj9gc29.so
Module_base_address=00007F4FA9B8A000

+----------------+----------------+----------------+----------------+--------+----------------+----------------------
|    region      |     start      |      end       |    subspace    | flags  |      size      |      region type
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
 00007f4fac1bd9c0 00000000f0000000 00000000f0700000 00007f4fac1a9840 00000009           700000 ADDRESS_ORDERED
 00007f4fac1bc1f0 00000000ffda0000 00000000fff00000 00007f4fac1b54c0 0000000a           160000 ADDRESS_ORDERED
 00007f4fac1baa20 00000000fff00000 0000000100000000 00007f4fac1b0260 0000000a           100000 ADDRESS_ORDERED
+----------------+----------------+----------------+----------------+--------+----------------+----------------------

Checking THREAD STACKS...  <gc check (1): from debugger: THREAD STACKS: slot 396c00(7f4f2fffe5f0) -> f00d1e90: class pointer is null>

0xF00D1E80 :  00068500 00000002 f00d1e90 00000000 [ ................ ]
0xF00D1E90 :  0000003e 00000000 0006d900 1e01a3d6 [ >............... ] <-------
0xF00D1EA0 :  00000000 00000001 ba6fb858 00000000 [ ........X.o..... ]
0xF00D1EB0 :  00068500 0000000c f00d1ec0 00000000 [ ................ ]
0xF00D1EC0 :  0061006a 00610076 002e0078 00000000 [ j.a.v.a.x....... ]

0xf00d1e90 pointed inside object

> !j9object 0xF00D1E80
!J9IndexableObject 0x00000000F00D1E80 {
    struct J9Class* clazz = !j9arrayclass 0x68500   // [B
    Object flags = 0x00000000;
    U_32 size = 0x00000002;
    [0] =  62, 0x3e
    [1] =   0, 0x00
}

next object:

> !j9object 0xF00D1E98
J9VMJavaLangString at 0x00000000F00D1E98 {
struct J9Class* clazz = !j9class 0x6D900 // java/lang/String
Object flags = 0x00000000;
[B value = !fj9object 0x1e01a3d6 (offset = 0) (java/lang/String)
B coder = 0x00000001 (offset = 8) (java/lang/String)
I hash = 0xBA6FB858 (offset = 12) (java/lang/String)
Z hashIsZero = 0x00000000 (offset = 16) (java/lang/String)
Lcom/dynatrace/agent/introspection/casp/flawfinder/store/TaintInfo; dynatraceTaintInfo = !fj9object 0x0 (offset = 4) (java/lang/String)
"javax."
}

Problematic JIT frame:

<396c00> JIT frame: bp = 0x00000000003BA3B8, pc = 0x00007F4F9042D043, unwindSP = 0x00000000003BA340, cp = 0x000000000037EDC0, arg0EA = 0x00000000003BA3C8, jitInfo = 0x00007F4F8805F1A8
<396c00>    Method: com/dynatrace/agent/casp/thirdparty/ComponentClassifier.isIgnoredClass(Lcom/dynatrace/agent/casp/thirdparty/wrapper/ClassWrapper;)Z !j9method 0x000000000037FBE8
<396c00>    Bytecode index = 11, inlineDepth = 0, PC offset = 0x00000000000020C3
<396c00>    stackMap=0x00007F4F8805FF74, slots=I16(0x0002) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0002), localBaseOffset=I16(0xFFC8)
<396c00>    Described JIT args starting at 0x00000000003BA3C0 for U16(0x0002) slots
<396c00>        I-Slot: : a1[0x00000000003BA3C0] = 0x00000000F061D940
<396c00>        O-Slot: : a0[0x00000000003BA3C8] = 0x00000000F013B6B8
<396c00>    Described JIT temps starting at 0x00000000003BA380 for IDATA(0x0000000000000007) slots
<396c00>        I-Slot: : t6[0x00000000003BA380] = 0x0000000000000000
<396c00>        I-Slot: : t5[0x00000000003BA388] = 0x00000000FFFB6178
<396c00>        I-Slot: : t4[0x00000000003BA390] = 0x0000000000300000
<396c00>        I-Slot: : t3[0x00000000003BA398] = 0x00000000F013B6B8
<396c00>        I-Slot: : t2[0x00000000003BA3A0] = 0x0000000010000000
<396c00>        I-Slot: : t1[0x00000000003BA3A8] = 0x00007F4FAA9B2920
<396c00>        I-Slot: : t0[0x00000000003BA3B0] = 0x00007F4F903FF105
<396c00>    JIT-RegisterMap = UDATA(0x0000000000001027)
<396c00>        JIT-RegisterMap-O-Slot[0x00007F4F2FFFE5E0] = 0x00000000F013B6B8 (jit_rax)
<396c00>        JIT-RegisterMap-O-Slot[0x00007F4F2FFFE5E8] = 0x00000000FFDA5A00 (jit_rbx)
<396c00>        JIT-RegisterMap-O-Slot[0x00007F4F2FFFE5F0] = 0x00000000F00D1E90 (jit_rcx) <----------------------
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE5F8] = UDATA(0x000000001E0C3B36) (jit_rdx)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE600] = UDATA(0x0000000000000000) (jit_rdi)
<396c00>        JIT-RegisterMap-O-Slot[0x00007F4F2FFFE608] = 0x00000000FFDA5A18 (jit_rsi)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE610] = UDATA(0x00007F4F2FFFE740) (jit_rbp)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE618] = UDATA(0x00007F4FAC7E3AB0) (jit_rsp)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE620] = UDATA(0x00000000F00D1E80) (jit_r8)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE628] = UDATA(0x00000000EA6081DC) (jit_r9)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE630] = UDATA(0x00000000FFFFFFFF) (jit_r10)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE638] = UDATA(0x000000001E02D3F8) (jit_r11)
<396c00>        JIT-RegisterMap-O-Slot[0x00007F4F2FFFE640] = 0x00000000F061D9B0 (jit_r12)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE648] = UDATA(0x00000000FFFB6198) (jit_r13)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE650] = UDATA(0x00000000FFFB61A0) (jit_r14)
<396c00>        JIT-RegisterMap-I-Slot[0x00007F4F2FFFE658] = UDATA(0x00000000F0000000) (jit_r15)
<396c00>    JIT-Frame-RegisterMap[0x00000000003BA368] = UDATA(0x00000000FFFB6178) (jit_rbx)
<396c00>    JIT-Frame-RegisterMap[0x00000000003BA370] = UDATA(0x0000000000000010) (jit_r9)
<396c00>    JIT-Frame-RegisterMap[0x00007F4F2FFFE630] = UDATA(0x00000000FFFFFFFF) (jit_r10)
<396c00>    JIT-Frame-RegisterMap[0x00007F4F2FFFE638] = UDATA(0x000000001E02D3F8) (jit_r11)
<396c00>    JIT-Frame-RegisterMap[0x00007F4F2FFFE640] = UDATA(0x00000000F061D9B0) (jit_r12)
<396c00>    JIT-Frame-RegisterMap[0x00007F4F2FFFE648] = UDATA(0x00000000FFFB6198) (jit_r13)
<396c00>    JIT-Frame-RegisterMap[0x00007F4F2FFFE650] = UDATA(0x00000000FFFB61A0) (jit_r14)
<396c00>    JIT-Frame-RegisterMap[0x00007F4F2FFFE658] = UDATA(0x00000000F0000000) (jit_r15)
dmitripivkine commented 2 years ago

For core.20220504.140827.18093.0001.dmp - there is bad O-slot in JIT frame for `com/dynatrace/agent/casp/thirdparty/ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix(Lcom/dynatrace/agent/casp/thirdparty/wrapper/ClassWrapper;)Z:

Failing Thread: !j9vmthread 0x396c00
Failing Thread ID: 0x4703 (18179)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FBFD0993290 Handler2=00007FBFD06F1150 InaccessibleAddress=0000000000000000
RDI=00007FBFCC066060 RSI=0000000000000007 RAX=0000000000000000 RBX=00000000F00DABD8
RCX=0000000000000000 RDX=0000000099669966 R8=00007FBF52FFE5F0 R9=00007FBF52FFE5F0
R10=0000000040000000 R11=0000000000000000 R12=00007FBF52FFE200 R13=00007FBF52FFE030
R14=00007FBFCA051340 R15=00007FBE94002C48
RIP=00007FBFC9D23BBD GS=0000 FS=0000 RSP=00007FBF52FFE030
EFlags=0000000000010246 CS=0033 RBP=00007FBF52FFE440 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000001000 CR2=0000000000000000
Module=/home/jenkins/agent/oa-master/agent/test/java/build/tmp/caches/jdk-ibm-semeru_64-18.0.1.10.0/lib/default/libj9gc29.so
Module_base_address=00007FBFC9B8A000

Checking THREAD STACKS...  <gc check (1): from debugger: THREAD STACKS: slot 396c00(7fbf52ffe5f0) -> f00dabd8: class pointer is null>

+----------------+----------------+----------------+----------------+--------+----------------+----------------------
|    region      |     start      |      end       |    subspace    | flags  |      size      |      region type
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
 00007fbfcc1c3f40 00000000f0000000 00000000f0830000 00007fbfcc1afdc0 00000009           830000 ADDRESS_ORDERED
 00007fbfcc1c2770 00000000ffd60000 00000000ffde0000 00007fbfcc1bba40 0000000a            80000 ADDRESS_ORDERED
 00007fbfcc1c0fa0 00000000ffde0000 0000000100000000 00007fbfcc1b67e0 0000000a           220000 ADDRESS_ORDERED
+----------------+----------------+----------------+----------------+--------+----------------+----------------------

0xF00DABB0 :  0006d900 1e01b579 00000000 00000001 [ ....y........... ]
0xF00DABC0 :  0000002e 00000000 00068500 00000002 [ ................ ]
0xF00DABD0 :  f00dabd8 00000000 0000003e 00000000 [ ........>....... ] <--- 0xf00dabd8
0xF00DABE0 :  0006d900 1e01b57f 00000000 00000001 [ ................ ]
0xF00DABF0 :  ba6fb858 00000000 00068500 0000000c [ X.o............. ]

> !j9object 0xF00DABC8
!J9IndexableObject 0x00000000F00DABC8 {
    struct J9Class* clazz = !j9arrayclass 0x68500   // [B
    Object flags = 0x00000000;
    U_32 size = 0x00000002;
    [0] =  62, 0x3e
    [1] =   0, 0x00
}

> !j9object 0xF00DABE0
J9VMJavaLangString at 0x00000000F00DABE0 {
struct J9Class* clazz = !j9class 0x6D900 // java/lang/String
Object flags = 0x00000000;
[B value = !fj9object 0x1e01b57f (offset = 0) (java/lang/String)
B coder = 0x00000001 (offset = 8) (java/lang/String)
I hash = 0xBA6FB858 (offset = 12) (java/lang/String)
Z hashIsZero = 0x00000000 (offset = 16) (java/lang/String)
Lcom/dynatrace/agent/introspection/casp/flawfinder/store/TaintInfo; dynatraceTaintInfo = !fj9object 0x0 (offset = 4) (java/lang/String)
"javax."
}

<396c00> JIT frame: bp = 0x00000000003BA3F8, pc = 0x00007FBFB00743B2, unwindSP = 0x00000000003BA370, cp = 0x000000000037EDC0, arg0EA = 0x00000000003BA408, jitInfo = 0x00007FBFA801CDA8
<396c00>    Method: com/dynatrace/agent/casp/thirdparty/ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix(Lcom/dynatrace/agent/casp/thirdparty/wrapper/ClassWrapper;)Z !j9method 0x000000000037FC28
<396c00>    Bytecode index = 19, inlineDepth = 0, PC offset = 0x0000000000000DF2
<396c00>    stackMap=0x00007FBFA801D522, slots=I16(0x0002) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0002), localBaseOffset=I16(0xFFC8)
<396c00>    Described JIT args starting at 0x00000000003BA400 for U16(0x0002) slots
<396c00>        O-Slot: : a1[0x00000000003BA400] = 0x00000000FFF7B400
<396c00>        O-Slot: : a0[0x00000000003BA408] = 0x00000000F014CEF0
<396c00>    Described JIT temps starting at 0x00000000003BA3C0 for IDATA(0x0000000000000007) slots
<396c00>        I-Slot: : t6[0x00000000003BA3C0] = 0x00000000FFF0A898
<396c00>        I-Slot: : t5[0x00000000003BA3C8] = 0x00000000F0139010
<396c00>        I-Slot: : t4[0x00000000003BA3D0] = 0x00000000F0138620
<396c00>        I-Slot: : t3[0x00000000003BA3D8] = 0x00007FBFCA9B29B0
<396c00>        I-Slot: : t2[0x00000000003BA3E0] = 0x00000000FFF167E8
<396c00>        I-Slot: : t1[0x00000000003BA3E8] = 0x0000000000000000
<396c00>        I-Slot: : t0[0x00000000003BA3F0] = 0x000000000037FBE8
<396c00>    JIT-RegisterMap = UDATA(0x0000000000000904)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE5E0] = UDATA(0x00000000FFF367F8) (jit_rax)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE5E8] = UDATA(0x00000000F00DABC8) (jit_rbx)
<396c00>        JIT-RegisterMap-O-Slot[0x00007FBF52FFE5F0] = 0x00000000F00DABD8 (jit_rcx) <-----------------
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE5F8] = UDATA(0x0000000000000018) (jit_rdx)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE600] = UDATA(0x0000000000000000) (jit_rdi)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE608] = UDATA(0x0000000000000001) (jit_rsi)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE610] = UDATA(0x00007FBF52FFE740) (jit_rbp)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE618] = UDATA(0x00007FBFCC801F30) (jit_rsp)
<396c00>        JIT-RegisterMap-O-Slot[0x00007FBF52FFE620] = 0x00000000FFF7B410 (jit_r8)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE628] = UDATA(0x0000000000000018) (jit_r9)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE630] = UDATA(0x00000000FFFFFFFF) (jit_r10)
<396c00>        JIT-RegisterMap-O-Slot[0x00007FBF52FFE638] = 0x00000000F075E788 (jit_r11)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE640] = UDATA(0x0000000000000004) (jit_r12)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE648] = UDATA(0x00000000FFF167E8) (jit_r13)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE650] = UDATA(0x00000000FFF16800) (jit_r14)
<396c00>        JIT-RegisterMap-I-Slot[0x00007FBF52FFE658] = UDATA(0x00000000FFFFFFFF) (jit_r15)
<396c00>    JIT-Frame-RegisterMap[0x00000000003BA398] = UDATA(0x00000000BDF7880F) (jit_rbx)
<396c00>    JIT-Frame-RegisterMap[0x00000000003BA3A0] = UDATA(0x00000000FFF0A898) (jit_r9)
<396c00>    JIT-Frame-RegisterMap[0x00007FBF52FFE630] = UDATA(0x00000000FFFFFFFF) (jit_r10)
<396c00>    JIT-Frame-RegisterMap[0x00007FBF52FFE638] = UDATA(0x00000000F075E788) (jit_r11)
<396c00>    JIT-Frame-RegisterMap[0x00007FBF52FFE640] = UDATA(0x0000000000000004) (jit_r12)
<396c00>    JIT-Frame-RegisterMap[0x00007FBF52FFE648] = UDATA(0x00000000FFF167E8) (jit_r13)
<396c00>    JIT-Frame-RegisterMap[0x00007FBF52FFE650] = UDATA(0x00000000FFF16800) (jit_r14)
<396c00>    JIT-Frame-RegisterMap[0x00007FBF52FFE658] = UDATA(0x00000000FFFFFFFF) (jit_r15)
<396c00> I2J values: PC = 0x00007FBFCD5C73B2, A0 = 0x00000000003BA438, walkSP = 0x00000000003BA418, literals = 0x000000000037FBE8, JIT PC = 0x00007FBFCA9B29B0, pcAddress = 0x00007FBF52FFE808, decomp = 0x0000000000000000
0xdaryl commented 2 years ago

@a7ehuo : could you investigate this issue please?

a7ehuo commented 2 years ago

I looked at the core from 18093 since the jitdmp from 70573 is incomplete. However, the jitdmp for 18093 shows different inlining from when it was crashed. jitdmp.18093 doesn't have java/lang/StringUTF16.getChars([BII[CI)V inlined, but javacore and stackslots show the top JIT frame is inlined frame java/lang/StringUTF16.getChars([BII[CI)V when the crash happened. The incomplete jitdmp.70573 has java/lang/StringUTF16.getChars([BII[CI)V inlined but it lacks of generated code.

The bad object (f00dabd8) on the JIT frame existed when jitCheckAsyncMessages helper call is called (pc = 0x00007BFB00743B2) [1]. They look like out-of-line instructions. The inlined java/lang/StringUTF16.getChars([BII[CI)V from jitdmp.70573 shows that the aynchcheck is inserted at the end of the loop that stores values from byte[] value into char dst[] [2].

[1]

0x7fbfb0074378 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3512   7:41   e80342931a           call      0x7fbfca9a8580 <----- jitNewArray

0x7fbfb007437d {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3517          4d8bde               mov       r11, r14
0x7fbfb0074380 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3520          4c8b742470           mov       r14, qword ptr [rsp + 0x70]
0x7fbfb0074385 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3525          498bff               mov       rdi, r15
0x7fbfb0074388 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3528          498bd6               mov       rdx, r14
0x7fbfb007438b {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3531          498bcd               mov       rcx, r13
0x7fbfb007438e {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3534          4887c2               xchg      rdx, rax
0x7fbfb0074391 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3537          488bc2               mov       rax, rdx
0x7fbfb0074394 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3540          e93cf6ffff           jmp       0x7fbfb00739d5 U>> ^+1045 

0x7fbfb0074399 GCMap  Bytecode 11:78 Registers: r11 r14  StackMap: P0x90 P0x98 
0x7fbfb0074399 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3545  11:78   e85245931a           call      0x7fbfca9a88f0 <<< +807 <----- jitCheckAsyncMessages
0x7fbfb007439e {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3550          e94af5ffff           jmp       0x7fbfb00738ed U>> ^+813 

0x7fbfb00743a3 GCMap  Bytecode 36:25 Registers: r9 r11 r14  StackMap: P0x90 P0x98 
0x7fbfb00743a3 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3555  36:25   e84845931a           call      0x7fbfca9a88f0 <<< +767 <----- jitCheckAsyncMessages
0x7fbfb00743a8 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3560          e918f5ffff           jmp       0x7fbfb00738c5 U>> ^+773 

0x7fbfb00743ad GCMap  Bytecode 23:38 Registers: rcx r8 r11  StackMap: P0x90 P0x98 
0x7fbfb00743ad {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3565  23:38   e83e45931a           call      0x7fbfca9a88f0 <<< +549  <----- jitCheckAsyncMessages
0x7fbfb00743b2 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3570          e934f4ffff           jmp       0x7fbfb00737eb U>> ^+555 <----- pc

0x7fbfb00743b7 {com/.../ComponentClassifier.doesClassNameStartWithIgnoredPackagePrefix} +3575          4d8beb               mov       r13, r11 <<< +486

[2]

<396c00> JIT inline frame: bp = 0x00000000003BA3F8, pc = 0x00007FBFB00743B2, unwindSP = 0x00000000003BA370, cp = 0x00000000000F8190, arg0EA = 0x0000000000000000, jitInfo = 0x00007FBFA801CDA8
<396c00>    Method: java/lang/StringUTF16.getChars([BII[CI)V !j9method 0x00000000000F78D0
<396c00>    Bytecode index = 38, inlineDepth = 4, PC offset = 0x000000000000110A
    0 iload1 
    1 iload2 
    2 ificmpge 13
    5 iload1 
    6 iload2 
    7 iload1 
    8 isub 
    9 aload0 
   10 invokestatic 42 java/lang/StringUTF16.checkBoundsOffCount(II[B)V
   13 iload1 
   14 istore 5
   16 iload 5
   18 iload2 
   19 ificmpge 41
   22 aload3 
   23 iload 4
   25 iinc 4 1
   28 aload0 
   29 iload 5
   31 invokestatic 28 java/lang/StringUTF16.getChar([BI)C
   34 castore 
   35 iinc 5 1
   38 goto 16 <-- asynccheck at the end of the loop
   41 return0
n3622n    astore  <pending push temp 0>[#797  Auto] [flags 0x7 0x800 ]                        [0x7f4f4ab44be0] bci=[28,31,256] rc=0 vc=371 vn=- li=- udi=- nc=1
n3611n      aload  dst<auto slot 1>[#781  Auto] [flags 0x7 0x0 ]   
...
...
n3637n    sstorei  <array-shadow>[#227  Shadow] [flags 0x80000602 0x0 ]                       [0x7f4f4ab45090] bci=[28,34,256] rc=0 vc=371 vn=- li=- udi=- nc=2
n3636n      aladd (internalPtr sharedMemory )                                                 [0x7f4f4ab45040] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=2 flg=0x8000
n3611n        ==>aload
n3635n        ladd                                                                            [0x7f4f4ab44ff0] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=2
n3633n          lshl                                                                          [0x7f4f4ab44f50] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=2
n3632n            i2l (X>=0 )                                                                 [0x7f4f4ab44f00] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=1 flg=0x100
n3612n              ==>iload
n3631n            iconst 1                                                                    [0x7f4f4ab44eb0] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=0
n3634n          lconst 16                                                                     [0x7f4f4ab44fa0] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=0
n3627n      i2s                                                                               [0x7f4f4ab44d70] bci=[28,34,256] rc=1 vc=371 vn=- li=- udi=- nc=1
n3620n        ==>su2i
n3641n    istore  i<auto slot 5>[#793  Auto] [flags 0x3 0x0 ]                                 [0x7f4f4ab451d0] bci=[28,35,255] rc=0 vc=371 vn=- li=- udi=- nc=1
n3640n      iadd                                                                              [0x7f4f4ab45180] bci=[28,35,255] rc=1 vc=371 vn=- li=- udi=- nc=2
n3638n        iload  i<auto slot 5>[#793  Auto] [flags 0x3 0x0 ]                              [0x7f4f4ab450e0] bci=[28,35,255] rc=1 vc=371 vn=- li=- udi=- nc=0
n3639n        iconst 1                                                                        [0x7f4f4ab45130] bci=[28,35,255] rc=1 vc=371 vn=- li=- udi=- nc=0
n3642n    asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [0x7f4f4ab45220] bci=[28,38,255] rc=0 vc=371 vn=- li=- udi=- nc=0
n3643n    goto --> block_583 BBStart at n3586n                                                [0x7f4f4ab45270] bci=[28,38,255] rc=0 vc=371 vn=- li=- udi=- nc=0
n3609n    BBEnd </block_582> =====  
0xdaryl commented 1 year ago

We won't be able to get to the bottom of this for 0.35. Moving to 0.36.

a7ehuo commented 1 year ago

@quijote Do you still see this issue? Is there a way for us to reproduce it locally?

quijote commented 1 year ago

We disabled our tests for affected OpenJ9 JVMs for now but yesterday I rolled out J9 0.35 (8, 11, 17) in our CI. Are there any plans to have Java 18 with J9 0.35 as well? Or maybe 19? Did not find them yet via https://developer.ibm.com/languages/java/semeru-runtimes/downloads/.

Currently I don't have a way to reproduce this deterministically. We run thousands of tests each day and observed crashes like this occasionally. We will runs our tests with all new releases of J9 again. I could provide additional dumps or try setting some VM flags if this helps with debugging the problem (and does not cause significant overhead).

a7ehuo commented 1 year ago

Are there any plans to have Java 18 with J9 0.35 as well? Or maybe 19?

I saw the announcement on Monday that OpenJ9 0.35.0 is available for jdk8, 11, 17. @pshipton I'm wondering if you know there might be 0.35.0 for jdk18 or jdk19. Thanks!

pshipton commented 1 year ago

There won't be 0.35 for jdk18 or jdk19. jdk18 is out of support now, and jdk19 isn't ready yet. We are trying for jdk19 with 0.36 (in January), but not sure it's going to make it. If it doesn't make it, we'll have to decide whether to have a special jdk19 release afterwards or skip jdk19 and work on jdk20.

There are nightly builds for jdk19 if you want to try it. It's work in progress and there are known problems, but depending what you are running you may not hit them.

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-JDK19/ Select a build date, click on "build - passing" beside "build" for the platform you want, and there is a download link for the JDK at the top.

quijote commented 1 year ago

Thank you for the insights. Nightly builds are too risky for me to run in our CI. I will wait for the next official release. Feel free to close this ticket if the provided information is not enough to make a change. And thank you for having a look at those crashes. I understand well that getting a crash report without a reproducer can be painful to analyze.

a7ehuo commented 1 year ago

Thank you for your understanding! The jitdmp logs show different methods from com/dynatrace/agent/casp/thirdparty/ComponentClassifier associated with the two crashes. I suspect it could be any other method. Tracing all methods from ComponentClassifier like below might add a bit overhead. I'll keep the issue open for now in case we find more to narrow down the crash

'-Xjit:{com/dynatrace/agent/casp/thirdparty/ComponentClassifier.*}(traceFull,traceILGen,traceCG,log=/tmp/jitlog)'

0xdaryl commented 1 year ago

Moving to 0.38 given the above discussion.

0xdaryl commented 1 year ago

No change in status. Moving out of 0.38.

hzongaro commented 10 months ago

Moving this to the Backlog for now, as it's been difficult to make a lot of progress investigating the problem.