Open JasonFengJ9 opened 4 years ago
this crash occur because of bogus O-slot in jit frame:
<1740b00> JIT frame: bp = 0x0000000001740A18, pc = 0x00007FF8BE6820AD, unwindSP = 0x0000000001740920, cp = 0x00000000016F67B0, arg0EA = 0x0000000001740A20, jitInfo = 0x0000000013BF7168
<1740b00> Method: net/adoptopenjdk/test/nio/NioBuffersTest$Buffers$BufferThread.run()V !j9method 0x00000000016F5F98
<1740b00> Bytecode index = 2,317, inlineDepth = 0, PC offset = 0x000000000000576D
<1740b00> stackMap=0x0000000013BFAA7F, slots=I16(0x0001) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0001), localBaseOffset=I16(0xFF78)
<1740b00> Described JIT args starting at 0x0000000001740A20 for U16(0x0001) slots
<1740b00> O-Slot: : a0[0x0000000001740A20] = 0x00000000FFC1F758
<1740b00> Described JIT temps starting at 0x0000000001740990 for IDATA(0x0000000000000011) slots
<1740b00> O-Slot: : t16[0x0000000001740990] = 0x0000000000000000
<1740b00> O-Slot: : t15[0x0000000001740998] = 0x0000000000000000
<1740b00> O-Slot: : t14[0x00000000017409A0] = 0x0000000000000000
<1740b00> O-Slot: : t13[0x00000000017409A8] = 0x0000000000000000
<1740b00> O-Slot: : t12[0x00000000017409B0] = 0x0000000000000000
<1740b00> O-Slot: : t11[0x00000000017409B8] = 0x0000000000000000
<1740b00> O-Slot: : t10[0x00000000017409C0] = 0x0000000000000000
<1740b00> O-Slot: : t9[0x00000000017409C8] = 0x0000000000000000
<1740b00> O-Slot: : t8[0x00000000017409D0] = 0x0000000000000000
<1740b00> O-Slot: : t7[0x00000000017409D8] = 0x0000000000000000
<1740b00> O-Slot: : t6[0x00000000017409E0] = 0x00000000FFC1C058
<1740b00> O-Slot: : t5[0x00000000017409E8] = 0x00000000FFC1F758
<1740b00> O-Slot: : t4[0x00000000017409F0] = 0x00000000FFC1C090
<1740b00> O-Slot: : t3[0x00000000017409F8] = 0x00000000FFC1C058
<1740b00> O-Slot: : t2[0x0000000001740A00] = 0x00000000FFC1C0C8
<1740b00> O-Slot: : t1[0x0000000001740A08] = 0x00000000FFE16400 <--- bad value crashed at, address in R14
<1740b00> O-Slot: : t0[0x0000000001740A10] = 0x00000000FFEA7F78
<1740b00> JIT-RegisterMap = UDATA(0x0000000000000000)
<1740b00> JIT-Frame-RegisterMap[0x0000000001740940] = UDATA(0x00000000016F7400) (jit_rbx)
<1740b00> JIT-Frame-RegisterMap[0x0000000001740948] = UDATA(0x0000000000000011) (jit_r9)
<1740b00> JIT-Frame-RegisterMap[0x000000001506F820] = UDATA(0x00000000000BB3D8) (jit_r10)
<1740b00> JIT-Frame-RegisterMap[0x000000001506F828] = UDATA(0x0000000000000000) (jit_r11)
<1740b00> JIT-Frame-RegisterMap[0x000000001506F830] = UDATA(0x00000000000BB3E0) (jit_r12)
<1740b00> JIT-Frame-RegisterMap[0x000000001506F838] = UDATA(0x00000000FFD44B08) (jit_r13)
<1740b00> JIT-Frame-RegisterMap[0x000000001506F840] = UDATA(0x0000000000000002) (jit_r14)
<1740b00> JIT-Frame-RegisterMap[0x000000001506F848] = UDATA(0x00000000803F5890) (jit_r15)
this problematic address 0xFFE16400
points at the middle of zero filled [J
array at 0xFFD95998
size 0x1767B
ends at 0xFFE50D78
(currently forwarded to !j9object 0x806d8250
)
GC check discovered the same bogus address in Monitor table too:
Checking MONITOR TABLE... <gc check (1): from debugger: MONITOR TABLE: slot db9e830(d79498) -> ffe16400: corrupt data exception>
Checking THREAD STACKS... <gc check (2): from debugger: THREAD STACKS: slot 1740b00(1740a08) -> ffe16400: corrupt data exception>
Passing investigation to JIT for now. Please let me know if I can help.
@andrewcraik FYI
@a7ehuo and @liqunl maybe this is a manifestation of some kind of post-GRA manipulation issue? Seems like something we could get wrong, but haven't opened the core to see if there are signs.
It's a problem of missing gc map somewhere. Temp slot t0
and t1
are pointing to old nursery space. t1
stores a monitored object.
Have to look at the core to figure out the exact cause.
I've done a 120-job grinder, and not able to reproduce.
I haven't found anything wrong with the jit code yet. All gc maps mark t1
and t0
live (this might be a problem if a slot is not initialized, but in this case, at the time of crash, the two slots have been initialized).
There's only one write to t1
, the object is from a field, which is good.
1153 aload0getfield
1154 getfield 51 net/adoptopenjdk/test/nio/NioBuffersTest$Buffers$BufferThread.this$1 Lnet/adoptopenjdk/test/nio/NioBuffersTest$Buffers;
1157 getfield 80 net/adoptopenjdk/test/nio/NioBuffersTest$Buffers.numberOfBytes [I <--the object in the field is good
1160 dup
1161 astore2 <---astore2 is bad object
1162 monitorenter
The slot is stored in only one place, and there is only one instruction stores register to this slot, which is after monitor enter. If anything's wrong, I wonder if the register's value is corrupted during moniter enter.
@dmitripivkine What was GC doing when the program crashed? Was it updating object references in stack frames? Is t0
also a bad slot? What kind of structure is the monitor table? Where is it? When is it written and by who?
What was GC doing when the program crashed?
This is crash in Scavenger, so GC is scanning thread stack and copy discovered objects one by one. And problematic address pointed to zeros (mid large array) so it crashed an attempt to read object's class.
Was it updating object references in stack frames?
Yes, as soon as copy operation completes (see t2 - t6 pointed to Survivor already). The problematic slot has not been updated by GC, Scavenger crashed at first attempt to get object info
Is t0 also a bad slot?
No. This object (use to be at 0xFFEA7F78
) has been forwarded already (to 0xffc1c0c8
) - this address has been discovered in t2. Slot 0xFFEA7F78
contains Forwarded Pointer to new object location (new address + 0x4 Forwarded bit). The reference in t0 is about to be fixed up to new object location
0xFFEA7F70 : 0000000800000008 00000000ffc1c0cc [ ................ ]
0xFFEA7F80 : 00047aa200033336 00019cf0000028d4 [ 63...z...(...... ]
!j9object 0x00000000ffc1c0c8
!J9IndexableObject 0x00000000FFC1C0C8 {
struct J9Class* clazz = !j9arrayclass 0x1218E00 // [I
Object flags = 0x00000022;
U_32 size = 0x0000000A;
[0] = 209718, 0x00033336, 0.00000000F
[1] = 293538, 0x00047aa2, 0.00000000F
[2] = 10452, 0x000028d4, 0.00000000F
[3] = 105712, 0x00019cf0, 0.00000000F
[4] = 67159, 0x00010657, 0.00000000F
[5] = 724002, 0x000b0c22, 0.00000000F
[6] = 951309, 0x000e840d, 0.00000000F
[7] = 250692, 0x0003d344, 0.00000000F
[8] = 766940, 0x000bb3dc, 0.00000000F
[9] = 120321, 0x0001d601, 0.00000000F
}
<1740b00> O-Slot: : t2[0x0000000001740A00] = 0x00000000FFC1C0C8
What kind of structure is the monitor table? Where is it? When is it written and by who?
Monitor Table is the structure maintained by VM (I am not sure does JIT add elements to it too). It is located (J9JavaVM* vm->monitorTableList
). It has form of Memory Pool and contains Monitor Object references obviously. GC keeps this structure up to date by removing dead objects (Clearable root).
@gacholio Could you shed some light here?
rdi
holds the object. There are more instructions for generated for monitorenter, some dereference rd1
to take its j9class, so rdi
was good before the store instruction.
0x7ff8be67d49f {net/.../NioBuffersTest$Buffers$BufferThread.run} +2911 0f85ad740000 jne 0x7ff8be684952 C>> +32786
0x7ff8be67d4a5 {net/.../NioBuffersTest$Buffers$BufferThread.run} +2917 f00fb12c0f lock cmpxchg dword ptr [rdi + rcx], ebp
0x7ff8be67d4aa {net/.../NioBuffersTest$Buffers$BufferThread.run} +2922 0f85a2740000 jne 0x7ff8be684952 C>> +32786
0x7ff8be67d4b0 {net/.../NioBuffersTest$Buffers$BufferThread.run} +2928 4889bc24d0000000 mov qword ptr [rsp + 0xd0], rdi <<< ^+32826 <--liqun: bad o-slot, it's also the monitor object
+32786 will call jitMonitorEntry, which eventually calls VM helper.
0x7ff8be684952 {net/.../NioBuffersTest$Buffers$BufferThread.run} +32786 4c8bea mov r13, rdx <<< +2873 +2889 +2911 +2922
0x7ff8be684955 {net/.../NioBuffersTest$Buffers$BufferThread.run} +32789 4c8be1 mov r12, rcx
0x7ff8be684958 {net/.../NioBuffersTest$Buffers$BufferThread.run} +32792 488bf0 mov rsi, rax
0x7ff8be68495b {net/.../NioBuffersTest$Buffers$BufferThread.run} +32795 488bd7 mov rdx, rdi
0x7ff8be68495e {net/.../NioBuffersTest$Buffers$BufferThread.run} +32798 488bcd mov rcx, rbp
0x7ff8be684961 GCMap Bytecode -1:1162 ByteCode Info Only!
0x7ff8be684961 {net/.../NioBuffersTest$Buffers$BufferThread.run} +32801 -1:1162 e8fa00ae13 call 0x7ff8d2164a60 <--liqun: call to jitMonitorEntry
0x7ff8be684966 {net/.../NioBuffersTest$Buffers$BufferThread.run} +32806 498bd5 mov rdx, r13
0x7ff8be684969 {net/.../NioBuffersTest$Buffers$BufferThread.run} +32809 498bcc mov rcx, r12
0x7ff8be68496c {net/.../NioBuffersTest$Buffers$BufferThread.run} +32812 488bc6 mov rax, rsi
gc map for this call is merged to the following gc map
0x7ff8be684a0c GCMap Bytecode -1:831 Registers: rdi StackMap: P0x100 0x70 0x78 0x80 0x88 0x90 0x98 0xa0 0xa8 0xb0 0xb8 0xc0 0xc8 0xd0 0xd8 0xe0 0xe8 0xf0
rdi
is live and should be updated if gc occurs during the helper call.
I suspect the register is corrupted in the helper call, and we wrote the corrupted value to the jit slot. The corrupted value also shows up in monitor table, which won't be modified by the JIT
I downloaded the sdk and core dump in /team/triage/liqunl/openj9-10984
I guess my first comment would be that being able to dereference rdi
does not imply that the value was good, merely that it was readable memory. If this is repeatable, try running with -Xcheck:gc:all:all
which I expect will show that the stacked value is bad before the GC.
There's no reason to think that the helper code is corrupting the value - they all use the same boilerplate code to save/update/restore the registers.
If this only occurs on windows, it's possibly due to the differing C linkage (though the helper linkage does not change). On windows, RDI is preserved in the C linkage, so it will not be saved or restored in the fast path of jitMonitorEnter (no GC possible), only in the slow path. It's extremely likely that we hit the slow path given that the fast path is the same as the JIT inline path.
Is this issue repeatable?
Is this issue repeatable?
Tried 120 jobs, no reproduce.
Full helper disassembly:
jitMonitorEntry proc
pop qword ptr 240[rbp]
call fast_jitMonitorEntry
test rax,rax
jne short L_slow_jitMonitorEntry
push qword ptr 240[rbp]
ret
L_slow_jitMonitorEntry:
mov qword ptr 32[rbp],rsp
mov rsp,qword ptr 2424[rbp]
mov qword ptr 56[rsp],rbx
mov qword ptr 80[rsp],rdi
mov qword ptr 88[rsp],rsi
mov qword ptr 144[rsp],r12
mov qword ptr 152[rsp],r13
mov qword ptr 160[rsp],r14
mov qword ptr 168[rsp],r15
movq qword ptr 176+(6*8)[rsp],xmm6
movq qword ptr 176+(7*8)[rsp],xmm7
mov rcx,rbp
sub rsp,4*8
mov qword ptr (32+0+(8*4))[rsp],rbp
mov rbp,qword ptr (40+0+(8*4))[rsp]
call rax
mov rbp,qword ptr (32+0+(8*4))[rsp]
add rsp,4*8
test rax,rax
jne short L_exception_jitMonitorEntry
mov rbx,qword ptr 56[rsp]
mov rdi,qword ptr 80[rsp]
mov rsi,qword ptr 88[rsp]
mov r12,qword ptr 144[rsp]
mov r13,qword ptr 152[rsp]
mov r14,qword ptr 160[rsp]
mov r15,qword ptr 168[rsp]
mov qword ptr 2424[rbp],rsp
mov rsp,qword ptr 32[rbp]
push qword ptr 240[rbp]
ret
L_exception_jitMonitorEntry:
jmp rax
jitMonitorEntry endp
The RDI value is used in the slow path (GC possible) case, so it's likely good on entry. I can still see no reason it would be corrupted if it is register mapped correctly.
The gc map at the monitor entry call is merged to gc map with larger instruction offset. It's something we have been doing for years. Not sure if anything's wrong there, will have to check. Otherwise, just need to reproduce the failure first.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_x86-64_linux_Nightly_testList_1/17 SC_Softmx_JitAot_Linux_0
Diagnostics, there is a core file: https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk8_j9_extended.system_x86-64_linux_Nightly_testList_1/17/system_test_output.tar.gz
16.jvm4.stderr
JVMJITM018W Async Compilation feature mismatch. Ignoring AOT code in shared class cache.
05:26:21.388 0x51ab00 j9vm.225 * ** ASSERTION FAILED ** at ObjectMonitor.cpp:347: (!(8 == (lock & (8 | 1))))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2020/12/22 00:26:21 - please wait.
Unhandled exception
Type=Segmentation error vmState=0x0002000f
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FC94E559DE0 Handler2=00007FC94DE3A020 InaccessibleAddress=000000003E84A586
RDI=00000000005459E8 RSI=0000000000000000 RAX=00000000005459E8 RBX=00007FC9073D7690
RCX=000000003E84A586 RDX=0000000000000000 R8=00000000005459E8 R9=00007FC9073D7690
R10=00007FC941CD0450 R11=00000000FFB33950 R12=0000000000000000 R13=000000000012A700
R14=0000000004600002 R15=0000000000000000
RIP=00007FC94E58BF38 GS=0000 FS=0000 RSP=00007FC9073D7610
EFlags=0000000000010216 CS=0033 RBP=0000000000000000 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=000000003E84A586
xmm0 00007fc9421640e8 (f: 1108754688.000000, d: 6.941740e-310)
xmm1 000000003e3ab283 (f: 1044034176.000000, d: 5.158214e-315)
xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3 00000000beaf87d8 (f: 3199174656.000000, d: 1.580602e-314)
xmm4 000000003e2dbeec (f: 1043185408.000000, d: 5.154021e-315)
xmm5 0000000039347648 (f: 959739456.000000, d: 4.741743e-315)
xmm6 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 00000000000004a7 (f: 1191.000000, d: 5.884322e-321)
xmm9 00000000000004a6 (f: 1190.000000, d: 5.879381e-321)
xmm10 0000000040400000 (f: 1077936128.000000, d: 5.325712e-315)
xmm11 0000000049d70a38 (f: 1238829568.000000, d: 6.120632e-315)
xmm12 000000004689a022 (f: 1183424512.000000, d: 5.846894e-315)
xmm13 0000000047ac082f (f: 1202456576.000000, d: 5.940925e-315)
xmm14 0000000048650dc0 (f: 1214582272.000000, d: 6.000833e-315)
xmm15 0000000046b73e38 (f: 1186414080.000000, d: 5.861665e-315)
Module=/home/jenkins/workspace/Test_openjdk8_j9_extended.system_x86-64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/jre/lib/amd64/compressedrefs/libj9vm29.so
Module_base_address=00007FC94E4C0000
Target=2_90_20201221_604 (Linux 2.6.32-754.29.1.el6.x86_64)
CPU=amd64 (4 logical CPUs) (0x1f4100000 RAM)
----------- Stack Backtrace -----------
(0x00007FC94E58BF38 [libj9vm29.so+0xcbf38])
(0x00007FC941CD0516 [libj9gc29.so+0x3e516])
(0x00007FC941CC46ED [libj9gc29.so+0x326ed])
(0x00007FC941CC33FF [libj9gc29.so+0x313ff])
(0x00007FC941CC5F72 [libj9gc29.so+0x33f72])
(0x00007FC941E4448D [libj9gc29.so+0x1b248d])
(0x00007FC941DBA048 [libj9gc29.so+0x128048])
(0x00007FC941DB982A [libj9gc29.so+0x12782a])
(0x00007FC94DE3AD83 [libj9prt29.so+0x25d83])
(0x00007FC941DB925F [libj9gc29.so+0x12725f])
(0x00007FC94E2B3326 [libj9thr29.so+0xe326])
(0x00007FC9502ECAA1 [libpthread.so.0+0x7aa1])
clone+0x6d (0x00007FC94FC1DC4D [libc.so.6+0xe8c4d])
---------------------------------------
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_x86-64_linux_Nightly_testList_0/56 SC_Softmx_JitAot_Linux_1 16.jvm4.stderr
JVMJITM018W Async Compilation feature mismatch. Ignoring AOT code in shared class cache.
04:42:30.797 0x550000 j9vm.225 * ** ASSERTION FAILED ** at ObjectMonitor.cpp:347: (!(8 == (lock & (8 | 1))))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/01/30 23:42:30 - please wait.
*** Invalid JIT return address 00007F77B6FCBA20 in 00007F77F45C96F0
J0000000000524000: Object neither in heap nor stack-allocated in thread Thread-18
0000000000524B00: Object neither in heap nor stack-allocated in thread Thread-19
... and then crash output
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.system_x86-64_windows_Nightly_testList_1/36 ClassLoadingTest_5m_1 JVM_OPTIONS: -XX:-UseCompressedOops
CLT 01:21:14.184 - Completed 20.0%. Number of tests started=13070 (+5746)
CLT stderr Unhandled exception
CLT stderr Type=Segmentation error vmState=0x0002000f
CLT stderr Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FF8F532A305 ContextFlags=0010005f
CLT stderr Handler1=00007FF8F687F280 Handler2=00007FF8F672AA20 InaccessibleWriteAddress=00000001FDB5D189
CLT stderr RDI=000000B914C83410 RSI=00000000FDB5D188 RAX=00000000981103E7 RBX=000000B923CCC7A8
CLT stderr RCX=0000000000000000 RDX=00000000981103E7 R8=000000B914916D80 R9=0000000000000000
CLT stderr R10=00000000FDB00008 R11=000000B91405D820 R12=FFFFFFFFFFFFFFFF R13=00000000FDB5D198
CLT stderr R14=0000000100000001 R15=000000B924D3F2E0
CLT stderr RIP=00007FF8F532A305 RSP=000000B924D3EF50 RBP=000000B924D3F050 EFLAGS=0000000000010246
CLT stderr FS=0053 ES=002B DS=002B
CLT stderr XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM1 803b7328fc2844d8 (f: 4230497536.000000, d: -1.526948e-307)
CLT stderr XMM2 000000003effc2a1 (f: 1056948928.000000, d: 5.222021e-315)
CLT stderr XMM3 000000003b1351ab (f: 991121856.000000, d: 4.896792e-315)
CLT stderr XMM4 000000004c123480 (f: 1276261504.000000, d: 6.305570e-315)
CLT stderr XMM5 c2fe000041c80000 (f: 1103626240.000000, d: -5.277657e+14)
CLT stderr XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
CLT stderr Module=F:\Users\jenkins\workspace\Test_openjdk11_j9_sanity.system_x86-64_windows_Nightly_testList_1\openjdkbinary\j2sdk-image\bin\default\j9gc29.dll
CLT stderr Module_base_address=00007FF8F5210000 Offset_in_DLL=000000000011a305
CLT stderr Target=2_90_20210627_36 (Windows Server 2012 R2 6.3 build 9600)
CLT stderr CPU=amd64 (8 logical CPUs) (0x1ffb9c000 RAM)
CLT stderr ----------- Stack Backtrace -----------
CLT stderr J9VMDllMain+0x1192f5 (0x00007FF8F532A305 [j9gc29+0x11a305])
CLT stderr J9VMDllMain+0x113545 (0x00007FF8F5324555 [j9gc29+0x114555])
CLT stderr J9VMDllMain+0x11a4da (0x00007FF8F532B4EA [j9gc29+0x11b4ea])
CLT stderr J9VMDllMain+0x17caa (0x00007FF8F5228CBA [j9gc29+0x18cba])
CLT stderr Java_java_lang_invoke_MutableCallSite_invalidate+0x50c639 (0x00007FF8F5B9CFD9 [j9jit29+0x5ccfd9])
CLT stderr Java_java_lang_invoke_MutableCallSite_invalidate+0x50afe3 (0x00007FF8F5B9B983 [j9jit29+0x5cb983])
CLT stderr J9_GetInterface+0x35c3f (0x00007FF8F68C2B2F [j9vm29+0x112b2f])
CLT stderr J9VMDllMain+0x17c65 (0x00007FF8F5228C75 [j9gc29+0x18c75])
CLT stderr J9VMDllMain+0x1494e (0x00007FF8F522595E [j9gc29+0x1595e])
CLT stderr J9VMDllMain+0x14c53 (0x00007FF8F5225C63 [j9gc29+0x15c63])
CLT stderr J9VMDllMain+0x13add (0x00007FF8F5224AED [j9gc29+0x14aed])
CLT stderr J9VMDllMain+0x1126b6 (0x00007FF8F53236C6 [j9gc29+0x1136c6])
CLT stderr J9VMDllMain+0x137517 (0x00007FF8F5348527 [j9gc29+0x138527])
CLT stderr J9VMDllMain+0x1385a2 (0x00007FF8F53495B2 [j9gc29+0x1395b2])
CLT stderr j9port_isCompatible+0x1a0ef (0x00007FF8F672CC1F [j9prt29+0x1cc1f])
CLT stderr J9VMDllMain+0x1386fb (0x00007FF8F534970B [j9gc29+0x13970b])
CLT stderr omrthread_get_category+0xa42 (0x00007FF8F6794492 [j9thr29+0x4492])
CLT stderr _o_strcat_s+0x5e (0x00007FF8F6B2C1AE [ucrtbase+0x1c1ae])
CLT stderr BaseThreadInitThunk+0x22 (0x00007FF908C813F2 [KERNEL32+0x13f2])
CLT stderr RtlUserThreadStart+0x34 (0x00007FF9092454F4 [ntdll+0x154f4])
CLT stderr ---------------------------------------
Looks like there is an attempt to copy object from 0xFDB00000
to 0xFDB5D188
but somehow copy address has high bit set (as well as low) InaccessibleWriteAddress=00000001FDB5D189
, so 0x100000001
is added to copy address
The reason for crash is bad O-slot O-Slot: : t12[0x000000000053A590] = 0x00000000FDB00000
where bad (stall?) pointer has been discovered:
<510900> JIT frame: bp = 0x000000000053A5F8, pc = 0x00007FF8E1F34679, unwindSP = 0x000000000053A4A0, cp = 0x00000000004E00D0, arg0EA = 0x000000000053A610, jitInfo = 0x000000B92444A468
<510900> Method: net/adoptopenjdk/test/classloading/ClassMapHog.addClass(Ljava/lang/String;Ljava/lang/Class;)Ljava/util/Map; !j9method 0x00000000004E0C40
<510900> Bytecode index = 0, inlineDepth = 0, PC offset = 0x0000000000007011
<510900> stackMap=0x000000B92444B8F6, slots=I16(0x0003) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0003), localBaseOffset=I16(0xFF20)
<510900> Described JIT args starting at 0x000000000053A600 for U16(0x0003) slots
<510900> O-Slot: : a2[0x000000000053A600] = 0x000000008091B5B8
<510900> O-Slot: : a1[0x000000000053A608] = 0x00000000FDB5D128
<510900> O-Slot: : a0[0x000000000053A610] = 0x00000000FDB5D138
<510900> Described JIT temps starting at 0x000000000053A518 for IDATA(0x000000000000001C) slots
<510900> O-Slot: : t27[0x000000000053A518] = 0x00000000FDB5D158
<510900> O-Slot: : t26[0x000000000053A520] = 0x00000000FDB5D158
<510900> O-Slot: : t25[0x000000000053A528] = 0x00000000FDB5D158
<510900> O-Slot: : t24[0x000000000053A530] = 0x00000000FDB5D158
<510900> O-Slot: : t23[0x000000000053A538] = 0x00000000FDB5D158
<510900> O-Slot: : t22[0x000000000053A540] = 0x00000000FDB5D158
<510900> O-Slot: : t21[0x000000000053A548] = 0x00000000FDB5D158
<510900> I-Slot: : t20[0x000000000053A550] = 0x000000000053A558
<510900> SA-Obj[53a558]
<510900> F-Slot[0x000000000053A560] = 0x00000000FDB5D128
<510900> F-Slot[0x000000000053A564] = 0x000000008011EBF8
<510900> F-Slot[0x000000000053A568] = 0x0000000000000000
<510900> I-Slot: : t18[0x000000000053A560] = 0x8011EBF8FDB5D128
<510900> I-Slot: : t17[0x000000000053A568] = 0x0000000000000000
<510900> I-Slot: : t16[0x000000000053A570] = 0x0000001AFFFFFFFF
<510900> I-Slot: : t15[0x000000000053A578] = 0x0000000000000000
<510900> I-Slot: : t14[0x000000000053A580] = 0x000000000000002E
<510900> I-Slot: : t13[0x000000000053A588] = 0x00000000FC291C40
<510900> O-Slot: : t12[0x000000000053A590] = 0x00000000FDB00000 <--------------
<510900> O-Slot: : t11[0x000000000053A598] = 0x00000000FDB04DD8
<510900> O-Slot: : t10[0x000000000053A5A0] = 0x00000000FC2844D8
<510900> O-Slot: : t9[0x000000000053A5A8] = 0x00000000FDB04D80
<510900> I-Slot: : t8[0x000000000053A5B0] = 0x0000000000000001
<510900> I-Slot: : t7[0x000000000053A5B8] = 0x000000000005C400
<510900> I-Slot: : t6[0x000000000053A5C0] = 0x00000000FDB04C70
<510900> I-Slot: : t5[0x000000000053A5C8] = 0x0000000080B16A08
<510900> I-Slot: : t4[0x000000000053A5D0] = 0x0000000000000202
<510900> I-Slot: : t3[0x000000000053A5D8] = 0x0000000000000206
<510900> I-Slot: : t2[0x000000000053A5E0] = 0x000000000005C400
<510900> I-Slot: : t1[0x000000000053A5E8] = 0x000000000001BDCF
<510900> I-Slot: : t0[0x000000000053A5F0] = 0x0000000000000206
<510900> JIT-RegisterMap = UDATA(0x0000000000001000)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F7E0] = U64(0x0000000080110000) (jit_rax)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F7E8] = U64(0x00000000FDB04DE8) (jit_rbx)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F7F0] = U64(0x000000000005C400) (jit_rcx)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F7F8] = U64(0x00000000FDB00000) (jit_rdx)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F800] = U64(0x00000000FD94A6E8) (jit_rdi)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F808] = U64(0x00000000FD94A6BC) (jit_rsi)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F810] = U64(0x0000000000000000) (jit_rbp)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F818] = U64(0x0000000000000011) (jit_rsp)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F820] = U64(0x00000000FD94A6C0) (jit_r8)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F828] = U64(0x00000000FD94A6A8) (jit_r9)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F830] = U64(0x00000000FD94A698) (jit_r10)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F838] = U64(0x00000000FDB04DD8) (jit_r11)
<510900> JIT-RegisterMap-O-Slot[0x000000B925F7F840] = 0x00000000FD94A6C0 (jit_r12)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F848] = U64(0x000000000001BDCF) (jit_r13)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F850] = U64(0x000000000005C400) (jit_r14)
<510900> JIT-RegisterMap-I-Slot[0x000000B925F7F858] = U64(0x00000000FDB04DD8) (jit_r15)
<510900> JIT-Frame-RegisterMap[0x000000000053A4E0] = UDATA(0x000000000005C400) (jit_rbx)
<510900> JIT-Frame-RegisterMap[0x000000000053A4E8] = UDATA(0x00000000FDB04C70) (jit_r9)
<510900> JIT-Frame-RegisterMap[0x000000B925F7F830] = U64(0x00000000FD94A698) (jit_r10)
<510900> JIT-Frame-RegisterMap[0x000000B925F7F838] = U64(0x00000000FDB04DD8) (jit_r11)
<510900> JIT-Frame-RegisterMap[0x000000B925F7F840] = U64(0x00000000FD94A6C0) (jit_r12)
<510900> JIT-Frame-RegisterMap[0x000000B925F7F848] = U64(0x000000000001BDCF) (jit_r13)
<510900> JIT-Frame-RegisterMap[0x000000B925F7F850] = U64(0x000000000005C400) (jit_r14)
<510900> JIT-Frame-RegisterMap[0x000000B925F7F858] = U64(0x00000000FDB04DD8) (jit_r15)
0xFDB00000
points mid-object !fj9object 0xfdaffff8
. Surprisingly an attempt to copy garbage did not crash immediately but proceeded a little bit further. It is interesting I looked to very similar case today https://github.com/eclipse-openj9/openj9/issues/13068#issuecomment-869949533
There is common for both cases the bogus address is very aligned (may be coincidence)
Ah, it is even the same problematic method net/adoptopenjdk/test/classloading/ClassMapHog.addClass
for both cases
Hm, indeed. I wonder if it's a non-JITServer specific bug then. Since this test is for AOT, maybe it's an AOT bug.
New occurrence in https://github.com/eclipse-openj9/openj9/issues/14119
The problem in https://github.com/eclipse-openj9/openj9/issues/14119:
Object pointer 0xffcc5b58
is not valid, points to zeros in the heap. It is discovered in 0-slots in two threads:
Checking THREAD STACKS...
<gc check (1): from debugger: THREAD STACKS: slot 580800(580628) -> ffcc5b58: class pointer is null>
<gc check (2): from debugger: THREAD STACKS: slot 59eb00(59e928) -> ffcc5b58: class pointer is null>
<580800> JIT frame: bp = 0x0000000000580638, pc = 0x00007FFAE7CD78AD, unwindSP = 0x0000000000580540, cp = 0x0000000000544AA0, arg0EA = 0x0000000000580640, jitInfo = 0x000000B5ACC146E8
<580800> Method: net/adoptopenjdk/test/nio/NioBuffersTest$Buffers$BufferThread.run()V !j9method 0x00000000005458E0
<580800> Bytecode index = 2,155, inlineDepth = 0, PC offset = 0x000000000000528D
<580800> stackMap=0x000000B5ACC17D20, slots=I16(0x0001) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0001), localBaseOffset=I16(0xFF78)
<580800> Described JIT args starting at 0x0000000000580640 for U16(0x0001) slots
<580800> O-Slot: : a0[0x0000000000580640] = 0x00000000FFEE0758
<580800> Described JIT temps starting at 0x00000000005805B0 for IDATA(0x0000000000000011) slots
<580800> O-Slot: : t16[0x00000000005805B0] = 0x0000000000000000
<580800> O-Slot: : t15[0x00000000005805B8] = 0x0000000000000000
<580800> O-Slot: : t14[0x00000000005805C0] = 0x0000000000000000
<580800> O-Slot: : t13[0x00000000005805C8] = 0x0000000000000000
<580800> O-Slot: : t12[0x00000000005805D0] = 0x0000000000000000
<580800> O-Slot: : t11[0x00000000005805D8] = 0x0000000000000000
<580800> O-Slot: : t10[0x00000000005805E0] = 0x0000000000000000
<580800> O-Slot: : t9[0x00000000005805E8] = 0x0000000000000000
<580800> O-Slot: : t8[0x00000000005805F0] = 0x0000000000000000
<580800> O-Slot: : t7[0x00000000005805F8] = 0x0000000000000000
<580800> O-Slot: : t6[0x0000000000580600] = 0x00000000FFF03170
<580800> O-Slot: : t5[0x0000000000580608] = 0x00000000FFEE0758
<580800> O-Slot: : t4[0x0000000000580610] = 0x00000000FFF031B0
<580800> O-Slot: : t3[0x0000000000580618] = 0x00000000FFF03170
<580800> O-Slot: : t2[0x0000000000580620] = 0x00000000FFF031F0
<580800> O-Slot: : t1[0x0000000000580628] = 0x00000000FFCC5B58 <-----------------------
<580800> O-Slot: : t0[0x0000000000580630] = 0x00000000FFA16DA0
<580800> JIT-RegisterMap = UDATA(0x0000000000000000)
<580800> JIT-Frame-RegisterMap[0x0000000000580560] = UDATA(0x0000000000545700) (jit_rbx)
<580800> JIT-Frame-RegisterMap[0x0000000000580568] = UDATA(0x00007FFAFC63E3DD) (jit_r9)
<580800> JIT-Frame-RegisterMap[0x000000B5ADA8F480] = U64(0x00000000000FAEC0) (jit_r10)
<580800> JIT-Frame-RegisterMap[0x000000B5ADA8F488] = U64(0x0000000000000000) (jit_r11)
<580800> JIT-Frame-RegisterMap[0x000000B5ADA8F490] = U64(0x00000000000FAED0) (jit_r12)
<580800> JIT-Frame-RegisterMap[0x000000B5ADA8F498] = U64(0x00000000FFA57FD8) (jit_r13)
<580800> JIT-Frame-RegisterMap[0x000000B5ADA8F4A0] = U64(0x0000000000000002) (jit_r14)
<580800> JIT-Frame-RegisterMap[0x000000B5ADA8F4A8] = U64(0x0000000080380840) (jit_r15)
<59eb00> JIT frame: bp = 0x000000000059E938, pc = 0x00007FFAE7CD78AD, unwindSP = 0x000000000059E840, cp = 0x0000000000544AA0, arg0EA = 0x000000000059E940, jitInfo = 0x000000B5ACC146E8
<59eb00> Method: net/adoptopenjdk/test/nio/NioBuffersTest$Buffers$BufferThread.run()V !j9method 0x00000000005458E0
<59eb00> Bytecode index = 2,155, inlineDepth = 0, PC offset = 0x000000000000528D
<59eb00> stackMap=0x000000B5ACC17D20, slots=I16(0x0001) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0001), localBaseOffset=I16(0xFF78)
<59eb00> Described JIT args starting at 0x000000000059E940 for U16(0x0001) slots
<59eb00> O-Slot: : a0[0x000000000059E940] = 0x00000000FFEF1DF8
<59eb00> Described JIT temps starting at 0x000000000059E8B0 for IDATA(0x0000000000000011) slots
<59eb00> O-Slot: : t16[0x000000000059E8B0] = 0x0000000000000000
<59eb00> O-Slot: : t15[0x000000000059E8B8] = 0x0000000000000000
<59eb00> O-Slot: : t14[0x000000000059E8C0] = 0x0000000000000000
<59eb00> O-Slot: : t13[0x000000000059E8C8] = 0x0000000000000000
<59eb00> O-Slot: : t12[0x000000000059E8D0] = 0x0000000000000000
<59eb00> O-Slot: : t11[0x000000000059E8D8] = 0x0000000000000000
<59eb00> O-Slot: : t10[0x000000000059E8E0] = 0x0000000000000000
<59eb00> O-Slot: : t9[0x000000000059E8E8] = 0x0000000000000000
<59eb00> O-Slot: : t8[0x000000000059E8F0] = 0x0000000000000000
<59eb00> O-Slot: : t7[0x000000000059E8F8] = 0x0000000000000000
<59eb00> O-Slot: : t6[0x000000000059E900] = 0x00000000FFF03170
<59eb00> O-Slot: : t5[0x000000000059E908] = 0x00000000FFEF1DF8
<59eb00> O-Slot: : t4[0x000000000059E910] = 0x00000000FFF031B0
<59eb00> O-Slot: : t3[0x000000000059E918] = 0x00000000FFF03170
<59eb00> O-Slot: : t2[0x000000000059E920] = 0x00000000FFF031F0
<59eb00> O-Slot: : t1[0x000000000059E928] = 0x00000000FFCC5B58 <-----------------------
<59eb00> O-Slot: : t0[0x000000000059E930] = 0x00000000FFA16DA0
<59eb00> JIT-RegisterMap = UDATA(0x0000000000000000)
<59eb00> JIT-Frame-RegisterMap[0x000000000059E860] = UDATA(0x0000000000545700) (jit_rbx)
<59eb00> JIT-Frame-RegisterMap[0x000000000059E868] = UDATA(0x00007FFAFC63E3DD) (jit_r9)
<59eb00> JIT-Frame-RegisterMap[0x000000B5AE28F3C0] = U64(0x00000000000BCA48) (jit_r10)
<59eb00> JIT-Frame-RegisterMap[0x000000B5AE28F3C8] = U64(0x0000000000000000) (jit_r11)
<59eb00> JIT-Frame-RegisterMap[0x000000B5AE28F3D0] = U64(0x00000000000BCA58) (jit_r12)
<59eb00> JIT-Frame-RegisterMap[0x000000B5AE28F3D8] = U64(0x00000000FFA70240) (jit_r13)
<59eb00> JIT-Frame-RegisterMap[0x000000B5AE28F3E0] = U64(0x0000000000000002) (jit_r14)
<59eb00> JIT-Frame-RegisterMap[0x000000B5AE28F3E8] = U64(0x0000000080380840) (jit_r15)
Failure link
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_x86-64_windows_Nightly/526/
To rebuild the failed tests in=https://ci.eclipse.org/openj9/job/Grinder, use the following links:
23:30:37 https://ci.eclipse.org/openj9/job/Grinder/parambuild/?JDK_VERSION=8&JDK_IMPL=openj9&BUILD_LIST=system&PLATFORM=x86-64_windows&TARGET=SC_Softmx_JitAot_0
Optional info
Failure output (captured from console output)