eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

JDK11 JITServer system_custom_0 Segmentation Fault #13068

Open dmitry-ten opened 3 years ago

dmitry-ten commented 3 years ago

Link to the grinder: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/16570/consoleText The test fails on x86-64 JDK11. Other platforms have not been run through Grinder yet, so it is unknown whether the crash occurs there. The failure rate is 3/358. Output from the crashed test:

[2021-06-24T20:22:15.845Z] CLT 13:22:15.045 - Starting thread. Suite=0 thread=9
[2021-06-24T20:22:34.744Z] CLT stderr Unhandled exception
[2021-06-24T20:22:34.744Z] CLT stderr Type=Segmentation error vmState=0x0002000f
[2021-06-24T20:22:34.744Z] CLT stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
[2021-06-24T20:22:34.744Z] CLT stderr Handler1=00007F69AB81ACD0 Handler2=00007F69AB5799F0 InaccessibleAddress=000000000000001A
[2021-06-24T20:22:34.744Z] CLT stderr RDI=00007F69AC0528F8 RSI=00007F6938001CC8 RAX=00007F69AC02A330 RBX=00007F69AC0897E0
[2021-06-24T20:22:34.744Z] CLT stderr RCX=00007F6954B5C238 RDX=00007F6954B5C3C0 R8=00007F6954B5C240 R9=00007F6954B5C248
[2021-06-24T20:22:34.744Z] CLT stderr R10=0000000000000006 R11=00007F69A6B902B0 R12=0000000000000000 R13=00007F6954B5C3C0
[2021-06-24T20:22:34.744Z] CLT stderr R14=00007F6954B5C238 R15=00007F69AC052860
[2021-06-24T20:22:34.744Z] CLT stderr RIP=00007F69A8A523D4 GS=0000 FS=0000 RSP=00007F6954B5C190
[2021-06-24T20:22:34.744Z] CLT stderr EFlags=0000000000010246 CS=0033 RBP=0000000000000001 ERR=0000000000000004
[2021-06-24T20:22:34.744Z] CLT stderr TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=000000000000001A
[2021-06-24T20:22:34.744Z] CLT stderr xmm0 00007f69a8d6f6d0 (f: 2832660224.000000, d: 6.921454e-310)
[2021-06-24T20:22:34.745Z] CLT stderr xmm1 c2887b00fdd5e338 (f: 4258652928.000000, d: -3.364572e+12)
[2021-06-24T20:22:34.745Z] CLT stderr xmm2 000000000000002d (f: 45.000000, d: 2.223295e-322)
[2021-06-24T20:22:34.745Z] CLT stderr xmm3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm4 3fc526e57720db08 (f: 1998641920.000000, d: 1.652495e-01)
[2021-06-24T20:22:34.745Z] CLT stderr xmm5 65766163532f6176 (f: 1395614080.000000, d: 5.804244e+180)
[2021-06-24T20:22:34.745Z] CLT stderr xmm6 63672f656d69746e (f: 1835627648.000000, d: 6.999988e+170)
[2021-06-24T20:22:34.745Z] CLT stderr xmm7 6f2f6c616e6f7372 (f: 1852797824.000000, d: 3.722026e+227)
[2021-06-24T20:22:34.745Z] CLT stderr xmm8 005f0046004f005f (f: 5177439.000000, d: 6.897967e-307)
[2021-06-24T20:22:34.745Z] CLT stderr xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2021-06-24T20:22:34.745Z] CLT stderr Module=/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/lib/default/libj9gc29.so
[2021-06-24T20:22:34.745Z] CLT stderr Module_base_address=00007F69A88AE000
[2021-06-24T20:22:34.745Z] CLT stderr Target=2_90_20210624_765 (Linux 4.15.0-144-generic)
[2021-06-24T20:22:34.745Z] CLT stderr CPU=amd64 (4 logical CPUs) (0xf6817000 RAM)
[2021-06-24T20:22:34.745Z] CLT stderr ----------- Stack Backtrace -----------
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A8A523D4 [libj9gc29.so+0x1a43d4])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A8A2D831 [libj9gc29.so+0x17f831])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A8A292B0 [libj9gc29.so+0x17b2b0])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A8A327AF [libj9gc29.so+0x1847af])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A96BDF73 [libj9jit29.so+0x949f73])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A96BE39C [libj9jit29.so+0x94a39c])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A96BF65C [libj9jit29.so+0x94b65c])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69AB8595DE [libj9vm29.so+0x7d5de])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A88EF646 [libj9gc29.so+0x41646])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A88E751D [libj9gc29.so+0x3951d])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A88E622F [libj9gc29.so+0x3822f])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A88E8DA2 [libj9gc29.so+0x3ada2])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A8A2B90B [libj9gc29.so+0x17d90b])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A89E05D7 [libj9gc29.so+0x1325d7])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A89DFDE9 [libj9gc29.so+0x131de9])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69AB57A753 [libj9prt29.so+0x2a753])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69A89DF8EF [libj9gc29.so+0x1318ef])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69AB3434F6 [libj9thr29.so+0xe4f6])
[2021-06-24T20:22:34.745Z] CLT stderr (0x00007F69B1E6E6DB [libpthread.so.0+0x76db])
[2021-06-24T20:22:34.745Z] CLT stderr clone+0x3f (0x00007F69B178271F [libc.so.6+0x12171f])
[2021-06-24T20:22:34.745Z] CLT stderr ---------------------------------------
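
The register dump above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the helper names `module_offset` and `decode_page_fault` are hypothetical, and the constants are copied from the log lines above. It subtracts `Module_base_address` from `RIP` and decodes the low three bits of the x86 page-fault error code (`ERR`).

```python
# Values copied from the crash report above.
RIP = 0x00007F69A8A523D4
MODULE_BASE = 0x00007F69A88AE000  # Module_base_address of libj9gc29.so
ERR = 0x4                         # ERR field (x86 page-fault error code)


def module_offset(address: int, base: int) -> int:
    """Return the offset of an absolute address within a loaded module."""
    return address - base


def decode_page_fault(err: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(err & 0x1),  # False: the page was not mapped
        "write": bool(err & 0x2),         # False: the access was a read
        "user_mode": bool(err & 0x4),     # True: the fault happened in user mode
    }


if __name__ == "__main__":
    print(hex(module_offset(RIP, MODULE_BASE)))
    print(decode_page_fault(ERR))
```

The computed offset matches the first backtrace frame (`libj9gc29.so+0x1a43d4`), and the error code describes a user-mode read of an unmapped page, which is consistent with `InaccessibleAddress=000000000000001A`.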

In another failed iteration the output is different:

[2021-06-25T02:12:11.746Z] CLT 19:12:11.367 - Starting thread. Suite=0 thread=9
[2021-06-25T02:12:32.149Z] CLT stderr Corruption in Evacuate at 00000000FEAF0000: calculated object size 65696764202150032 larger then available 1703936, Forwarded Header at 00007FE7D828F950
[2021-06-25T02:12:32.149Z] CLT stderr 02:12:30.655 0x51ba00    j9mm.141    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK11_x86-64_linux_jit_Personal/omr/gc/base/standard/Scavenger.cpp:1614: ((false))
[2021-06-25T02:12:32.149Z] CLT stderr JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/06/24 19:12:30 - please wait.

In both cases the failure is inside GC code.

dmitripivkine commented 3 years ago

Please let me know if you need analysis for these crashes (I will need a system core dump in that case).

dmitry-ten commented 3 years ago

@dmitripivkine thank you, your help would be appreciated. This is the stack trace of the segfault:

#12 <signal handler called>
#13 MM_ForwardedHeader::readClassSlot (
    destinationObjectPtr=0x740065006e0028, this=0x7f5ca0bcc3c0)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/structs/ForwardedHeader.hpp:224
#14 MM_ForwardedHeader::copyOrWaitOutline (
    this=this@entry=0x7f5ca0bcc3c0,
    destinationObjectPtr=destinationObjectPtr@entry=0x740065006e0028)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/structs/ForwardedHeader.cpp:250
#15 0x00007f5cf7cbd145 in MM_ForwardedHeader::copyOrWait (
    destinationObjectPtr=0x740065006e0028, this=0x7f5ca0bcc3c0)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/structs/ForwardedHeader.hpp:412
#16 MM_Scavenger::copyAndForward (objectPtrIndirect=0x522f80,
    env=0x7f5c7c001cc8, this=0x7f5cf8083200)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/base/standard/Scavenger.cpp:1405
#17 MM_Scavenger::copyAndForwardThreadSlot (this=0x7f5cf8083200,
    env=env@entry=0x7f5c7c001cc8,
    objectPtrIndirect=objectPtrIndirect@entry=0x522f80)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/base/standard/Scavenger.cpp:3175
#18 0x00007f5cf7cc662f in MM_ScavengerRootScanner::doStackSlot (
    this=0x7f5ca0bccad0, slotPtr=0x522f80, walkState=<optimized out>,
    stackLocation=0x522f80)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/gc_glue_java/ScavengerRootScanner.hpp:105
#19 0x00007f5cfcddc563 in walkJITFrameSlots (
    walkState=walkState@entry=0x7f5ca0bcc6d0,
    jitDescriptionBits=jitDescriptionBits@entry=0x7f5ca0bcc58e "\v",
    stackAllocMapBits=stackAllocMapBits@entry=0x7f5ca0bcc58f "",
    jitDescriptionCursor=jitDescriptionCursor@entry=0x7f5ca0bcc590,
    stackAllocMapCursor=stackAllocMapCursor@entry=0x7f5ca0bcc598,
    jitBitsRemaining=jitBitsRemaining@entry=0x7f5ca0bcc5a0,
    mapBytesRemaining=0x7f5ca0bcc5a8, scanCursor=0x522f80,
    slotsRemaining=15, stackMap=0x7f5ca137e59c,
    gcStackAtlas=0x7f5ca137e4ac, slotDescription=<optimized out>)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:646
#20 0x00007f5cfcddc98c in jitWalkFrame (
    walkState=walkState@entry=0x7f5ca0bcc6d0,
    walkLocals=walkLocals@entry=1, stackMap=0x7f5ca137e59c)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:577
#21 0x00007f5cfcdddc4c in jitWalkStackFrames (walkState=0x7f5ca0bcc6d0)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/codert_vm/jswalk.c:243
#22 0x00007f5cfef1453e in walkStackFrames (currentThread=0x1a4200,
    walkState=0x7f5ca0bcc6d0)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/vm/swalk.c:336
#23 0x00007f5cf7b833f6 in GC_VMThreadStackSlotIterator::scanSlots (
    vmThread=<optimized out>, walkThread=walkThread@entry=0x513b00,
    userData=userData@entry=0x7f5ca0bcc9e0,
    oSlotIterator=oSlotIterator@entry=0x7f5cf7b7b7c0 <stackSlotIterator(J9JavaVM*, J9Object**, void*, J9StackWalkState*, void const*)>,
    includeStackFrameClassReferences=<optimized out>,
    h=<optimized out>) at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/gc_structs/VMThreadStackSlotIterator.cpp:114
#24 0x00007f5cf7b7b2cd in MM_RootScanner::scanOneThread (this=0x7f5ca0bccad0, env=0x7f5c7c001cc8, walkThread=0x513b00, localData=0x7f5ca0bcc9e0)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/gc_base/RootScanner.cpp:519
#25 0x00007f5cf7b79fdf in MM_RootScanner::scanThreads (this=0x7f5ca0bccad0, env=0x7f5c7c001cc8)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/gc_base/RootScanner.cpp:488
#26 0x00007f5cf7b7cb52 in MM_RootScanner::scanRoots (this=0x7f5ca0bccad0, env=0x7f5c7c001cc8)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/gc_base/RootScanner.cpp:919
#27 0x00007f5cf7cbf78b in MM_ScavengerRootScanner::scanRoots (env=0x7f5c7c001cc8, this=0x7f5ca0bccad0)
    at /root/src-11/openj9-openjdk-jdk11/openj9/runtime/gc_glue_java/ScavengerRootScanner.hpp:200
#28 MM_Scavenger::workThreadGarbageCollect (this=0x7f5cf8083200, env=0x7f5c7c001cc8)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/base/standard/Scavenger.cpp:2572
#29 0x00007f5cf7c74457 in MM_ParallelDispatcher::workerEntryPoint (this=0x7f5cf8049030, env=0x7f5c7c001cc8)
    at /root/src-11/openj9-openjdk-jdk11/omr/gc/base/ParallelDispatcher.cpp:186

So far the only thing I can tell is that destinationObjectPtr contains an invalid address, which causes the crash. I'll send you the core dump and the JDK I used in DMs.
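
One hedged observation about why that address looks invalid: its bytes read as UTF-16 text rather than as a heap address. The sketch below is an illustration only; the helper name `pointer_as_utf16` is hypothetical, and the constant is the `destinationObjectPtr` value from frame #13.

```python
import struct

# destinationObjectPtr as printed in frame #13 of the backtrace.
DESTINATION_OBJECT_PTR = 0x740065006E0028


def pointer_as_utf16(value: int) -> str:
    """Reinterpret a 64-bit value as four little-endian UTF-16 code units."""
    raw = struct.pack("<Q", value)  # 8 bytes, little-endian like x86-64 memory
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    print(repr(pointer_as_utf16(DESTINATION_OBJECT_PTR)))
```

The value decodes to printable characters, which suggests the GC read character data where it expected an object header.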

dmitripivkine commented 3 years ago

The reason for the crash is a bad (stale?) O-Slot: t14[0x0000000000522F80] = 0x00000007FFF80000

<513b00> JIT frame: bp = 0x0000000000522FF8, pc = 0x00007F5CD9F388CB, unwindSP = 0x0000000000522F10, cp = 0x00000000004E10D0, arg0EA = 0x0000000000523010, jitInfo = 0x00007F5CA137DB38
<513b00>    Method: net/adoptopenjdk/test/classloading/ClassMapHog.addClass(Ljava/lang/String;Ljava/lang/Class;)Ljava/util/Map; !j9method 0x00000000004E1C40
<513b00>    Bytecode index = 65, inlineDepth = 0, PC offset = 0x0000000000000AA3
<513b00>    stackMap=0x00007F5CA137E59C, slots=I16(0x0003) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0003), localBaseOffset=I16(0xFF88)
<513b00>    Described JIT args starting at 0x0000000000523000 for U16(0x0003) slots
<513b00>        O-Slot: : a2[0x0000000000523000] = 0x0000000705AF0A38
<513b00>        O-Slot: : a1[0x0000000000523008] = 0x00000007E8352798
<513b00>        O-Slot: : a0[0x0000000000523010] = 0x00000007E83527A8
<513b00>    Described JIT temps starting at 0x0000000000522F80 for IDATA(0x000000000000000F) slots
<513b00>        O-Slot: : t14[0x0000000000522F80] = 0x00000007FFF80000 <--------
<513b00>        O-Slot: : t13[0x0000000000522F88] = 0x00000007FFF89D00
<513b00>        I-Slot: : t12[0x0000000000522F90] = 0x00000007FFEC3080
<513b00>        O-Slot: : t11[0x0000000000522F98] = 0x00000007FAF16178
<513b00>        I-Slot: : t10[0x0000000000522FA0] = 0x0000000000000001
<513b00>        I-Slot: : t9[0x0000000000522FA8] = 0x00000007FFF896D0
<513b00>        I-Slot: : t8[0x0000000000522FB0] = 0x0000000000000001
<513b00>        I-Slot: : t7[0x0000000000522FB8] = 0x00000007FFF896D0
<513b00>        I-Slot: : t6[0x0000000000522FC0] = 0x000000000001A99C
<513b00>        I-Slot: : t5[0x0000000000522FC8] = 0x00000007FEC041F8
<513b00>        I-Slot: : t4[0x0000000000522FD0] = 0x00000007FFEC30A8
<513b00>        I-Slot: : t3[0x0000000000522FD8] = 0x00000007FFF89AD0
<513b00>        I-Slot: : t2[0x0000000000522FE0] = 0x00000007FEC06B78
<513b00>        I-Slot: : t1[0x0000000000522FE8] = 0x00000007058BDC40
<513b00>        I-Slot: : t0[0x0000000000522FF0] = 0x000000070598FAE8
<513b00>    JIT-RegisterMap = UDATA(0x0000000000000002)
<513b00>        JIT-RegisterMap-O-Slot[0x0000000000522EC8] = 0x00000007FFF89DC0 (jit_rbx)
<513b00>        JIT-RegisterMap-I-Slot[0x0000000000522ED0] = UDATA(0x00000007FFF89C00) (jit_r9)
<513b00>        JIT-RegisterMap-I-Slot[0x00007F5CA00E49F0] = UDATA(0x0000000000000000) (jit_r10)
<513b00>        JIT-RegisterMap-I-Slot[0x00007F5CA00E49F8] = UDATA(0x00000007FFF89C10) (jit_r11)
<513b00>        JIT-RegisterMap-I-Slot[0x00007F5CA00E4A00] = UDATA(0x00000007E8352760) (jit_r12)
<513b00>        JIT-RegisterMap-I-Slot[0x00007F5CA00E4A08] = UDATA(0x00000007FFF89E00) (jit_r13)
<513b00>        JIT-RegisterMap-I-Slot[0x00007F5CA00E4A10] = UDATA(0x00000007E8352788) (jit_r14)
<513b00>        JIT-RegisterMap-I-Slot[0x00007F5CA00E4A18] = UDATA(0x00000007FFF89DE8) (jit_r15)
<513b00>    JIT-Frame-RegisterMap[0x0000000000522F50] = UDATA(0x0000000000000001) (jit_rbx)
<513b00>    JIT-Frame-RegisterMap[0x0000000000522F58] = UDATA(0x00000007FFF896D0) (jit_r9)
<513b00>    JIT-Frame-RegisterMap[0x00007F5CA00E49F0] = UDATA(0x0000000000000000) (jit_r10)
<513b00>    JIT-Frame-RegisterMap[0x00007F5CA00E49F8] = UDATA(0x00000007FFF89C10) (jit_r11)
<513b00>    JIT-Frame-RegisterMap[0x00007F5CA00E4A00] = UDATA(0x00000007E8352760) (jit_r12)
<513b00>    JIT-Frame-RegisterMap[0x00007F5CA00E4A08] = UDATA(0x00000007FFF89E00) (jit_r13)
<513b00>    JIT-Frame-RegisterMap[0x00007F5CA00E4A10] = UDATA(0x00000007E8352788) (jit_r14)
<513b00>    JIT-Frame-RegisterMap[0x00007F5CA00E4A18] = UDATA(0x00000007FFF89DE8) (jit_r15)

This O-slot points into the middle of the object !j9object 0x7FFF7FFB8

0x7FFF7FFB0 :  00000000 00000000 0004a200 0000008c [ ................ ] <--- object start
0x7FFF7FFC0 :  fff7ffc8 00000007 00750070 006c0062 [ ........p.u.b.l. ]
0x7FFF7FFD0 :  00630069 00730020 00610074 00690074 [ i.c. .s.t.a.t.i. ]
0x7FFF7FFE0 :  00200063 00690066 0061006e 0020006c [ c. .f.i.n.a.l. . ]
0x7FFF7FFF0 :  006e0069 00200074 0061006a 00610076 [ i.n.t. .j.a.v.a. ]
0x7FFF80000 :  006e002e 00740065 0048002e 00740074 [ ..n.e.t...H.t.t. ] <--- mid object pointer
0x7FFF80010 :  00550070 004c0052 006f0043 006e006e [ p.U.R.L.C.o.n.n. ]
0x7FFF80020 :  00630065 00690074 006e006f 0000002e [ e.c.t.i.o.n..... ]
0x7FFF80030 :  00000000 00000000 00000000 00000000 [ ................ ]
0x7FFF80040 :  00000000 00000000 00000000 00000000 [ ................ ]
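
The relationship between the slot value, the object start, and the `destinationObjectPtr` from the backtrace can be checked numerically. This is a sketch under stated assumptions: the dump is read as little-endian 32-bit words, and `ASSUMED_TAG_MASK` (the low three bits treated as header tag bits) is an assumption made for illustration, not a value taken from the OMR sources.

```python
# Values copied from the debugger output above.
SLOT_VALUE = 0x7FFF80000     # contents of O-Slot t14
OBJECT_START = 0x7FFF7FFB8   # !j9object address
# First two 32-bit words of the dump row at 0x7FFF80000.
WORDS_AT_SLOT_TARGET = (0x006E002E, 0x00740065)
# destinationObjectPtr from frame #13 of the backtrace.
DESTINATION_OBJECT_PTR = 0x740065006E0028

# Assumption: the low three bits of a header word are tag bits.
ASSUMED_TAG_MASK = 0x7


def mid_object_offset(slot_value: int, object_start: int) -> int:
    """Distance in bytes from the object start to the slot's target."""
    return slot_value - object_start


def header_word(low: int, high: int) -> int:
    """Combine two little-endian 32-bit words into one 64-bit word."""
    return low | (high << 32)


if __name__ == "__main__":
    word = header_word(*WORDS_AT_SLOT_TARGET)
    print(hex(mid_object_offset(SLOT_VALUE, OBJECT_START)))
    print(hex(word), hex(word & ~ASSUMED_TAG_MASK))
```

Under that assumption, the 64-bit word at the slot's target with its low three bits cleared equals the `destinationObjectPtr` seen in the crash, which fits the reading that character data was interpreted as a forwarded header.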

dmitripivkine commented 3 years ago

BTW, it is easy to find the problematic slot using gccheck in jdmpview:

> !gccheck all,noobjectheap:all:midscavenge,quiet
Starting GC Check
Checking CLASS HEAP...done (1799 ms).
Checking REMEMBERED SET...done (57 ms).
Checking UNFINALIZED...done (7 ms).
Checking FINALIZABLE...done (3 ms).
Checking OWNABLE_SYNCHRONIZER...done (2 ms).
Checking STRING TABLE...done (1384 ms).
Checking CLASS LOADERS...done (5 ms).
Checking JNI GLOBAL REFS...done (18 ms).
Checking JNI WEAK GLOBAL REFS...done (1 ms).
Checking JVMTI OBJECT TAG TABLES...done (4 ms).
Checking VM CLASS SLOTS...done (0 ms).
Checking MONITOR TABLE...done (9 ms).
Checking VM THREAD SLOTS...done (386 ms).
Checking THREAD STACKS...  <gc check (1): from debugger: THREAD STACKS: slot 513b00(522f80) -> 7fff80000: not in an object segment>
done (384 ms).
Done (4113ms)
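
When a gccheck run reports many slots, the error lines can be pulled out with a small script. The following is a minimal sketch: the regular expression is inferred from the single error line shown above and may not cover other gccheck message shapes, and the function name `parse_gccheck_errors` is hypothetical.

```python
import re

# Pattern inferred from the one error line in the output above (an assumption).
GCCHECK_ERROR = re.compile(
    r"<gc check \((?P<count>\d+)\): from debugger: (?P<area>.+?): "
    r"slot (?P<owner>[0-9a-fA-F]+)\((?P<slot>[0-9a-fA-F]+)\) -> "
    r"(?P<target>[0-9a-fA-F]+): (?P<reason>.+)>"
)


def parse_gccheck_errors(text: str) -> list:
    """Return one dict per gccheck error line found in the given output."""
    errors = []
    for line in text.splitlines():
        match = GCCHECK_ERROR.search(line)
        if match:
            errors.append(match.groupdict())
    return errors


SAMPLE = (
    "Checking THREAD STACKS...  <gc check (1): from debugger: THREAD STACKS: "
    "slot 513b00(522f80) -> 7fff80000: not in an object segment>\n"
    "done (384 ms).\n"
)

if __name__ == "__main__":
    for error in parse_gccheck_errors(SAMPLE):
        print(error)
```

For the output above this yields the thread `513b00`, the stack slot `522f80`, and the bad target `7fff80000`, the same slot identified in the stack walk.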

dmitripivkine commented 3 years ago

FYI, this failure looks very similar to https://github.com/eclipse-openj9/openj9/issues/10984#issuecomment-870044385

dmitripivkine commented 3 years ago

Ah, it is even the same problematic method, net/adoptopenjdk/test/classloading/ClassMapHog.addClass, in both cases.