eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 722 forks source link

OpenJDK java/math/BigInteger/BigIntegerTest - Misaligned stack-allocated object #17751

Closed pshipton closed 1 year ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_aarch64_linux_OMR_testList_1/4 jdk_math_1 java/math/BigInteger/BigIntegerTest.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk20_j9_sanity.openjdk_aarch64_linux_OMR_testList_1/4/openjdk_test_output.tar.gz

05:55:55  baz shifstderr:
05:55:55  0000FFFFAC6B2200: Misaligned stack-allocated object in thread AgentVMThread
05:55:55  0000FFFFAC6B2200: O-Slot=0000FFFF38EEE320
05:55:55  0000FFFFAC6B2200: O-Slot value=0000FFFF00058344
05:55:55  0000FFFFAC6B2200: PC=0000FFFF72FAC800
05:55:55  0000FFFFAC6B2200: framesWalked=1
05:55:55  0000FFFFAC6B2200: arg0EA=0000FFFF00058A28
05:55:55  0000FFFFAC6B2200: walkSP=0000FFFF00058008
05:55:55  0000FFFFAC6B2200: literals=0000000000000000
05:55:55  0000FFFFAC6B2200: jitInfo=0000FFFF7063EEB8
05:55:55  0000FFFFAC6B2200: method=0000FFFF08031C50 (java/math/BigInteger.smallToString(ILjava/lang/StringBuilder;I)V) (JIT)
05:55:55  0000FFFFAC6B2200: stack=0000FFFF00053758-0000FFFF00058FD0
05:55:55  09:55:53.080 0xffffac6b2200    j9mm.479    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK20_aarch64_linux_OMR/openj9/runtime/gc_glue_java/ScavengerRootScanner.hpp:109: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
pshipton commented 1 year ago

@knn-k fyi

knn-k commented 1 year ago

It is the AgentVMThread. The stack backtrace looks like this.

> !stack 0xffffac6b2200
<ffffac6b2200>                          known but unhandled frame type com.ibm.j9ddr.vm29.pointer.U8Pointer @ 0x00000005

 FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT  FAULT

<ffffac6b2200>  !j9method 0x0000FFFFAC22B890   java/lang/StringLatin1.newString([BII)Ljava/lang/String;
<ffffac6b2200>  !j9method 0x0000FFFFAC1FF890   java/lang/Long.toString(JI)Ljava/lang/String;
<ffffac6b2200>  !j9method 0x0000FFFF08031C50   java/math/BigInteger.smallToString(ILjava/lang/StringBuilder;I)V
<ffffac6b2200>  !j9method 0x0000FFFF08031C70   java/math/BigInteger.toString(Ljava/math/BigInteger;Ljava/lang/StringBuilder;II)V
<ffffac6b2200>  !j9method 0x0000FFFF08031C70   java/math/BigInteger.toString(Ljava/math/BigInteger;Ljava/lang/StringBuilder;II)V
<ffffac6b2200>  !j9method 0x0000FFFF08031C70   java/math/BigInteger.toString(Ljava/math/BigInteger;Ljava/lang/StringBuilder;II)V
<ffffac6b2200>  !j9method 0x0000FFFF08031C10   java/math/BigInteger.toString(I)Ljava/lang/String;
<ffffac6b2200>  !j9method 0x0000FFFFAC572D58   BigIntegerTest.stringConv()V
<ffffac6b2200>  !j9method 0x0000FFFFAC572E58   BigIntegerTest.main([Ljava/lang/String;)V
<ffffac6b2200>  !j9method 0x0000FFFFAC3A04A8   java/lang/invoke/LambdaForm$DMH/0x00000000ac39fc90.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)V
<ffffac6b2200>  !j9method 0x0000FFFF00011DB8   java/lang/invoke/LambdaForm$MH/0x00000000000115f0.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<ffffac6b2200>  !j9method 0x0000FFFF000129B8   java/lang/invoke/LambdaForm$MH/0x00000000000120f0.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<ffffac6b2200>  !j9method 0x0000FFFFAC1DA6F8   jdk/internal/reflect/DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<ffffac6b2200>  !j9method 0x0000FFFFAC1DA6B8   jdk/internal/reflect/DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<ffffac6b2200>  !j9method 0x0000FFFFAC1D2478   java/lang/reflect/Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<ffffac6b2200>  !j9method 0x0000FFFFAC61ACE8   com/sun/javatest/regtest/agent/MainActionHelper$AgentVMRunnable.run()V
<ffffac6b2200>  !j9method 0x0000FFFFAC1DD458   java/lang/Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V
<ffffac6b2200>  !j9method 0x0000FFFFAC1DD438   java/lang/Thread.run()V
<ffffac6b2200>                          JNI call-in frame
<ffffac6b2200>                          Native method frame
knn-k commented 1 year ago

The PC value points to this JIT method. Its size is over 70 KB.

        start=0xffff72f9c86c  end=0xffff72fae0f0   java/math/BigInteger::smallToString(ILjava/lang/StringBuilder;I)V
knn-k commented 1 year ago

3 failures in a 30x Grinder job for jdk_math_1, every time in BigIntegerTest.

Akira1Saitoh commented 1 year ago

Looks like the value in x28 was wrong when calling out to helper.

<ffffac6b2200> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x0000FFFFAC6B2200 ***
<ffffac6b2200>  ITERATE_O_SLOTS
<ffffac6b2200>  RECORD_BYTECODE_PC_OFFSET
<ffffac6b2200> Initial values: walkSP = 0x0000FFFF00058008, PC = 0x0000000000000005, literals = 0x0000000000000000, A0 = 0x0000FFFF00058028, j2iFrame = 0x0000000000000000, ELS = 0x0000FFFF38EEE590, decomp = 0x0000000000000000
<ffffac6b2200> JIT resolve frame: bp = 0x0000FFFF00058028, sp = 0x0000FFFF00058008, pc = 0x0000000000000005, cp = 0x0000000000000000, arg0EA = 0x0000FFFF00058028, flags = 0x0000000000B80000
<ffffac6b2200>  Object push (savedJITException)
<ffffac6b2200>          O-Slot[0x0000FFFF00058008] = 0x0000000000000000
<ffffac6b2200>  Allocation resolve
<ffffac6b2200>  unwindSP initialized to 0x0000FFFF00058030
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE240] = UDATA(0x0000FFFFAC198700) (jit_r0)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE248] = UDATA(0x0000000000000013) (jit_r1)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE250] = UDATA(0x0000FFFFABFF9860) (jit_r2)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE258] = UDATA(0x0000000000000003) (jit_r3)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE260] = UDATA(0x0000FFFFABFFEE68) (jit_r4)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE268] = UDATA(0x0000000000000013) (jit_r5)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE270] = UDATA(0x000000000000005C) (jit_r6)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE278] = UDATA(0x0000000000000013) (jit_r7)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE280] = UDATA(0x0000FFFFA9D80000) (jit_r8)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE288] = UDATA(0x0000FFFFAC198700) (jit_r9)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE290] = UDATA(0x0000FFFFA9D800D0) (jit_r10)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE298] = UDATA(0x000000000000000E) (jit_r11)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2A0] = UDATA(0x0000FFFFAC198700) (jit_r12)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2A8] = UDATA(0x0000000000000034) (jit_r13)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2B0] = UDATA(0x0000FFFFAC000000) (jit_r14)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2B8] = UDATA(0x0000000000000033) (jit_r15)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2C0] = UDATA(0x0000FFFFB1B85348) (jit_r16)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2C8] = UDATA(0x0000FFFFB34D4950) (jit_r17)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2D0] = UDATA(0xFFFFFFFFFFFFFFF9) (jit_r18)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2D8] = UDATA(0x0000FFFFAC6B2200) (jit_r19)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2E0] = UDATA(0x0000FFFF00058030) (jit_r20)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2E8] = UDATA(0x0000000000000041) (jit_r21)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2F0] = UDATA(0x0000FFFFAC000018) (jit_r22)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE2F8] = UDATA(0x000000000000000E) (jit_r23)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE300] = UDATA(0x0000000000000013) (jit_r24)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE308] = UDATA(0x0000000000000216) (jit_r25)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE310] = UDATA(0x0000FFFFABFFFFF8) (jit_r26)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE318] = UDATA(0x0000000000000020) (jit_r27)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE320] = UDATA(0x0000FFFF00058344) (jit_r28)   <---
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE328] = UDATA(0x0000000000000003) (jit_r29)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE330] = UDATA(0x0000FFFF72FAC800) (jit_r30)
<ffffac6b2200>  JIT-Resolve-RegisterMap[0x0000FFFF38EEE338] = UDATA(0x0000000000000000) (jit_r31)
<ffffac6b2200> JIT inline frame: bp = 0x0000FFFF00058A08, pc = 0x0000FFFF72FAC800, unwindSP = 0x0000FFFF00058030, cp = 0x0000FFFFAC22B930, arg0EA = 0x0000000000000000, jitInfo = 0x0000FFFF7063EEB8
<ffffac6b2200>  Method: java/lang/StringLatin1.newString([BII)Ljava/lang/String; !j9method 0x0000FFFFAC22B890
<ffffac6b2200>  Bytecode index = 7, inlineDepth = 2, PC offset = 0x0000000000087F20
<ffffac6b2200> JIT inline frame: bp = 0x0000FFFF00058A08, pc = 0x0000FFFF72FAC800, unwindSP = 0x0000FFFF00058030, cp = 0x0000FFFFAC2000D0, arg0EA = 0x0000000000000000, jitInfo = 0x0000FFFF7063EEB8
<ffffac6b2200>  Method: java/lang/Long.toString(JI)Ljava/lang/String; !j9method 0x0000FFFFAC1FF890
<ffffac6b2200>  Bytecode index = 129, inlineDepth = 1, PC offset = 0x00000000000ABF2C
<ffffac6b2200> JIT frame: bp = 0x0000FFFF00058A08, pc = 0x0000FFFF72FAC800, unwindSP = 0x0000FFFF00058030, cp = 0x0000FFFF0802F4F0, arg0EA = 0x0000FFFF00058A28, jitInfo = 0x0000FFFF7063EEB8
<ffffac6b2200>  Method: java/math/BigInteger.smallToString(ILjava/lang/StringBuilder;I)V !j9method 0x0000FFFF08031C50
<ffffac6b2200>  Bytecode index = 129, inlineDepth = 0, PC offset = 0x000000000000FF94
<ffffac6b2200>  stackMap=0x0000FFFF706491B0, slots=I16(0x0004) parmBaseOffset=I16(0x0010), parmSlots=U16(0x0003), localBaseOffset=I16(0xF9F0)
Akira1Saitoh commented 1 year ago

The pc value (0x0000FFFF72FAC800) shown in stacktrace is at OOL section generated by newObjectEvaluator.

(gdb) x/6i 0xffff72fac7f4
   0xffff72fac7f4:      mov     x9, x12
   0xffff72fac7f8:      mov     x0, x9
   0xffff72fac7fc:      bl      0xffff730ef970.       <------------- call out to jitNewObject
   0xffff72fac800:      mov     x9, x0
   0xffff72fac804:      mov     x12, x9
   0xffff72fac808:      b       0xffff72f9dc7c

The above OOL section is called from below code.

(gdb) x/32i 0xffff72f9dc50
   0xffff72f9dc50:      mov     x12, #0x8700                    // #34560
   0xffff72f9dc54:      movk    x12, #0xac19, lsl #16
   0xffff72f9dc58:      movk    x12, #0xffff, lsl #32
   0xffff72f9dc5c:      ldr     x9, [x19, #96]
   0xffff72f9dc60:      ldr     x14, [x19, #104]
   0xffff72f9dc64:      add     x22, x9, #0x20
   0xffff72f9dc68:      cmp     x22, x14
   0xffff72f9dc6c:      b.gt    0xffff72fac7f4        <------------ branch to OOL section
   0xffff72f9dc70:      str     x22, [x19, #96]
   0xffff72f9dc74:      str     x12, [x9]
   0xffff72f9dc78:      mov     x12, x9
   0xffff72f9dc7c:      add     w13, w11, w15
   0xffff72f9dc80:      str     x12, [x20, #1880]
   0xffff72f9dc84:      sub     w14, w13, w15
   0xffff72f9dc88:      str     w14, [x20, #652]
   0xffff72f9dc8c:      cmp     w14, #0x0
   0xffff72f9dc90:      b.lt    0xffff72fa511c  // b.tstop
   0xffff72f9dc94:      mov     x9, x3
   0xffff72f9dc98:      str     w13, [x20, #372]
   0xffff72f9dc9c:      mov     w23, #0x8                       // #8
   0xffff72f9dca0:      mov     w26, #0xffffffff                // #-1
   0xffff72f9dca4:      cmp     w14, w26
   0xffff72f9dca8:      b.hi    0xffff72fac7d8  // b.pmore
   0xffff72f9dcac:      ubfx    x26, x14, #0, #32
   0xffff72f9dcb0:      ldr     x22, [x19, #96]
   0xffff72f9dcb4:      add     x27, x26, #0x17
   0xffff72f9dcb8:      and     x27, x27, #0xfffffffffffffff8
   0xffff72f9dcbc:      ldr     x25, [x19, #104]
   0xffff72f9dcc0:      add     x26, x22, x27
   0xffff72f9dcc4:      cmp     x26, x25
   0xffff72f9dcc8:      b.gt    0xffff72fac7d8
   0xffff72f9dccc:      prfm    pstl3strm, [x26, #256]

x28 is set to x20(javaSP) to 0x314 at 0xffff72f9da38. Since unwindSP is 0x0000FFFF00058030, x28 is set to 0x0000FFFF00058030 + 0x314 = 0xffff00058344.

(gdb) x/20i 0xffff72f9da34
   0xffff72f9da34:      str     wzr, [x20, #804]
   0xffff72f9da38:      add     x28, x20, #0x314
   0xffff72f9da3c:      str     wzr, [x20, #808]
   0xffff72f9da40:      str     wzr, [x20, #812]
   0xffff72f9da44:      str     wzr, [x20, #816]
   0xffff72f9da48:      str     wzr, [x20, #820]
   0xffff72f9da4c:      str     wzr, [x20, #824]
   0xffff72f9da50:      str     wzr, [x20, #828]
   0xffff72f9da54:      str     wzr, [x20, #832]
   0xffff72f9da58:      str     wzr, [x20, #836]
   0xffff72f9da5c:      str     wzr, [x20, #840]
   0xffff72f9da60:      str     wzr, [x20, #844]
   0xffff72f9da64:      str     wzr, [x20, #848]
   0xffff72f9da68:      str     wzr, [x20, #852]
   0xffff72f9da6c:      str     wzr, [x20, #856]
   0xffff72f9da70:      str     wzr, [x20, #860]
   0xffff72f9da74:      str     wzr, [x20, #864]
   0xffff72f9da78:      str     wzr, [x20, #868]
   0xffff72f9da7c:      str     wzr, [x20, #872]
   0xffff72f9da80:      mov     w15, #0x40                      // #64
Akira1Saitoh commented 1 year ago

There seems to be a stack allocated byte array at the closest aligned location (0xffff00058340), which is similar to https://github.com/eclipse-openj9/openj9/issues/16510#issuecomment-1410777524.

> !j9object 0xffff00058340
!J9IndexableObject 0x0000FFFF00058340 {
    struct J9Class* clazz = !j9arrayclass 0xFFFFAC193700   // [B
    Object flags = 0x00000000;
    U_32 size = 0x00058530;
        [0] =   0, 0x00
        [1] =   0, 0x00
        [2] =   0, 0x00
        [3] =   0, 0x00
        [4] =   0, 0x00
        [5] =   0, 0x00
        [6] =   0, 0x00
        [7] =   0, 0x00
        [8] =   0, 0x00
        [9] =   0, 0x00
        [10] =   0, 0x00
        [11] =   0, 0x00
        [12] =   0, 0x00
        [13] =   0, 0x00
        [14] =   0, 0x00
        [15] =   0, 0x00
To print entire range: !j9indexableobject 0x0000FFFF00058340 0 361776
}
Akira1Saitoh commented 1 year ago

It seems that I can reliably reproduce the failure on ub20 machines. Trying to get a trace log for java/math/BigInteger.smallToString(ILjava/lang/StringBuilder;I)V, but no luck so far.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/9 jdk_math_0 java/math/BigInteger/BigIntegerTest.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/9/openjdk_test_output.tar.gz

19:42:29  rerun:
19:42:29  cd /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/scratch/0 && \
19:42:29  DISPLAY=:0 \
19:42:29  HOME=/home/jenkins \
19:42:29  LANG=en_US.UTF-8 \
19:42:29  PATH=/bin:/usr/bin:/usr/sbin \
19:42:29      /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/bin/java \
19:42:29          -Dtest.vm.opts='-ea -esa -Xmx512m -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:+UseCompressedOops' \
19:42:29          -Dtest.tool.vm.opts='-J-ea -J-esa -J-Xmx512m -J-Xdump:system:none -J-Xdump:heap:none -J-Xdump:system:events=gpf+abort+traceassert+corruptcache -J-XX:+UseCompressedOops' \
19:42:29          -Dtest.compiler.opts= \
19:42:29          -Dtest.java.opts= \
19:42:29          -Dtest.jdk=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image \
19:42:29          -Dcompile.jdk=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image \
19:42:29          -Dtest.timeout.factor=8.0 \
19:42:29          -Dtest.nativepath=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/openjdkbinary/openjdk-test-image/jdk/jtreg/native \
19:42:29          -Dtest.root=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/jdk \
19:42:29          -Dtest.name=java/math/BigInteger/BigIntegerTest.java \
19:42:29          -Dtest.file=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/jdk/java/math/BigInteger/BigIntegerTest.java \
19:42:29          -Dtest.src=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/jdk/java/math/BigInteger \
19:42:29          -Dtest.src.path=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/jdk/java/math/BigInteger:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/lib \
19:42:29          -Dtest.classes=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/java/math/BigInteger/BigIntegerTest.d \
19:42:29          -Dtest.class.path=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/java/math/BigInteger/BigIntegerTest.d:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/test/lib \
19:42:29          -Dtest.class.path.prefix=/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/java/math/BigInteger/BigIntegerTest.d:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/jdk/java/math/BigInteger:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/test/lib \
19:42:29          -classpath /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/java/math/BigInteger/BigIntegerTest.d:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/jdk/java/math/BigInteger:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_16890320549536/jdk_math_0/work/classes/0/test/lib:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/jvmtest/openjdk/jtreg/lib/jtreg.jar \
19:42:29          BigIntegerTest
19:42:29  STDERR:
19:42:29  Constructor: Passed
19:42:29  Prime: Passed
19:42:29  nextProbablePrime: Passed
19:42:29  Arithmetic I for 100 bits: Passed
19:42:29  Arithmetic II for 100 bits: Passed
19:42:29  Arithmetic I for 2760 bits: Passed
19:42:29  Arithmetic II for 2760 bits: Passed
19:42:29  Arithmetic I for 8000 bits: Passed
19:42:29  Arithmetic II for 8000 bits: Passed
19:42:29  divideAndRemainder for 100 bits: Passed
19:42:29  divideAndRemainder for 2760 bits: Passed
19:42:29  divideAndRemainder for 8000 bits: Passed
19:42:29  pow for 100 bits: Passed
19:42:29  pow for 2760 bits: Passed
19:42:29  pow for 8000 bits: Passed
19:42:29  square for 100 bits: Passed
19:42:29  square for 4200 bits: Passed
19:42:29  square for 7000 bits: Passed
19:42:29  squareRootSmall: Passed
19:42:29  squareRoot for 2^N and 2^N - 1, 1 <= N <= Double.MAX_EXPONENT: Passed
19:42:29  squareRoot for int: Passed
19:42:29  squareRoot for long: Passed
19:42:29  squareRoot for double: Passed
19:42:29  sqrtAndRemainder: Passed
19:42:29  Bit Count: Passed
19:42:29  BitLength: Passed
19:42:29  clearBit/testBit for 100 bits: Passed
19:42:29  flipBit/testBit for 100 bits: Passed
19:42:29  getLowestSetBit for 100 bits: Passed
19:42:29  Logic (^ | & ~) for 100 bits: Passed
19:42:29  Logic (&~ | ~) for 100 bits: Passed
19:42:29  baz shiftLeft for 100 bits: Passed
19:42:29  baz shifstderr:
19:42:29  0000000000377D00: Misaligned stack-allocated object in thread AgentVMThread
19:42:29  0000000000377D00: O-Slot=0000FFFF444FDDF0
19:42:29  0000000000377D00: O-Slot value=00000000004056C4
19:42:29  0000000000377D00: PC=0000FFFF7E909680
19:42:29  0000000000377D00: framesWalked=1
19:42:29  0000000000377D00: arg0EA=0000000000405ED8
19:42:29  0000000000377D00: walkSP=0000000000405278
19:42:29  0000000000377D00: literals=0000000000000000
19:42:29  0000000000377D00: jitInfo=0000FFFF442C0038
19:42:29  0000000000377D00: method=000000000032D950 (java/math/BigInteger.smallToString(ILjava/lang/StringBuilder;I)V) (JIT)
19:42:29  0000000000377D00: stack=0000000000400C78-0000000000406490
19:42:29  23:42:28.683 0x377d00    j9mm.479    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK17_aarch64_linux_Nightly/openj9/runtime/gc_glue_java/ScavengerRootScanner.hpp:109: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
knn-k commented 1 year ago

It is possible this failure is related to the Compact Strings feature, which was recently enabled by #17703. I ran 2 Grinder jobs below (30x each) disabling the feature (-XX:-CompactStrings option) on ub20-aarch64-5, and got no failures. 2582 2583

It can explain why we didn't see this failure before.

Akira1Saitoh commented 1 year ago

I think that the reason why the problem disappears with -XX:-CompactStrings is simply because the code path that fails is guarded by if (COMPACT_STRINGS).

        if (COMPACT_STRINGS) {
            byte[] buf = new byte[65];
            int charPos = 64;
            boolean negative = (i < 0);

            if (!negative) {
                i = -i;
            }

            while (i <= -radix) {
                buf[charPos--] = (byte)Integer.digits[(int)(-(i % radix))];
                i = i / radix;
            }
            buf[charPos] = (byte)Integer.digits[(int)(-i)];

            if (negative) {
                buf[--charPos] = '-';
            }
            return StringLatin1.newString(buf, charPos, (65 - charPos));    <-- The failure occurs when calling out to helper inside this method
        }

https://github.com/ibmruntimes/openj9-openjdk-jdk20/blob/47ad18f4c518c57b51b75d06a715c8c8a7d7f942/src/java.base/share/classes/java/lang/Long.java#L146-L165

Akira1Saitoh commented 1 year ago

I got another failure on internal Grinder /job/Grinder/33877/.

[2023-07-13T05:09:43.652Z] 0000FFFF885CC000: Misaligned stack-allocated object in thread AgentVMThread
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    O-Slot=0000FFFED00139C0
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    O-Slot value=0000FFFED001300C
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    PC=0000FFFF4D5F043C
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    framesWalked=3
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    arg0EA=0000FFFED00139F8
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    walkSP=0000FFFED0012AD8
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    literals=0000000000000000
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    jitInfo=0000FFFF46A7EFB8
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    method=0000FFFED8007830 (java/math/BigInteger.toString(Ljava/math/BigInteger;Ljava/lang/StringBuilder;II)V) (JIT)
[2023-07-13T05:09:43.652Z] 0000FFFF885CC000:    stack=0000FFFED000E550-0000FFFED0013D70

0000FFFED00139C0 seem to be a stack allocated pinning array for an internal pointer.

<ffff885cc000> JIT frame: bp = 0x0000FFFED00139D8, pc = 0x0000FFFF4D5F043C, unwindSP = 0x0000FFFED0012BB0, cp = 0x0000FFFED80051
30, arg0EA = 0x0000FFFED00139F8, jitInfo = 0x0000FFFF46A7EFB8
<ffff885cc000>  Method: java/math/BigInteger.toString(Ljava/math/BigInteger;Ljava/lang/StringBuilder;II)V !j9method 0x0000FFFED8
007830
<ffff885cc000>  Bytecode index = 35, inlineDepth = 0, PC offset = 0x0000000000001660
<ffff885cc000>  stackMap=0x0000FFFF46A8292D, slots=I16(0x0004) parmBaseOffset=I16(0x0018), parmSlots=U16(0x0002), localBaseOffse
t=I16(0xF6F8)
<ffff885cc000>  Described JIT args starting at 0x0000FFFED00139F0 for U16(0x0002) slots
<ffff885cc000>          O-Slot: : a1[0x0000FFFED00139F0] = 0x0000FFFF86680058
<ffff885cc000>          O-Slot: : a0[0x0000FFFED00139F8] = 0x0000FFFF86680078
<ffff885cc000>  Described JIT temps starting at 0x0000FFFED00130D0 for IDATA(0x0000000000000121) slots
<ffff885cc000> Address 0x0000FFFF46A8EFDB
<ffff885cc000> Num internal ptr map bytes U8(0x0C)
<ffff885cc000> Address 0x0000FFFF46A8EFDC
<ffff885cc000> Index of first internal ptr I16(0x011D)
<ffff885cc000> Address 0x0000FFFF46A8EFDE
<ffff885cc000> Offset of first internal ptr I16(0xFFD0)
<ffff885cc000> Address 0x0000FFFF46A8EFE0
<ffff885cc000> Num distinct pinning arrays U8(0x05)
<ffff885cc000> Before object slot walk &address : 0x0000FFFED00139C8 address : 0x0000FFFF866800A8 bp 0x0000FFFED00139D8 offset of first internal ptr I16(0xFFD0)
...
<ffff885cc000> Before object slot walk &address : 0x0000FFFED00139C0 address : 0x0000FFFED001300C bp 0x0000FFFED00139D8 offset of first internal ptr I16(0xFFD0)    <--- 
<ffff885cc000> After object slot walk for pinning array with &address : 0x0000FFFED00139C0 old address 0x0000FFFED001300C new address 0x0000FFFED001300C displacement IDATA(0x0000000000000000)
<ffff885cc000> For pinning array U8(0x03) num internal pointer stack slots U8(0x00)
...
<ffff885cc000>          I-Slot: : t3[0x0000FFFED00139B8] = 0x0000FFFF668909A0
<ffff885cc000>          I-Slot: : t2[0x0000FFFED00139C0] = 0x0000FFFED001300C    <---
<ffff885cc000>          I-Slot: : t1[0x0000FFFED00139C8] = 0x0000FFFF866800A8
<ffff885cc000>          I-Slot: : t0[0x0000FFFED00139D0] = 0x0000FFFF866800B8

The gdb backtrace tells that the crash occurs at walkJITFrameSlotsForInternalPointers(). It seems that a stack allocated array is mapped at unaligned address for some reason.

(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x0000ffff8c5afd10 in omrdump_create () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9prt29.so
...
#11 0x0000ffff8c318d80 in javaTrace () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9trc29.so
#12 0x0000ffff86e65b60 in walkJITFrameSlotsForInternalPointers ()
   from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9jit29.so
#13 0x0000ffff872c2eac in walkJITFrameSlots.constprop.0 () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9jit29.so
#14 0x0000ffff872c32d4 in jitWalkFrame () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9jit29.so
#15 0x0000ffff872c42a4 in jitWalkStackFrames () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9jit29.so
#16 0x0000ffff8c689460 in walkStackFrames () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9vm29.so
#17 0x0000ffff868819d4 in GC_VMThreadStackSlotIterator::scanSlots(J9VMThread*, J9VMThread*, void*, void (*)(J9JavaVM*, J9Object**, void*, J9StackWalkState*, void const*), bool, bool) ()
   from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9gc_full29.so
...
#32 0x0000ffff8696af34 in OMR_GC_AllocateObject(OMR_VMThread*, MM_AllocateInitialization*) ()
   from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9gc_full29.so
#33 0x0000ffff8688748c in J9AllocateObject () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9gc_full29.so
#34 0x0000ffff872b30bc in old_slow_jitNewObject () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9jit29.so
#35 0x0000ffff872c6050 in jitNewObject () from /home/jenkins/saitoh/17751/33877/jdk/lib/default/libj9jit29.so
Akira1Saitoh commented 1 year ago

I am inclined to think that the root cause of this problem is that the code for mapping locals on stack (J9::ARM64::PrivateLinkage::mapStack) does not properly align locals with size > 8 (such as stack allocated objects). This code is exercised when CompactLocals optimization does not kick in. When CompactLocals kicks in, OMR::ARM64::Linkage::mapCompactedStack is called and the code in question does not run. To support my theory, I was able to reproduce the failure with CompactLocals disabled. https://github.com/eclipse-openj9/openj9/blob/674096f441933f2a7cef9c172612a874e85ab6bd/runtime/compiler/aarch64/codegen/ARM64PrivateLinkage.cpp#L353-L362

Also, I added a debug assert in the loop and the assert actually hit. /job/Grinder/34045

11:43:24  Assertion failed at /home/jenkins/workspace/Build_JDK17_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/ARM64PrivateLinkage.cpp:358: (localCursor->getSize() < 8) || ((stackIndex & 7) == 0)
11:43:24  VMState: 0x0005ff06
11:43:24    mapSingleAutomatic: stackIndex (-1540) must be 8 byte aligned for size (80)

I will update the mapStack function to properly align locals with size > 8.

Akira1Saitoh commented 1 year ago

Opened https://github.com/eclipse-openj9/openj9/pull/17825.

Akira1Saitoh commented 1 year ago

30x Grinder for jdk_math_0 and jdk_math_1 passed with the change in #17825. (/job/Grinder/34058/)

knn-k commented 1 year ago

It is almost 2 weeks since PR #17825 was merged. I think this issue can be closed now.