eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.26k stars 717 forks source link

Inconsistent execution results after System.arraycopy #19247

Open Brad0309 opened 4 months ago

Brad0309 commented 4 months ago

Java -version output

openjdk version "11.0.22" 2024-01-16
IBM Semeru Runtime Open Edition 11.0.22.0 (build 11.0.22+7)
Eclipse OpenJ9 VM 11.0.22.0 (build openj9-0.43.0, JRE 11 Mac OS X amd64-64-Bit Compressed References 20240131_864 (JIT enabled, AOT enabled)
OpenJ9   - 2c3d78b48
OMR      - ea8124dbc
JCL      - 7876cac747 based on jdk-11.0.22+7)

openjdk version "17.0.10" 2024-01-16
IBM Semeru Runtime Open Edition 17.0.10.0 (build 17.0.10+7)
Eclipse OpenJ9 VM 17.0.10.0 (build openj9-0.43.0, JRE 17 Mac OS X amd64-64-Bit Compressed References 20240116_636 (JIT enabled, AOT enabled)
OpenJ9   - 2c3d78b48
OMR      - ea8124dbc
JCL      - 2aad089841f based on jdk-17.0.10+7)

openjdk version "21.0.2" 2024-01-16 LTS
IBM Semeru Runtime Open Edition 21.0.2.0 (build 21.0.2+13-LTS)
Eclipse OpenJ9 VM 21.0.2.0 (build openj9-0.43.0, JRE 21 Mac OS X amd64-64-Bit Compressed References 20240116_95 (JIT enabled, AOT enabled)
OpenJ9   - 2c3d78b48
OMR      - ea8124dbc
JCL      - 78c4500a434 based on jdk-21.0.2+13)

openjdk version "1.8.0_402"
IBM Semeru Runtime Open Edition (build 1.8.0_402-b06)
Eclipse OpenJ9 VM (build openj9-0.43.0, JRE 1.8.0 Mac OS X amd64-64-Bit Compressed References 20240131_892 (JIT enabled, AOT enabled)
OpenJ9   - 2c3d78b48
OMR      - ea8124dbc
JCL      - 0fa9d9c532 based on jdk8u402-b06)

Summary of problem

public class ArrayCopyTest {

    public static int[] src1 = new int[1];
    public static int[] dst1 = new int[536870913];
    public static int sum = 0;
    public static void main(String[] var0) {
        for(int var1 = 0; var1 < 11000; var1++) {
            m1();
            m1();
        }
        System.out.println(sum);
    }
    public static void m1() {
        int var1 = 536870912;
        System.arraycopy(src1, 0, dst1, var1, 1);
        sum = checksum(sum, var1);
    }
    public static int checksum(int var0, int var1) {
        var0 += Integer.hashCode(var1);
        return Integer.hashCode(var0);
    }
}

The test program above produces different execution results across different runs. This seems to be a JIT bug since the inconsistent execution results do not occur when the -Xnojit option is specified. The test program and the reproduction script can be found at JITIssues-1.zip. Please replace the Java cmd path in test.sh with your Java cmd. This bug affects OpenJDK 8, OpenJDK 11, OpenJDK 17, and OpenJDK 21.

pshipton commented 4 months ago

@hzongaro fyi

hzongaro commented 4 months ago

@Brad0309, what differences in behaviour have you seen? I'm consistently seeing an OutOfMemoryError being thrown in my attempts to run the test.

pshipton commented 4 months ago
jdk8u402-b06/bin/java -Xmx8g ArrayCopyTest
536870912
jdk8u402-b06/bin/java -Xmx8g ArrayCopyTest
-1073741824
jdk8u402-b06/bin/java -Xmx8g ArrayCopyTest
0

With -Xint the result is 0.

hzongaro commented 4 months ago

Thanks, @pshipton!

@jmesyou, may I ask you to look at this problem?

knn-k commented 4 months ago

I cannot reproduce the problem with -Xjit:limit=m1,optlevel=noOpt, but with -Xjit:limit=m1,optlevel=cold. The trace file with the noOpt level has the node istore ArrayCopyTest.sum I, but at the cold level the node disappears in the local value propagation phase as the following log shows.

Performing 24: localValuePropagation
[    17] O^O VALUE PROPAGATION: Changing call call [000000011F544500] to arraycopy
[    18] O^O VALUE PROPAGATION: Removing rest of block after ArrayCopyBNDCHK [000000011F640230]
[    19] O^O VALUE PROPAGATION: Removing unreachable block_3 at [000000011F6006A0]
jmesyou commented 4 months ago

Thanks for taking a look @knn-k ! I am also able to reproduce the problem using the same parameters on HEAD. It is however quite intermittent, take > 100 tries sometimes to get the incorrect behavior.

jmesyou commented 4 months ago

@knn-k Do you happen to have a log from when you reproduced the issue? I've been able to reproduce the issue but when I turn logging on, it goes away!

knn-k commented 4 months ago

@jmesyou Attached please find my JIT trace files for noOpt and cold. 19247.zip There is no store to ArrayCopyTest.sum after localValuePropagation in m1-cold.txt.

I used the following command lines:

openjdk version "11.0.23-internal" 2024-04-16
OpenJDK Runtime Environment (build 11.0.23-internal+0-adhoc.kaz.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build master-4a7b54d13, JRE 11 Mac OS X aarch64-64-Bit 20240405_000000 (JIT enabled, AOT enabled)
OpenJ9   - 4a7b54d13
OMR      - 974bec3a2
JCL      - 0f9c69e64d based on jdk-11.0.23+7)
hzongaro commented 3 months ago

@dylanjtuttle, may I ask you to look at this problem?

dylanjtuttle commented 2 months ago

The

[    18] O^O VALUE PROPAGATION: Removing rest of block after ArrayCopyBNDCHK [000000011F640230]

trace message comes from OMR::ValuePropagation::mustTakeException(). This function calls OMR::ValuePropagation::removeRestOfBlock() which actually performs the (seemingly) erroneous removal of all trees after the current tree top.

constrainArrayCopyBndChk() calls mustTakeException when it has determined that "the exception will always be taken". From the context of the surrounding code, I am pretty sure that this means the compiler has placed value constraints on the ArrayCopyBNDCHK and its children, and those constraints indicate to the compiler that the check is guaranteed to fail, so it calls mustTakeException to remove what it thinks is dead code.

Some potential reasons this might be going wrong:

  1. The value numbering of the nodes is incorrect
  2. The constraints have been placed on the nodes incorrectly in some way
  3. The value numbers and constraints are correct, but the compiler is incorrectly interpreting them
  4. Some other reason that I haven't thought of

I'll look into these possibilities and see what I can figure out!

dylanjtuttle commented 2 months ago
n1n       BBStart <block_2> (freq 10000)                                                      [       0x11fb50190] bci=[-1,0,17] rc=0 vc=34 vn=- li=- udi=- nc=0
n4n       istore  <auto slot 0>[#352  Auto] [flags 0x3 0x0 ]                                  [       0x11fb50280] bci=[-1,2,17] rc=0 vc=34 vn=- li=- udi=- nc=1
n3n         iconst 0x20000000 (X!=0 X>=0 )                                                    [       0x11fb50230] bci=[-1,0,17] rc=2 vc=34 vn=- li=- udi=- nc=0 flg=0x104
n6n       ResolveCHK [#281]                                                                   [       0x11fb50320] bci=[-1,3,18] rc=0 vc=34 vn=- li=- udi=- nc=1
n5n         aload  ArrayCopyTest.src1 [I[#353  unresolved notAccessed volatile Static] [flags 0x2307 0x0 ]  [       0x11fb502d0] bci=[-1,3,18] rc=2 vc=34 vn=- li=- udi=- nc=0
n9n       ResolveCHK [#281]                                                                   [       0x11fb50410] bci=[-1,7,18] rc=0 vc=34 vn=- li=- udi=- nc=1
n8n         aload  ArrayCopyTest.dst1 [I[#354  unresolved notAccessed volatile Static] [flags 0x2307 0x0 ]  [       0x11fb503c0] bci=[-1,7,18] rc=2 vc=34 vn=- li=- udi=- nc=0
n53n      NULLCHK on n5n [#32]                                                                [       0x11fc4c050] bci=[-1,3,18] rc=0 vc=34 vn=- li=- udi=- nc=1
n52n        arraylength (stride 4) (X>=0 cannotOverflow )                                     [       0x11fc4c000] bci=[-1,3,18] rc=2 vc=34 vn=- li=- udi=- nc=1 flg=0x1100
n5n           ==>aload
n55n      NULLCHK on n8n [#32]                                                                [       0x11fc4c0f0] bci=[-1,7,18] rc=0 vc=34 vn=- li=- udi=- nc=1
n54n        arraylength (stride 4) (X>=0 cannotOverflow )                                     [       0x11fc4c0a0] bci=[-1,7,18] rc=2 vc=34 vn=- li=- udi=- nc=1 flg=0x1100
n8n           ==>aload
n57n      ArrayCopyBNDCHK [#1]                                                                [       0x11fc4c190] bci=[-1,3,18] rc=0 vc=34 vn=- li=- udi=- nc=2
n56n        isub (cannotOverflow )                                                            [       0x11fc4c140] bci=[-1,3,18] rc=1 vc=34 vn=- li=- udi=- nc=2 flg=0x1000
n52n          ==>arraylength
n11n          iconst 1 (X!=0 X>=0 )                                                           [       0x11fb504b0] bci=[-1,11,18] rc=2 vc=34 vn=- li=- udi=- nc=0 flg=0x104
n7n         iconst 0 (X==0 X>=0 X<=0 )                                                        [       0x11fb50370] bci=[-1,6,18] rc=1 vc=34 vn=- li=- udi=- nc=0 flg=0x302
n59n      ArrayCopyBNDCHK [#1]                                                                [       0x11fc4c230] bci=[-1,7,18] rc=0 vc=34 vn=- li=- udi=- nc=2
n58n        isub (cannotOverflow )                                                            [       0x11fc4c1e0] bci=[-1,7,18] rc=1 vc=34 vn=- li=- udi=- nc=2 flg=0x1000
n54n          ==>arraylength
n11n          ==>iconst 1
n3n         ==>iconst 0x20000000
n60n      return                                                                              [       0x11fc4c280] bci=[-1,7,18] rc=0 vc=34 vn=- li=- udi=- nc=0
n37n      BBEnd </block_2>                                                                    [       0x11fb50cd0] bci=[0,2,23] rc=0 vc=33 vn=- li=- udi=- nc=0

From this block, we can tell that two ArrayCopyBNDCHKs are performed: one on ArrayCopyTest.src1 and one on ArrayCopyTest.dst1. The first compares 0 with src1.length - 1, while the second compares 0x20000000 (or 536870912 in decimal) with dst1.length - 1.

public class ArrayCopyTest {

    public static int[] src1 = new int[1];
    public static int[] dst1 = new int[536870913];
    public static int sum = 0;
    public static void main(String[] var0) {
        for(int var1 = 0; var1 < 11000; var1++) {
            m1();
            m1();
        }
        System.out.println(sum);
    }
    public static void m1() {
        int var1 = 536870912;
        System.arraycopy(src1, 0, dst1, var1, 1);
        sum = checksum(sum, var1);
    }
    public static int checksum(int var0, int var1) {
        var0 += Integer.hashCode(var1);
        return Integer.hashCode(var0);
    }
}

As we can see from the test case, src1.length = 1 and dst1.length = 536870913, so both of our ArrayCopyBNDCHKs should pass no problem. However, when printing out the values of low and high in constrainArrayCopyBndChk(), I saw the following print statements:

low (0) <= high (536870910)
low (536870912) <= high (536870910)

It appears like the value of dst1.length - 1 is occasionally 536870910 instead of 536870912. I'm not entirely sure why the first ArrayCopyBNDCHK is comparing 0 to dst1.length - 1 instead of src1.length - 1, but I think the important takeaway is that for some reason, the length of dst1 is occasionally two elements shorter than it should be, causing our second ArrayCopyBNDCHK to fail and the optimizer to remove the trees that follow it.

This leads me to believe that the problem may actually be somewhere like constrainArraylength() instead. I will continue to investigate!

dylanjtuttle commented 2 months ago

I believe I have found the source of the error! On line 1957 of constrainAload():

if (elementSize != 0)
   {
   constraint = TR::VPClass::create(vp, (TR::VPClassType*)constraint, NULL, NULL,
         TR::VPArrayInfo::create(vp, 0, elementSize == 0 ? TR::getMaxSigned<TR::Int32>() : TR::getMaxSigned<TR::Int32>()/elementSize, elementSize),
         TR::VPObjectLocation::create(vp, TR::VPObjectLocation::NotClassObject));
   }

When we create the VPArrayInfo object, we pass a value of

elementSize == 0 ? TR::getMaxSigned<TR::Int32>() : TR::getMaxSigned<TR::Int32>()/elementSize

as our highBound. The ternary operator is redundant here because we already know that elementSize != 0, meaning our highBound is really just

TR::getMaxSigned<TR::Int32>()/elementSize

TR::getMaxSigned<TR::Int32>() = 2147483647 and elementSize = 4 here, so when you integer divide them, you obtain 536870911, which is our two-elements-too-short incorrect array length.

@hzongaro, do you know why we're doing this? Dividing the total size of the array by the element size seems like a reasonable way to obtain the length of an array, but why might we be using the maximum value of an Int32? Do you have any thoughts about what we should be doing instead to obtain a more accurate length?

hzongaro commented 2 months ago

Do you have any thoughts about what we should be doing instead to obtain a more accurate length?

This was probably an historical limitation. As @jdmpapin mentioned off-line, there are other places in VP that take the heap size into account in determining the maximum number of elements in an array. For instance, constrainArraylength uses TR::Compiler->om.maxArraySizeInElements.

I believe I have found the source of the error!

Good work! :-)

dylanjtuttle commented 1 month ago

It looks like replacing the old method of calculating the length with

TR::Compiler->om.maxArraySizeInElements(elementSize, vp->comp())

like in constrainArrayLength() does fix the problem. Printing the values of low and high in constrainArrayCopyBndChk() like before, we can see that the high bound is now more than large enough:

low (0) <= high (1048330239)
low (536870912) <= high (1048330239)

I have not been able to reproduce the bug since making this change, but since it was sometimes difficult to reproduce before I don't want to say that it is gone for certain.