eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Miscompilation with lastOptIndex=148 at JIT optLevel=hot #19131

Open Qeryu opened 7 months ago

Qeryu commented 7 months ago

Java -version output

openjdk version "1.8.0_412-internal"
OpenJDK Runtime Environment (build 1.8.0_412-internal-user_2024_03_13_09_46-b00)
Eclipse OpenJ9 VM (build master-b44ad1a5a, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20240313_000000 (JIT enabled, AOT enabled)
OpenJ9   - b44ad1a5a
OMR      - 66252485a
JCL      - 16aa7c2499 based on jdk8u412-b05)

Summary of problem

When I run the following code with the latest version of OpenJ9, it is miscompiled: the JIT-compiled code prints different values from the interpreter and from other JDKs.

class Test {
  long a;
  void b(int c) {
    int d = -9;
    int e;
    switch (c) {
    case 8:
      do {
        a += 2;
        for (e = d; 3 > e;)
          return;
      } while (++d < 1000);
    }
    Float.floatToIntBits(3);
  }
  void f(String[] g) {
    int h, i = h = 1;
    while (++h < 100000)
      b(h);
    i += a;
    System.out.println(i);
  }
  public static void main(String[] j) {
    Test k = new Test();
    for (int h = 0; h < 10; h++)
      k.f(j);
  }
}
# Output from other JDKs (and from OpenJ9 with -Xint)
# Also from OpenJ9 with -Xjit:limit={Test.b*},lastOptIndex=147,optLevel=hot
3
5
7
9
11
13
15
17
19
21

# OpenJ9 with -Xjit:limit={Test.b*},lastOptIndex=148,optLevel=hot
3
5
13
21
29
37
45
53
61
69
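The two listings can be compared directly. Because i is a local reset to 1 on every call of f, each printed value is 1 + a, so the difference between consecutive values divided by 2 gives the number of times a += 2 ran during that call. A small sketch of that arithmetic (the class and method names are illustrative, not part of the report):

```java
import java.util.Arrays;

// Illustrative helper (not part of the report): turns the values printed by
// Test.f into the number of times `a += 2` ran during each call of f.
class PerCallIncrements {
    static int[] executionsPerCall(long[] printed) {
        int[] counts = new int[printed.length];
        long previousA = 0;                // Test.a starts at 0 on the single Test instance
        for (int k = 0; k < printed.length; k++) {
            long a = printed[k] - 1;       // f prints i = 1 + a
            counts[k] = (int) ((a - previousA) / 2); // each execution adds 2 to a
            previousA = a;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] expected = {3, 5, 7, 9, 11, 13, 15, 17, 19, 21};
        long[] miscompiled = {3, 5, 13, 21, 29, 37, 45, 53, 61, 69};
        System.out.println(Arrays.toString(executionsPerCall(expected)));
        System.out.println(Arrays.toString(executionsPerCall(miscompiled)));
    }
}
```

For the expected output this prints ten 1s; for the miscompiled output it prints [1, 1, 4, 4, 4, 4, 4, 4, 4, 4]. From the third call on (presumably once the hot-compiled body of b is in use), the loop body runs four times before returning instead of once.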

Related log: jit.log.2752332.21123.20240313.171203.2752332.log

pshipton commented 7 months ago

@hzongaro fyi

hzongaro commented 7 months ago

Opt index 148 appears to be the General Loop Unroller (GLU) in this case. @BradleyWood, may I ask you to take a look at this problem?

BradleyWood commented 7 months ago

GLU decided to unroll the do-while loop by a factor of 4 without a residual loop. During unrolling, the body (block_3) was cloned 3 times, but the clones dropped the if (3 > e) return test, and the control flow was rewritten so that block_3 is the last block of the unrolled sequence to execute.

n3n       BBStart <block_3> (freq 10000)                                                      [0x7f32a92af980] bci=[-1,24,9] rc=0 vc=855 vn=- li=- udi=- nc=0
n26n      lstorei  Test.a J[#428  notAccessed Shadow +8] [flags 0x604 0x0 ]                   [0x7f32a92b00b0] bci=[-1,33,9] rc=0 vc=859 vn=- li=- udi=- nc=2
n22n        aload  <'this' parm LTest;>[#422  Parm] [flags 0x40000107 0x0 ] (X!=0 X>=0 )      [0x7f32a92aff70] bci=[-1,24,9] rc=1 vc=859 vn=- li=- udi=12 nc=0 flg=0x104
n25n        lsub                                                                              [0x7f32a92b0060] bci=[-1,32,9] rc=2 vc=859 vn=- li=- udi=- nc=2
n23n          lload  <temp slot 7>[#445  Auto] [flags 0x4 0x0 ] (cannotOverflow createdByPRE )  [0x7f32a92affc0] bci=[-1,26,9] rc=1 vc=859 vn=- li=- udi=13 nc=0 flg=0x41000
n24n          lconst -2 (X!=0 X<=0 )                                                          [0x7f32a92b0010] bci=[-1,29,9] rc=1 vc=859 vn=- li=- udi=- nc=0 flg=0x204
n237n     lstore  <temp slot 7>[#445  Auto] [flags 0x4 0x0 ]                                  [0x7f32a934e9d0] bci=[-1,32,9] rc=0 vc=859 vn=- li=6 udi=3 nc=1
n25n        ==>lsub
n33n      ificmplt --> block_4 BBStart at n31n (swappedChildren )                             [0x7f32a92b02e0] bci=[-1,40,10] rc=0 vc=859 vn=- li=- udi=- nc=2 flg=0x20020
n27n        iload  <auto slot 2>[#424  Auto] [flags 0x3 0x0 ] (cannotOverflow )               [0x7f32a92b0100] bci=[-1,36,10] rc=1 vc=859 vn=- li=- udi=14 nc=0 flg=0x1000
n29n        iconst 3 (X!=0 X>=0 )                                                             [0x7f32a92b01a0] bci=[-1,38,10] rc=1 vc=859 vn=- li=- udi=- nc=0 flg=0x104
n4n       BBEnd </block_3> =====   

block_28, a clone of block_3:

n309n     BBStart <block_28> (freq 10000)                                                     [0x7f32a9350050] bci=[-1,24,9] rc=0 vc=855 vn=- li=- udi=- nc=0
n310n     lstorei  Test.a J[#428  notAccessed Shadow +8] [flags 0x604 0x0 ]                   [0x7f32a93500a0] bci=[-1,33,9] rc=0 vc=859 vn=- li=- udi=- nc=2
n311n       aload  <'this' parm LTest;>[#422  Parm] [flags 0x40000107 0x0 ] (X!=0 X>=0 )      [0x7f32a93500f0] bci=[-1,24,9] rc=1 vc=859 vn=- li=- udi=12 nc=0 flg=0x104
n312n       lsub                                                                              [0x7f32a9350140] bci=[-1,32,9] rc=2 vc=859 vn=- li=- udi=- nc=2
n313n         lload  <temp slot 7>[#445  Auto] [flags 0x4 0x0 ] (cannotOverflow createdByPRE )  [0x7f32a9350190] bci=[-1,26,9] rc=1 vc=859 vn=- li=- udi=13 nc=0 flg=0x41000
n314n         lconst -2 (X!=0 X<=0 )                                                          [0x7f32a93501e0] bci=[-1,29,9] rc=1 vc=859 vn=- li=- udi=- nc=0 flg=0x204
n315n     lstore  <temp slot 7>[#445  Auto] [flags 0x4 0x0 ]                                  [0x7f32a9350230] bci=[-1,32,9] rc=0 vc=859 vn=- li=6 udi=3 nc=1
n312n       ==>lsub
n319n     BBEnd </block_28> =====                                                             [0x7f32a9350370] bci=[-1,40,10] rc=0 vc=855 vn=- li=- udi=- nc=0
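Read at the Java level, both blocks store temp - (-2) into Test.a (n26n / n310n) and store the same commoned value back into <temp slot 7> (n237n / n315n), a temp that PRE appears to have introduced to hold the current value of a. block_3 then ends with n33n, which branches to block_4 (the return) when <auto slot 2> < 3, while block_28 has no branch at all. The sketch below is a hypothetical transliteration of the two blocks; the field and method names are illustrative, and reading <auto slot 2> as d assumes javac's usual slot assignment (this = 0, c = 1, d = 2, e = 3), with e = d folded into the comparison.

```java
// Hypothetical Java-level reading of block_3 and its clone block_28.
// `temp` stands for <temp slot 7>; the int argument stands for <auto slot 2>
// (assumed to be d). None of these names come from OpenJ9.
class TreeReading {
    long a;     // Test.a
    long temp;  // <temp slot 7>, mirrors a

    // block_28: the two stores and nothing else, so no way to leave the loop here.
    void block28() {
        long sum = temp - (-2L);   // n312n lsub (lload temp, lconst -2), evaluated once
        a = sum;                   // n310n lstorei Test.a
        temp = sum;                // n315n lstore <temp slot 7>
    }

    // block_3: the same stores, then n33n. Returns true when the branch to block_4 is taken.
    boolean block3(int slot2) {
        long sum = temp - (-2L);   // n25n
        a = sum;                   // n26n
        temp = sum;                // n237n
        return slot2 < 3;          // n33n ificmplt (swapped form of 3 > e)
    }

    public static void main(String[] args) {
        TreeReading t = new TreeReading();
        t.block28();                           // a: 0 -> 2, no exit test
        boolean exits = t.block3(-9);          // a: 2 -> 4, then -9 < 3 takes the branch
        System.out.println(t.a + " " + exits);
        // Subtracting -2 equals adding 2 in two's-complement long arithmetic, even on overflow.
        System.out.println(Long.MAX_VALUE - (-2L) == Long.MAX_VALUE + 2L);
    }
}
```

This prints 4 true and then true. Each block adds 2 to a, but only block_3 can leave the loop, so every block_28-style clone executed before block_3 adds another 2 with no chance to exit.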

So I think the loop got rewritten from:

int d = -9;
...
do {
  a += 2;
  for (e = d; 3 > e;)
    return;
} while (++d < 1000);

to something like:

int d = -9;
...
do {
  a += 2;
  a += 2;
  a += 2;
  a += 2;
  for (e = d; 3 > e;)
    return;
} while (++d < 1000);

BradleyWood commented 7 months ago

So I think what happened is that unrolling tried to remove the loop-back branch from each of the new unrolled blocks but accidentally removed the inner loop's comparison (3 > e).
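The diagnosis can be checked at the source level. The hypothetical harness below (names are illustrative, not from OpenJ9) runs the original loop, the rewrite suggested above in which only the last copy keeps the inner-loop test, and a conservative factor-4 unroll that keeps both the inner-loop test and the loop-back test in every copy. It models the intended semantics only; it is not how GLU actually transforms the IL.

```java
// Hypothetical harness: three source-level versions of the do-while loop that
// Test.b runs when c == 8. Each returns the total amount added to `a` by one call.
class UnrollComparison {
    // The loop as written in the reproducer.
    static long original() {
        long a = 0;
        int d = -9;
        int e;
        do {
            a += 2;
            for (e = d; 3 > e;)
                return a;
        } while (++d < 1000);
        return a;
    }

    // The rewrite described above: four copies of `a += 2`, but only the
    // last copy keeps the inner-loop test.
    static long buggyUnroll() {
        long a = 0;
        int d = -9;
        int e;
        do {
            a += 2;
            a += 2;
            a += 2;
            a += 2;
            for (e = d; 3 > e;)
                return a;
        } while (++d < 1000);
        return a;
    }

    // A conservative factor-4 unroll: every copy keeps the inner-loop test
    // (written as `3 > d`, since e = d) and the loop-back test.
    static long conservativeUnroll() {
        long a = 0;
        int d = -9;
        do {
            a += 2;
            if (3 > d) return a;
            if (++d >= 1000) break;
            a += 2;
            if (3 > d) return a;
            if (++d >= 1000) break;
            a += 2;
            if (3 > d) return a;
            if (++d >= 1000) break;
            a += 2;
            if (3 > d) return a;
        } while (++d < 1000);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(original() + " " + buggyUnroll() + " " + conservativeUnroll());
    }
}
```

This prints 2 8 2: the original and the conservative unroll add 2 per call, while the rewrite with the dropped comparisons adds 8, the same per-call step seen in the miscompiled output (13, 21, 29, ...).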