eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

Unoptimized ishl nodes remain in the "Pre Instruction Selection Trees" #19576

Closed knn-k closed 2 months ago

knn-k commented 4 months ago

This issue comes from #19525.

A JIT trace file for java/lang/Class.getConstructor([Ljava/lang/Class;)Ljava/lang/reflect/Constructor; at the scorching level contains some ishl nodes that can be easily simplified.

openjdk version "17.0.12-internal" 2024-07-16
OpenJDK Runtime Environment (build 17.0.12-internal+0-adhoc.jenkins.BuildJDK17aarch64macNightly)
Eclipse OpenJ9 VM (build master-fa143bf2f9f, JRE 17 Mac OS X aarch64-64-Bit 20240518_573 (JIT enabled, AOT enabled)
OpenJ9   - fa143bf2f9f
OMR      - 76296e11f30
JCL      - e583cf98922 based on jdk-17.0.12+3)
knn-k commented 4 months ago

Example 1 - Shift a value by 0 bits: (Seen only in a traceCG file in the next comment)

------------------------------
 n73616n  (  0)  treetop                                                                              [       0x12b14d250] bci=[191,12,255] rc=0 vc=8 vn=- li=3696 udi=- nc=1
 n73617n  (  2)    ishl                                                                               [       0x12b14d2a0] bci=[191,11,255] rc=2 vc=8 vn=- li=3696 udi=- nc=2
 n73591n  (  2)      ==>iadd (in GPR_4663)
 n73736n  (  3)      ==>iconst 0 (X==0 X>=0 X<=0 )
------------------------------

Example 2 - Shift a constant value by 0 bits: (Seen only in a traceCG file in the next comment)

------------------------------
 n73622n  (  0)  iRegStore x8  (privatizedInlinerArg )                                                [       0x12b14d430] bci=[191,28,257] rc=0 vc=8 vn=- li=3696 udi=- nc=1 flg=0x2000
 n73623n  (  5)    iadd                                                                               [       0x12b14d480] bci=[191,28,257] rc=5 vc=8 vn=- li=3696 udi=- nc=2
 n73624n  (  1)      ishl                                                                             [       0x12b14d4d0] bci=[191,27,257] rc=1 vc=8 vn=- li=3696 udi=- nc=2
 n73625n  (  1)        iconst 2 (X!=0 X>=0 )                                                          [       0x12b14d520] bci=[191,22,257] rc=1 vc=8 vn=- li=3696 udi=- nc=0 flg=0x104
 n73736n  (  2)        ==>iconst 0 (X==0 X>=0 X<=0 )
 n73595n  (  5)      ==>iloadi (in GPR_4664) (X>=0 cannotOverflow )
------------------------------

Example 3 - Shift constant 0:

------------------------------
 n73170n  (  0)  ArrayCopyBNDCHK [#1]                                                                 [       0x12b1446f0] bci=[200,16,5251] rc=0 vc=8 vn=- li=3664 udi=- nc=2
 n73171n  (  3)    ishl                                                                               [       0x12b144740] bci=[200,16,5251] rc=3 vc=8 vn=- li=3664 udi=- nc=2
 n73169n  (  3)      ==>iconst 0 (X==0 X>=0 X<=0 )
 n73150n  (  4)      ==>iloadi (in GPR_4865) (cannotOverflow )
 n73169n  (  3)    ==>iconst 0 (X==0 X>=0 X<=0 )
------------------------------
knn-k commented 4 months ago

A traceCG file for the method on AArch64 macOS with MathLoadTest_bigdecimal_CS_5m_0: getCtor.txt.gz

I tried to generate a full trace file, but I couldn't make the test fail.

knn-k commented 4 months ago

I got a traceFull file after all for the method, but it is too large (33MB after compressed by gzip) to attach here. JIT compilation failed with EXCEPTION THROWN (std::bad_alloc) during the code generation. That seems to be the reason why I couldn't make the test fail when I was running with traceFull.

There are many ArrayCopyBNDCHK nodes with ishl that is shown as "Example 3" above.

n26493n   ArrayCopyBNDCHK [#1]                                                                [       0x145aa48d0] bci=[171,16,5251] rc=0 vc=10 vn=- li=1580 udi=- nc=2
n18812n     ishl                                                                              [       0x14246e7e0] bci=[171,16,5251] rc=3 vc=10 vn=- li=1580 udi=- nc=2
n26491n       ==>iconst 0
n18670n       ==>iloadi
n26491n     ==>iconst 0
knn-k commented 4 months ago

The ArrayCopyBNDCHK node above is for the System.arraycopy() in String.getBytes(), where n26491n is srcPos and n18670n is coder in the Java code. https://github.com/ibmruntimes/openj9-openjdk-jdk17/blob/217e7d19465aba6a73ccbe34ec372546340362f5/src/java.base/share/classes/java/lang/String.java#L5251

This ArrayCopyBNDCHK node can be removed because it compares 0 with 0.

knn-k commented 4 months ago

I tried running MathLoadTest_bigdecimal_CS_5m_0 on ppc64le Linux with the this option: -Xjit:{java/lang/Class.getConstructor(*}(traceFull,log=getConstructor.txt,optlevel=scorching),scratchSpaceLimitKBForHotCompilations=2097152

It failed in the optimization phase with bad_alloc as shown below. The size of this JIT trace file is as large as 215MB. It is unknown what the final IL tree would be like.

<optimization id=112 name=partialRedundancyElimination method=java/lang/Class.getConstructor([Ljava/lang/Class;)Ljava/lang/reflect/Constructor;>
Performing 112: partialRedundancyElimination

=== EXCEPTION THROWN (std::bad_alloc) ===
</jitlog>

https://na.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Grinder/41057/system_test_output.tar.gz

knn-k commented 4 months ago

I got a traceFull log without bad_alloc on AArch64 macOS. https://na.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Grinder/41058/system_test_output.tar.gz

It contains ArrayCopyBNDCHK nodes like this that can be optimized out. (Example 3 above)

------------------------------
 n22856n  (  0)  ArrayCopyBNDCHK [#1]                                                                 [       0x130f8d7f0] bci=[106,16,5251] rc=0 vc=10 vn=- li=989 udi=- nc=2
 n12022n  (  3)    ishl                                                                               [       0x124965d80] bci=[106,16,5251] rc=3 vc=10 vn=- li=989 udi=- nc=2
 n22854n  (  4)      ==>iconst 0 (X==0 X>=0 X<=0 )
 n11880n  (  3)      ==>iloadi (in GPR_1067) (cannotOverflow )
 n22854n  (  4)    ==>iconst 0 (X==0 X>=0 X<=0 )
------------------------------

The ishl node n12022n first appears after "id=15 name=inlining" as one of child nodes of a call to arraycopy, and the node does not get optimized.

n12030n     call  java/lang/System.arraycopy(Ljava/lang/Object;ILjava/lang/Object;II)V[#1967  final native static Method] [flags 0x20500 0x0 ] ()  [       0x124966000] bci=[106,27,5251] rc=1 vc=42 vn=- li=- udi=- nc=5 flg=0x20
n12019n       aloadi  java/lang/String.value [B[#1095  final Shadow +8] [flags 0x400a0607 0x0 ]  [       0x124965c90] bci=[106,10,5251] rc=1 vc=0 vn=- li=- udi=- nc=1
n12018n         aload  <temp slot 5>[#1933  Auto] [flags 0x7 0x0 ] (X!=0 )                    [       0x124965c40] bci=[106,9,5251] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x4
n12022n       ishl                                                                            [       0x124965d80] bci=[106,16,5251] rc=1 vc=0 vn=- li=- udi=- nc=2
n12167n         iconst 0 (X==0 X>=0 X<=0 )                                                    [       0x124968ad0] bci=[100,3,1724] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n12021n         iload  <temp slot 9>[#1979  Auto] [flags 0x20000003 0x0 ]                     [       0x124965d30] bci=[106,14,5251] rc=1 vc=0 vn=- li=- udi=- nc=0
n12023n       aload  <temp slot 8>[#1978  Auto] [flags 0x20000007 0x0 ]                       [       0x124965dd0] bci=[106,17,5251] rc=1 vc=0 vn=- li=- udi=- nc=0
(Omitted other child nodes)
knn-k commented 4 months ago

There is an ishl node that shifts by 0 bits (Example 1 above) in the traceFull file, but that is simplified in "id=46 name=treeSimplification" as expected.

Trees after loopCanonicalization (id=45):

n22937n   ArrayCopyBNDCHK [#1]                                                                [       0x130f8f140] bci=[143,21,5251] rc=0 vc=8155 vn=- li=- udi=- nc=2
n14961n     ishl                                                                              [       0x1300f3430] bci=[143,21,5251] rc=4 vc=8155 vn=- li=- udi=- nc=2
n14959n       iload  <temp slot 6>[#2294  Auto] [flags 0x3 0x0 ] (cannotOverflow )            [       0x1300f3390] bci=[143,18,5251] rc=1 vc=8155 vn=- li=- udi=2038 nc=0 flg=0x1000
n14960n       iconst 0 (X==0 X>=0 X<=0 cannotOverflow )                                       [       0x1300f33e0] bci=[143,19,5251] rc=1 vc=8155 vn=- li=- udi=- nc=0 flg=0x1302
n22936n     iconst 0 (X==0 X>=0 X<=0 )                                                        [       0x130f8f0f0] bci=[143,27,5251] rc=1 vc=8155 vn=- li=- udi=- nc=0 flg=0x302

Trees after treeSimplification (id=46):

n22937n   ArrayCopyBNDCHK [#1]                                                                [       0x130f8f140] bci=[143,21,5251] rc=0 vc=8169 vn=- li=- udi=- nc=2
n14959n     ==>iload
n22936n     iconst 0 (X==0 X>=0 X<=0 )                                                        [       0x130f8f0f0] bci=[143,27,5251] rc=1 vc=8169 vn=- li=- udi=- nc=0 flg=0x302

It is unknown why a node of this form remained unoptimized in the trace file in Issue #19525..

knn-k commented 4 months ago

@hzongaro FYI. Can anybody in the optimizer team look at the Example 3 case? Thanks.

hzongaro commented 4 months ago

It is unknown why a node of this form remained unoptimized in the trace file in Issue https://github.com/eclipse-openj9/openj9/issues/19525..

@knn-k, I wonder whether the shift was introduced or the second operand was folded to zero after the last pass of Tree Simplifier or Value Propagation.

@dylanjtuttle, may I ask you to take a look at the opportunity illustrated in Example 3 that's been missed by the optimizer? This is something that the various handlers for shift operations in OMRSimplifierHandlers.cpp and VPHandlers.cpp need to be enhanced to catch.

knn-k commented 3 months ago

Related issue: #17740 (already closed)

knn-k commented 2 months ago

Closing this issue as https://github.com/eclipse/omr/pull/7398 has been merged.