Closed knn-k closed 2 months ago
Example 1 - Shift a value by 0 bits: (Seen only in a traceCG file in the next comment)
------------------------------
n73616n ( 0) treetop [ 0x12b14d250] bci=[191,12,255] rc=0 vc=8 vn=- li=3696 udi=- nc=1
n73617n ( 2) ishl [ 0x12b14d2a0] bci=[191,11,255] rc=2 vc=8 vn=- li=3696 udi=- nc=2
n73591n ( 2) ==>iadd (in GPR_4663)
n73736n ( 3) ==>iconst 0 (X==0 X>=0 X<=0 )
------------------------------
Example 2 - Shift a constant value by 0 bits: (Seen only in a traceCG file in the next comment)
------------------------------
n73622n ( 0) iRegStore x8 (privatizedInlinerArg ) [ 0x12b14d430] bci=[191,28,257] rc=0 vc=8 vn=- li=3696 udi=- nc=1 flg=0x2000
n73623n ( 5) iadd [ 0x12b14d480] bci=[191,28,257] rc=5 vc=8 vn=- li=3696 udi=- nc=2
n73624n ( 1) ishl [ 0x12b14d4d0] bci=[191,27,257] rc=1 vc=8 vn=- li=3696 udi=- nc=2
n73625n ( 1) iconst 2 (X!=0 X>=0 ) [ 0x12b14d520] bci=[191,22,257] rc=1 vc=8 vn=- li=3696 udi=- nc=0 flg=0x104
n73736n ( 2) ==>iconst 0 (X==0 X>=0 X<=0 )
n73595n ( 5) ==>iloadi (in GPR_4664) (X>=0 cannotOverflow )
------------------------------
Example 3 - Shift constant 0:
------------------------------
n73170n ( 0) ArrayCopyBNDCHK [#1] [ 0x12b1446f0] bci=[200,16,5251] rc=0 vc=8 vn=- li=3664 udi=- nc=2
n73171n ( 3) ishl [ 0x12b144740] bci=[200,16,5251] rc=3 vc=8 vn=- li=3664 udi=- nc=2
n73169n ( 3) ==>iconst 0 (X==0 X>=0 X<=0 )
n73150n ( 4) ==>iloadi (in GPR_4865) (cannotOverflow )
n73169n ( 3) ==>iconst 0 (X==0 X>=0 X<=0 )
------------------------------
A traceCG file for the method on AArch64 macOS with MathLoadTest_bigdecimal_CS_5m_0: getCtor.txt.gz
I tried to generate a full trace file, but I couldn't make the test fail.
I got a traceFull file after all for the method, but it is too large (33MB after compressed by gzip) to attach here.
JIT compilation failed with EXCEPTION THROWN (std::bad_alloc)
during the code generation. That seems to be the reason why I couldn't make the test fail when I was running with traceFull.
There are many ArrayCopyBNDCHK
nodes with ishl
that is shown as "Example 3" above.
n26493n ArrayCopyBNDCHK [#1] [ 0x145aa48d0] bci=[171,16,5251] rc=0 vc=10 vn=- li=1580 udi=- nc=2
n18812n ishl [ 0x14246e7e0] bci=[171,16,5251] rc=3 vc=10 vn=- li=1580 udi=- nc=2
n26491n ==>iconst 0
n18670n ==>iloadi
n26491n ==>iconst 0
The ArrayCopyBNDCHK
node above is for the System.arraycopy()
in String.getBytes()
, where n26491n
is srcPos
and n18670n
is coder
in the Java code.
https://github.com/ibmruntimes/openj9-openjdk-jdk17/blob/217e7d19465aba6a73ccbe34ec372546340362f5/src/java.base/share/classes/java/lang/String.java#L5251
This ArrayCopyBNDCHK
node can be removed because it compares 0 with 0.
I tried running MathLoadTest_bigdecimal_CS_5m_0
on ppc64le Linux with the this option: -Xjit:{java/lang/Class.getConstructor(*}(traceFull,log=getConstructor.txt,optlevel=scorching),scratchSpaceLimitKBForHotCompilations=2097152
It failed in the optimization phase with bad_alloc
as shown below. The size of this JIT trace file is as large as 215MB.
It is unknown what the final IL tree would be like.
<optimization id=112 name=partialRedundancyElimination method=java/lang/Class.getConstructor([Ljava/lang/Class;)Ljava/lang/reflect/Constructor;>
Performing 112: partialRedundancyElimination
=== EXCEPTION THROWN (std::bad_alloc) ===
</jitlog>
I got a traceFull log without bad_alloc
on AArch64 macOS.
https://na.artifactory.swg-devops.com/artifactory/sys-rt-generic-local/hyc-runtimes-jenkins.swg-devops.com/Grinder/41058/system_test_output.tar.gz
It contains ArrayCopyBNDCHK
nodes like this that can be optimized out. (Example 3 above)
------------------------------
n22856n ( 0) ArrayCopyBNDCHK [#1] [ 0x130f8d7f0] bci=[106,16,5251] rc=0 vc=10 vn=- li=989 udi=- nc=2
n12022n ( 3) ishl [ 0x124965d80] bci=[106,16,5251] rc=3 vc=10 vn=- li=989 udi=- nc=2
n22854n ( 4) ==>iconst 0 (X==0 X>=0 X<=0 )
n11880n ( 3) ==>iloadi (in GPR_1067) (cannotOverflow )
n22854n ( 4) ==>iconst 0 (X==0 X>=0 X<=0 )
------------------------------
The ishl
node n12022n
first appears after "id=15 name=inlining" as one of child nodes of a call to arraycopy, and the node does not get optimized.
n12030n call java/lang/System.arraycopy(Ljava/lang/Object;ILjava/lang/Object;II)V[#1967 final native static Method] [flags 0x20500 0x0 ] () [ 0x124966000] bci=[106,27,5251] rc=1 vc=42 vn=- li=- udi=- nc=5 flg=0x20
n12019n aloadi java/lang/String.value [B[#1095 final Shadow +8] [flags 0x400a0607 0x0 ] [ 0x124965c90] bci=[106,10,5251] rc=1 vc=0 vn=- li=- udi=- nc=1
n12018n aload <temp slot 5>[#1933 Auto] [flags 0x7 0x0 ] (X!=0 ) [ 0x124965c40] bci=[106,9,5251] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x4
n12022n ishl [ 0x124965d80] bci=[106,16,5251] rc=1 vc=0 vn=- li=- udi=- nc=2
n12167n iconst 0 (X==0 X>=0 X<=0 ) [ 0x124968ad0] bci=[100,3,1724] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n12021n iload <temp slot 9>[#1979 Auto] [flags 0x20000003 0x0 ] [ 0x124965d30] bci=[106,14,5251] rc=1 vc=0 vn=- li=- udi=- nc=0
n12023n aload <temp slot 8>[#1978 Auto] [flags 0x20000007 0x0 ] [ 0x124965dd0] bci=[106,17,5251] rc=1 vc=0 vn=- li=- udi=- nc=0
(Omitted other child nodes)
There is an ishl
node that shifts by 0 bits (Example 1 above) in the traceFull file, but that is simplified in "id=46 name=treeSimplification" as expected.
Trees after loopCanonicalization (id=45):
n22937n ArrayCopyBNDCHK [#1] [ 0x130f8f140] bci=[143,21,5251] rc=0 vc=8155 vn=- li=- udi=- nc=2
n14961n ishl [ 0x1300f3430] bci=[143,21,5251] rc=4 vc=8155 vn=- li=- udi=- nc=2
n14959n iload <temp slot 6>[#2294 Auto] [flags 0x3 0x0 ] (cannotOverflow ) [ 0x1300f3390] bci=[143,18,5251] rc=1 vc=8155 vn=- li=- udi=2038 nc=0 flg=0x1000
n14960n iconst 0 (X==0 X>=0 X<=0 cannotOverflow ) [ 0x1300f33e0] bci=[143,19,5251] rc=1 vc=8155 vn=- li=- udi=- nc=0 flg=0x1302
n22936n iconst 0 (X==0 X>=0 X<=0 ) [ 0x130f8f0f0] bci=[143,27,5251] rc=1 vc=8155 vn=- li=- udi=- nc=0 flg=0x302
Trees after treeSimplification (id=46):
n22937n ArrayCopyBNDCHK [#1] [ 0x130f8f140] bci=[143,21,5251] rc=0 vc=8169 vn=- li=- udi=- nc=2
n14959n ==>iload
n22936n iconst 0 (X==0 X>=0 X<=0 ) [ 0x130f8f0f0] bci=[143,27,5251] rc=1 vc=8169 vn=- li=- udi=- nc=0 flg=0x302
It is unknown why a node of this form remained unoptimized in the trace file in Issue #19525..
@hzongaro FYI. Can anybody in the optimizer team look at the Example 3 case? Thanks.
It is unknown why a node of this form remained unoptimized in the trace file in Issue https://github.com/eclipse-openj9/openj9/issues/19525..
@knn-k, I wonder whether the shift was introduced or the second operand was folded to zero after the last pass of Tree Simplifier or Value Propagation.
@dylanjtuttle, may I ask you to take a look at the opportunity illustrated in Example 3 that's been missed by the optimizer? This is something that the various handlers for shift operations in OMRSimplifierHandlers.cpp
and VPHandlers.cpp
need to be enhanced to catch.
Related issue: #17740 (already closed)
Closing this issue as https://github.com/eclipse/omr/pull/7398 has been merged.
This issue comes from #19525.
A JIT trace file for
java/lang/Class.getConstructor([Ljava/lang/Class;)Ljava/lang/reflect/Constructor;
at the scorching level contains someishl
nodes that can be easily simplified.