Open 0xdaryl opened 5 years ago
FYI @fjeremic
I question why this belongs to the codegen in the first place? This seems like something that should be reduced to a TR::arraycopy
at the common optimizer level, rather than it being replicated across every codegen. Do you have any thoughts about this?
Looking into this deeper on x86 [1] and Power [2] it looks like we have duplicate code here in the codegen evaluators. It seems however there already is a common optimization which transforms these calls into TR:arraycopy
in VP [3].
Rather than reimplementing this on every codegen we should examine whether the current x86 and Power implementations in the codegen are needed or if the VP pass is enough to capture all the use cases. If it is not, we should likely focus on improving the VP pass, or alternatively moving this logic to J9RecognizedCallTransformer since it's a simple method to tree transformation that runs at IL gen time.
We can then deprecate the x86 and Power codegen implementations and benefit from this transformation on all codegens for free.
Of course, if the semantics of the primitive copy are identical then a cross-platform approach is always welcome.
@andrewcraik We compared the VP code and the cases in X86 and Power. I believe VP code has already covered what X86 and Power are doing. And we are now thinking of:
@andrewcraik Could you please help take a look at the actions mentioned above and see if they are doable?
So doing the transform in the recognized call transformer is likely doable. A general opt in VP to set the forward/backward flag is much more dangerous - arraycopy nodes come from a lot of places so while we should be able to set the flags in VP based on constraints there is a real risk of bugs being exposed and having to back off because of specialized use of the arraycopy (main ones in this category of fear in my mind are the ones from IdiomRecognition).
Care will need to be taken to check the details of the different codegens in how their arraycopy works and the flags they check and the impact they have. While this should be consistent there is a real risk of inconsistency leading to potential bugs.
Simply implementing the above and passing the tests is not going to be enough analysis will need to be done to try to ensure the transformations won't leave bugs lurking since this is a tricky area.
@andrewcraik We tried to add an assertion fail in https://github.com/eclipse/omr/blob/ffb0d9866e44551d22553be84adada6e546f0a19/compiler/optimizer/ValuePropagationCommon.cpp#L1024-L1025 And then we launched a personal build on all platforms. The result is that no job failed. We think it will have no impact on performance if we delete the constrain part of the VP code and migrate the recognition part to J9RecognizedCallTransformer, as we ran a huge amount of coverage testing and not once did we hit the constraint path in VP.
I don't think the fact it isn't seen in a personal build is a reason to drop the support of setting the direction flag. If the constraint handling for arraycopy can handle this then we should be good. If not that enhancement is needed.
I don't think the fact it isn't seen in a personal build is a reason to drop the support of setting the direction flag.
The code in VP would be dead if we migrate the transformation to J9RecognizedCallTransformer as VP will never see a call to a recognized method TR::sun_misc_Unsafe_copyMemory
during VP. As outlined previously, the risk of doing the constraint analysis on all TR::arraycopy
nodes is too risky, so we can't do that either. How do you suggest we proceed?
IMO if the constraint piece of code setting the arraycopy flag is not hit in all of our testing, how can we even be sure the constraint piece of code in VP works in the first place?
I believe this pr satisfies the purpose of this issue. In the mentioned pr, the transformation was moved from arch specific codeGen to runtime/compiler/codegen/J9CodeGenerator.cpp
. I tested calling Unsafe.CopyMemory
on s390x with disableLocalVP
and disableGlobalVP
options and collected jit logs:
in Post Optimization Trees:
<LowerTrees>
parent 000003FF6C1516F0 appears to be a nullcheck over node: 000003FF6C1516A0
parent 000003FF6C155A70 appears to be a nullcheck over node: 000003FF6C1559D0
parent 000003FF6C1570A0 appears to be a nullcheck over node: 000003FF6C157050
parent 000003FF6C21B430 appears to be a nullcheck over node: 000003FF6C21B390
[ 734] O^O Call arraycopy instead of Unsafe.copyMemory: 0x3ff6c14edb0
</LowerTrees>
in Post Lower Trees:
n394n BBStart <block_11> (freq 4502) [ 0x3ff6c14eae0] bci=[-1,7,31] rc=0 vc=1659 vn=- li=11 udi=- nc=1
n1726n GlRegDeps () [ 0x3ff6c548b40] bci=[-1,7,31] rc=1 vc=1659 vn=- li=11 udi=- nc=5 flg=0x20
n1727n aRegLoad GPR3 <parm 2 [I>[#399 Parm] [flags 0x40000107 0x0 ] (SeenRealReference ) [ 0x3ff6c548b90] bci=[1,22,1368] rc=2 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1728n aRegLoad GPR2 <parm 1 [I>[#398 Parm] [flags 0x40000107 0x0 ] (SeenRealReference ) [ 0x3ff6c548be0] bci=[1,20,1368] rc=2 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1729n lRegLoad GPR6 <parm 3 J>[#400 Parm] [flags 0xc0000104 0x0 ] (SeenRealReference ) [ 0x3ff6c548c30] bci=[1,21,1368] rc=3 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1730n aRegLoad GPR7 <temp slot 10>[#562 Auto] [flags 0x20004007 0x0 ] (X!=0 SeenRealReference ) [ 0x3ff6c548c80] bci=[1,19,1368] rc=1 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8004
n1731n lRegLoad GPR8 <parm 5 J>[#401 Parm] [flags 0xc0000104 0x0 ] (SeenRealReference ) [ 0x3ff6c548cd0] bci=[1,26,1368] rc=2 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1937n arraycopy <arraycopy>[#245 helper Method] [flags 0x400 0x0 ] () [ 0x3ff6c54cd30] bci=[1,20,1368] rc=0 vc=0 vn=- li=- udi=- nc=3 flg=0x20
n1935n aladd [ 0x3ff6c54cc90] bci=[1,20,1368] rc=1 vc=0 vn=- li=- udi=- nc=2
n1728n ==>aRegLoad
n1729n ==>lRegLoad
n1936n aladd [ 0x3ff6c54cce0] bci=[1,22,1368] rc=1 vc=0 vn=- li=- udi=- nc=2
n1727n ==>aRegLoad
n1729n ==>lRegLoad
n1731n ==>lRegLoad
n1262n return [ 0x3ff6c21fa30] bci=[-1,10,31] rc=0 vc=1659 vn=- li=11 udi=- nc=0
n1263n BBEnd </block_11> ===== [ 0x3ff6c21fa80] bci=[-1,10,31] rc=0 vc=1659 vn=- li=11 udi=- nc=0
x86 and Power provide inlined implementations of
sun_misc_Unsafe_copyMemory
. Investigate the semantics of this method, its current relevance, and implement on Z, AArch64, and AArch32 as appropriate.