eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

Implement inlined version of sun_misc_Unsafe_copyMemory #7136

Open 0xdaryl opened 5 years ago

0xdaryl commented 5 years ago

x86 and Power provide inlined implementations of sun_misc_Unsafe_copyMemory. Investigate the semantics of this method, its current relevance, and implement on Z, AArch64, and AArch32 as appropriate.

0xdaryl commented 5 years ago

FYI @fjeremic

fjeremic commented 5 years ago

I question why this belongs to the codegen in the first place? This seems like something that should be reduced to a TR::arraycopy at the common optimizer level, rather than it being replicated across every codegen. Do you have any thoughts about this?

Looking into this deeper on x86 [1] and Power [2] it looks like we have duplicate code here in the codegen evaluators. It seems however there already is a common optimization which transforms these calls into TR:arraycopy in VP [3].

Rather than reimplementing this on every codegen we should examine whether the current x86 and Power implementations in the codegen are needed or if the VP pass is enough to capture all the use cases. If it is not, we should likely focus on improving the VP pass, or alternatively moving this logic to J9RecognizedCallTransformer since it's a simple method to tree transformation that runs at IL gen time.

We can then deprecate the x86 and Power codegen implementations and benefit from this transformation on all codegens for free.

[1] https://github.com/eclipse/openj9/blob/10ba6a66b2709e35d0b7b90ed9108db14ccb3ab8/runtime/compiler/x/codegen/J9TreeEvaluator.cpp#L9859-L9899

[2] https://github.com/eclipse/openj9/blob/10ba6a66b2709e35d0b7b90ed9108db14ccb3ab8/runtime/compiler/p/codegen/J9TreeEvaluator.cpp#L13193

[3] https://github.com/eclipse/omr/blob/ffb0d9866e44551d22553be84adada6e546f0a19/compiler/optimizer/ValuePropagationCommon.cpp#L957-L1030

0xdaryl commented 5 years ago

Of course, if the semantics of the primitive copy are identical then a cross-platform approach is always welcome.

simonameng commented 4 years ago

@andrewcraik We compared the VP code and the cases in X86 and Power. I believe VP code has already covered what X86 and Power are doing. And we are now thinking of:

  1. Migrate the potion of the VP code that reduces a call to sun_misc_Unsafe_copyMemory to a TR::arraycopy node to J9RecognizedCallTransformer
  2. Implement a VP pass over TR::arraycopy nodes which looks at the children and their constraints and tries to set a forward/backward flag if enough information is available
  3. Deprecate the current code related to sun_misc_Unsafe_copyMemory in VP Do you think the above actions are doable?
simonameng commented 4 years ago

@andrewcraik Could you please help take a look at the actions mentioned above and see if they are doable?

andrewcraik commented 4 years ago

So doing the transform in the recognized call transformer is likely doable. A general opt in VP to set the forward/backward flag is much more dangerous - arraycopy nodes come from a lot of places so while we should be able to set the flags in VP based on constraints there is a real risk of bugs being exposed and having to back off because of specialized use of the arraycopy (main ones in this category of fear in my mind are the ones from IdiomRecognition).

Care will need to be taken to check the details of the different codegens in how their arraycopy works and the flags they check and the impact they have. While this should be consistent there is a real risk of inconsistency leading to potential bugs.

Simply implementing the above and passing the tests is not going to be enough analysis will need to be done to try to ensure the transformations won't leave bugs lurking since this is a tricky area.

simonameng commented 4 years ago

@andrewcraik We tried to add an assertion fail in https://github.com/eclipse/omr/blob/ffb0d9866e44551d22553be84adada6e546f0a19/compiler/optimizer/ValuePropagationCommon.cpp#L1024-L1025 And then we launched a personal build on all platforms. The result is that no job failed. We think it will have no impact on performance if we delete the constrain part of the VP code and migrate the recognition part to J9RecognizedCallTransformer, as we ran a huge amount of coverage testing and not once did we hit the constraint path in VP.

andrewcraik commented 4 years ago

I don't think the fact it isn't seen in a personal build is a reason to drop the support of setting the direction flag. If the constraint handling for arraycopy can handle this then we should be good. If not that enhancement is needed.

fjeremic commented 4 years ago

I don't think the fact it isn't seen in a personal build is a reason to drop the support of setting the direction flag.

The code in VP would be dead if we migrate the transformation to J9RecognizedCallTransformer as VP will never see a call to a recognized method TR::sun_misc_Unsafe_copyMemory during VP. As outlined previously, the risk of doing the constraint analysis on all TR::arraycopy nodes is too risky, so we can't do that either. How do you suggest we proceed?

IMO if the constraint piece of code setting the arraycopy flag is not hit in all of our testing, how can we even be sure the constraint piece of code in VP works in the first place?

ehsankianifar commented 1 year ago

I believe this pr satisfies the purpose of this issue. In the mentioned pr, the transformation was moved from arch specific codeGen to runtime/compiler/codegen/J9CodeGenerator.cpp. I tested calling Unsafe.CopyMemory on s390x with disableLocalVP and disableGlobalVP options and collected jit logs: in Post Optimization Trees:

<LowerTrees>
parent 000003FF6C1516F0 appears to be a nullcheck over node: 000003FF6C1516A0
parent 000003FF6C155A70 appears to be a nullcheck over node: 000003FF6C1559D0
parent 000003FF6C1570A0 appears to be a nullcheck over node: 000003FF6C157050
parent 000003FF6C21B430 appears to be a nullcheck over node: 000003FF6C21B390
[   734] O^O Call arraycopy instead of Unsafe.copyMemory:      0x3ff6c14edb0
</LowerTrees>

in Post Lower Trees:

n394n     BBStart <block_11> (freq 4502)                                                      [     0x3ff6c14eae0] bci=[-1,7,31] rc=0 vc=1659 vn=- li=11 udi=- nc=1
n1726n      GlRegDeps ()                                                                      [     0x3ff6c548b40] bci=[-1,7,31] rc=1 vc=1659 vn=- li=11 udi=- nc=5 flg=0x20
n1727n        aRegLoad GPR3   <parm 2 [I>[#399  Parm] [flags 0x40000107 0x0 ] (SeenRealReference )  [     0x3ff6c548b90] bci=[1,22,1368] rc=2 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1728n        aRegLoad GPR2   <parm 1 [I>[#398  Parm] [flags 0x40000107 0x0 ] (SeenRealReference )  [     0x3ff6c548be0] bci=[1,20,1368] rc=2 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1729n        lRegLoad GPR6   <parm 3 J>[#400  Parm] [flags 0xc0000104 0x0 ] (SeenRealReference )  [     0x3ff6c548c30] bci=[1,21,1368] rc=3 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1730n        aRegLoad GPR7   <temp slot 10>[#562  Auto] [flags 0x20004007 0x0 ] (X!=0 SeenRealReference )  [     0x3ff6c548c80] bci=[1,19,1368] rc=1 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8004
n1731n        lRegLoad GPR8   <parm 5 J>[#401  Parm] [flags 0xc0000104 0x0 ] (SeenRealReference )  [     0x3ff6c548cd0] bci=[1,26,1368] rc=2 vc=1659 vn=- li=11 udi=- nc=0 flg=0x8000
n1937n    arraycopy  <arraycopy>[#245  helper Method] [flags 0x400 0x0 ] ()                   [     0x3ff6c54cd30] bci=[1,20,1368] rc=0 vc=0 vn=- li=- udi=- nc=3 flg=0x20
n1935n      aladd                                                                             [     0x3ff6c54cc90] bci=[1,20,1368] rc=1 vc=0 vn=- li=- udi=- nc=2
n1728n        ==>aRegLoad
n1729n        ==>lRegLoad
n1936n      aladd                                                                             [     0x3ff6c54cce0] bci=[1,22,1368] rc=1 vc=0 vn=- li=- udi=- nc=2
n1727n        ==>aRegLoad
n1729n        ==>lRegLoad
n1731n      ==>lRegLoad
n1262n    return                                                                              [     0x3ff6c21fa30] bci=[-1,10,31] rc=0 vc=1659 vn=- li=11 udi=- nc=0
n1263n    BBEnd </block_11> =====                                                             [     0x3ff6c21fa80] bci=[-1,10,31] rc=0 vc=1659 vn=- li=11 udi=- nc=0