Open fjeremic opened 4 years ago
Tagging @0xdaryl @zl-wang @joransiu who might know the answer to the first question, and @gacholio who may be able to answer the second question.
Putting the offset in a register before dispatch is only required for interface dispatch, since the offset is not obviously available anywhere else. Also, we didn't pass the necessary info for vm helper to re-lookup the offset either.
For the more complicated index method on p and x, it is exactly to retrieve the offset from the instruction sequence for a virtual call where the offset is obviously available (after resolution and patch). I am wondering why the offset is not made available in the virtual dispatch sequence on z. Maybe it cannot be made available at a fixed (relative) location according to the particular CPU type.
P has fixed-sized instructions, so it's easy to look back and determine virtual vs interface. I assume it's not so easy on Z, so the index is always in the register.
P has fixed-sized instructions, so it's easy to look back and determine virtual vs interface. I assume it's not so easy on Z, so the index is always in the register.
It seems that we already emit a NOP to disambiguate between virtual and interface dispatch: https://github.com/eclipse/openj9/blob/386b29b971be293ac255c9c7a004ad24eeb1fd9c/runtime/compiler/z/codegen/S390PrivateLinkage.cpp#L2137-L2140
I wonder if this is some historical reason, because we seem to load the VFT offset for both virtual and interface calls always. The discussion seems to suggest the offset is only needed for interface dispatch, so we could get away by not loading the offset for the virtual dispatch and implementing some logic in jitVTableIndex
to fetch it out of the instruction stream.
How exactly does j2iVirtual
work on Power then? From reading the code it seems to want the VFT offset even for virtual calls to be in gr12
:
https://github.com/eclipse/openj9/blob/386b29b971be293ac255c9c7a004ad24eeb1fd9c/runtime/codert_vm/pnathelp.m4#L561-L577
Who loads this value if we don't do it in the JIT mainline?
See https://github.com/eclipse/openj9/blob/386b29b971be293ac255c9c7a004ad24eeb1fd9c/runtime/oti/JITInterface.hpp#L148-L223 for the instruction decode logic.
The comment in the nathelp code is somewhat misleading - it's only there for the non-interface case (on platforms other than Z).
The other issue is that Z can't encode negative immediate offsets, so I'm not sure you're going to be able to come up with a more compact call sequence in the virtual case, since the negative offset will need to be in a register anyway.
The other issue is that Z can't encode negative immediate offsets, so I'm not sure you're going to be able to come up with a more compact call sequence in the virtual case, since the negative offset will need to be in a register anyway.
That's only true for 12-bit displacements. Extended 20-bit displacements are signed.
so that, i suspected it is the side effect of CPU-type specific reason. is the 20bit displacement available on the lowest denominator you supported now? if yes, you may change the situation. a further issue is to deal with possible overflow of 20-bit displacemnt (very unlikely though).
The comment in the nathelp code is somewhat misleading - it's only there for the non-interface case (on platforms other than Z).
Sorry I'm not sure I follow. As I understand VM_JITInterface::jitVTableIndex
only gets called once inside of old_slow_icallVMprJavaSendPatchupVirtual
. That function seems to fetch the J2I thunk and store it into the VFT slot which we called, presumably so we never call old_slow_icallVMprJavaSendPatchupVirtual
again if we are calling the same virtual method of that class.
I did not see the thunk call VM_JITInterface::jitVTableIndex
, so who exactly loads gr12
in the Power case? Does the thunk perhaps branch to j2iTransition
instead:
https://github.com/eclipse/openj9/blob/386b29b971be293ac255c9c7a004ad24eeb1fd9c/runtime/codert_vm/pnathelp.m4#L545-L559
By "misleading" do you mean that j2iVirtual
in pnathelp.m4 is only branched to for interface calls (NOT virtual calls)? And virtual calls end up dispatching to j2iTransition
?
so that, i suspected it is the side effect of CPU-type specific reason. is the 20bit displacement available on the lowest denominator you supported now? if yes, you may change the situation. a further issue is to deal with possible overflow of 20-bit displacemnt (very unlikely though).
Yeah, the 20-bit displacement instructions are available on our minimum ALS of z9. It must have been the historic reason why this was never changed. We should be able to use 20-bit displacement instructions which support negative displacements to implement this, and avoid having to load the VFT offset for virtual calls in the mainline.
By "misleading" do you mean that
j2iVirtual
in pnathelp.m4 is only branched to for interface calls (NOT virtual calls)? And virtual calls end up dispatching toj2iTransition
?
No, I mean the register is only loaded for the interface case. Both interface and virtual end up at j2iVirtual, and the interpreter calls jitVtableIndex
in order to decode the method from the index.
The VM has no stake in this - if you can come up with a better instruction sequence that can be disambiguated, go for it.
No, I mean the register is only loaded for the interface case. Both interface and virtual end up at j2iVirtual, and the interpreter calls
jitVtableIndex
in order to decode the method from the index.
So then on Power for virtual calls we are storing a bogus value (whatever is in gr12
at the time) to J9VMThread.tempSlot
[1] and the interpreter then only for virtual calls will ignore whatever is in gr12
and will call jitVtableIndex
to find the true VFT offset?
Right, the decode code defaults to reading the register, then looks back in the instruction stream to see if it's actually an encoded virtual call (which currently does not occur on Z) and if so, decodes the index from the call.
Thanks, and just to confirm, j2iVirtual
in pnathelp.m4 will branch to the bytecode loop here:
https://github.com/eclipse/openj9/blob/bf841bbb5c6209cab4f459f00ef35916f20f41b9/runtime/vm/BytecodeInterpreter.hpp#L9142-L9145
Which makes a call to j2iVirtualMethod
:
https://github.com/eclipse/openj9/blob/bf841bbb5c6209cab4f459f00ef35916f20f41b9/runtime/vm/BytecodeInterpreter.hpp#L694-L699
Which then calls jitVtableIndex
to find the VFT index, and that for Z is trivial, and for other platforms it decodes the instruction stream to look for the offset (if it's a virtual call)?
Describing this explicitly as I'm going to ask someone to work on getting rid of the load on Z.
Correct.
While reviewing #8154 I wanted to convince myself of my understanding of virtual dispatches on Z. I went through a simple virtual call and documented the call flow. I'm copying the comment here for further discussion. Two questions arise at the end:
Sorry I'm still not following. What is this
LA
instruction you are referring to? I just debugged this with an example to convince myself of how it works. Here is the call flow:GPR0
J9Class*
) via theBASR
instruction in the JIT mainline codeicallVMprJavaSendPatchupVirtual
function [1]. This function stores the value ofGPR0
intoJ9VMThread.tempSlot
and then makes a call toold_slow_icallVMprJavaSendPatchupVirtual
[2]old_slow_icallVMprJavaSendPatchupVirtual
function takes theJ9VMThread.tempSlot
which is the VFT offset and the return address and callsjitVTableIndex
which should return the VFT index. Notice how on z/Architecture this function just returns the second parameter [3]? We do not decode the instruction stream on Z, so it appears. This is not true on other platforms.old_slow_icallVMprJavaSendPatchupVirtual
which then takes the VFT index and looks up the thunk based on the signature of the method we want to call. These thunks are one per signature and dispatch the calls from JIT to interpreter. The rest of this function call looks up the thunk for the particular call we want to make and then it overwrites the VFT entry in theJ9Class*
with the address of the thunk. This is so we don't have to pay the cost of looking up the thunk the next time we have to dispatch this virtual call. The last thing this method does is store the thunk value toJ9VMThread.tempSlot
[4].icallVMprJavaSendPatchupVirtual
which then dispatches a call to the thunk via an indirect branch off ofJ9VMThread.tempSlot
[5]j2iVirtual
[6]. This ends up branching to the interpreter loop to execute the method.As far as I can see the only reason why we are loading the VFT offset in a register is because looking at the implementation of
j2iVirtual
[6] the VM seems to want this VFT offset to be inJ9VMThread.tempSlot
[7]. It seems because of the possibility of having to dispatch via an interface. I don't quite understand the details of how this works, but I definitely would like to.For me two questions arise:
jitVTableIndex
more complicated on non-Z platforms? What is this API supposed to do exactly?J9VMThread.tempSlot
? For resolved calls we should be able to decode this value from the JIT instruction stream. Given that JIT to interpreter calls are supposed to be rare, I'd like to know if there is an opportunity to remove the VFT load prior to virtual calls on all platforms as that can save quite a bit of path length.[1] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/codert_vm/znathelp.m4#L621-L633 [2] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/codert_vm/cnathelp.cpp#L2733-L2752 [3] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/oti/JITInterface.hpp#L217-L219 [4] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/codert_vm/cnathelp.cpp#L2751 [5] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/codert_vm/znathelp.m4#L632 [6] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/codert_vm/znathelp.m4#L604-L619 [7] https://github.com/eclipse/openj9/blob/dd96e6747e1a51ca757d501a7c6bf110f108f192/runtime/codert_vm/znathelp.m4#L615-L617