Open noahfalk opened 5 years ago
Note that the document the JIT depends on most for ABI-related questions is the "CLR ABI". It has a section on the profiler hooks: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#profiler-hooks. It could certainly be expanded to be clearer and to answer more questions like the ones you have here.
In the JIT, the most interesting parts of the implementation are genProfilingEnterCallback and genProfilingLeaveCallback.
Generally, documentation probably was originally written for x86 -- the first architecture -- and not updated very much to handle the other architectures (x64, Linux x64, arm32, arm64, Linux x86).
It looks to me that for Linux x64:
FunctionEnter3WithInfo
(and friends) are, and the JIT just always generates the same code. There is no documentation in the code or "CLR ABI" to explain why R14/R15 were picked. Presumably it is because there is no caller-provided "home" space for the argument registers, as there is on Windows x64, so we don't want to trash the incoming registers. On Windows, we first home all the register arguments, and then we can trash them.
Regarding register preservation:
The asmhelper.S comment that says rax/rdx/xmm0/xmm1 need to be preserved should, I believe, only apply to the "leave" helper, which needs to preserve the function return value.
These statements should really be backed up by testing! And extended to other platforms.
Thanks for looking into this Bruce! I agree on the testing. My thinking here is we could write a trivial profiler that registers ELT callbacks in order to deliberately trash every register we believe we can. If we can have this profiler loaded and pass all the CoreCLR tests then it would be good evidence the analysis was accurate.
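A minimal sketch of what the register-trashing part of such a test profiler could look like, in C with GCC/Clang inline assembly. Everything here is hypothetical: the names `trash_scratch_registers`, `enter_hook_stub` and `g_enter_calls` are made up for illustration, and the choice of RAX, R10 and R11 as the registers to clobber is only an assumption, since the actual set the JIT tolerates is exactly what this thread is trying to establish. It also ignores how the hook is registered and how its argument arrives (R14 on Linux x64, per the discussion above).

```c
#include <stdint.h>

/* Hypothetical counter so a test can confirm the stub actually ran. */
static unsigned long g_enter_calls;

/* Hypothetical helper: overwrite registers we ASSUME an enter hook may
 * clobber. RAX, R10 and R11 are caller-saved and are not argument
 * registers in the System V AMD64 convention; whether the JIT really
 * tolerates this is the open question, not an established fact. */
static void trash_scratch_registers(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile(
        "movq $0x0BADF00D, %%rax\n\t"
        "movq $0x0BADF00D, %%r10\n\t"
        "movq $0x0BADF00D, %%r11\n\t"
        : /* no outputs */
        : /* no inputs */
        : "rax", "r10", "r11");
#endif
    /* On other targets this is a no-op, so the sketch still compiles. */
}

/* Hypothetical stand-in for an enter callback: record the call, then
 * leave garbage in the chosen registers just before returning. */
static void enter_hook_stub(uintptr_t function_id)
{
    (void)function_id;
    g_enter_calls++;
    trash_scratch_registers();
}
```

If the CoreCLR tests still pass with a hook like this installed, that would be some evidence the chosen registers really are free to clobber; extending the clobber list one register at a time would narrow down the real contract.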
for the enter hook we pass R14 = ProfilerMethodHnd (I guess this is FunctionIDOrClientID?), R15 = caller's SP. (For Windows x64, it's the normal first 2 argument registers, RCX/RDX). It looks like we don't document the 2nd argument?
That is intentional. The public contract covers only the 1st argument. The second argument is a private contract between the JIT and the runtime so that the runtime can implement FunctionEnter3WithInfo.
@BruceForstall - I've been looking at this a bit more and it raised a few (hopefully quick) additional questions:
1) Are there any scenarios where the JIT needs the upper 64 bits of the XMM arguments preserved? As far as I know the largest floating point type that could be passed as an argument is 8 bytes, and the profiler is only designed to expose 8 byte arguments. I am guessing save/restore on the low 8 bytes is sufficient.
2) All the callbacks currently preserve 16 bytes for XMM0/XMM1 return values. I wasn't planning to change this for the Leave/Tailcall functions, but if you know, I'm curious whether we ever use larger return values.
The questions are specific to x64, I believe.
We don't support the __vectorcall convention, so:
Maybe @CarolEidt can comment to verify.
@BruceForstall is right about the handling of the upper bits of XMM arguments, though for anything that's not classified as a call, we expect them to be preserved.
On Linux/x64, I believe it's the case that a struct of 2 floats would be returned in XMM0, but a struct of 2 doubles or 3 or 4 floats would be returned in XMM0 and XMM1.
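The examples above follow from the System V AMD64 rule that a struct of at most 16 bytes is split into 8-byte chunks ("eightbytes"), and each all-floating-point eightbyte is returned in the next XMM register. A rough sketch of that arithmetic in C, covering only structs made up entirely of float/double fields; the function name `xmm_return_register_count` and the struct names are made up for illustration and are not from the runtime or the JIT:

```c
#include <assert.h>
#include <stddef.h>

/* Number of XMM registers used to return a struct consisting only of
 * float/double fields under the System V AMD64 convention.
 * Returns 0 when the struct is larger than 16 bytes, in which case it
 * is returned through memory instead of registers. */
static int xmm_return_register_count(size_t struct_size)
{
    assert(struct_size > 0);
    if (struct_size > 16)
        return 0;                        /* more than two eightbytes */
    return (int)((struct_size + 7) / 8); /* one XMM register per eightbyte */
}

/* Illustrative struct shapes matching the cases discussed above. */
struct TwoFloats   { float a, b; };        /*  8 bytes: XMM0          */
struct ThreeFloats { float a, b, c; };     /* 12 bytes: XMM0 and XMM1 */
struct FourFloats  { float a, b, c, d; };  /* 16 bytes: XMM0 and XMM1 */
struct TwoDoubles  { double a, b; };       /* 16 bytes: XMM0 and XMM1 */
```

Under this rule no struct ever needs more than XMM0 and XMM1, so a leave hook that preserves those two registers covers every floating-point struct return.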
There's no support for using more than 2 registers for returns.
@noahfalk It doesn't seem like this is a 3.0 issue, so I'm moving it to Future.
@dotnet/jit-contrib @sywhang
While investigating dotnet/runtime#10706 I'm seeing a number of things that look inconsistent and probably need to be fixed or better documented. Jit folks, can you let me know what you think?
1) The FunctionEnter3/FunctionLeave3/FunctionTailcall3 methods are publicly exposed and have a documented ABI. On Linux x64 we pass FunctionIDOrClientID in R14, but the MSDN documentation doesn't mention a custom calling convention, so developers would expect RDI. I believe we picked R14 for good reason, so I propose we change MSDN to match.
2) The runtime sometimes provides the implementation of the ProfileEnter call as an intermediary between the jitted code and other forms of the profiler callback. On Linux x64 that gives us 4 non-agreeing definitions of the register preservation requirements:
I don't have a good sense of exactly what the JIT expects to be preserved across this call for the code to run correctly, but whatever it is I'd like to bring our own comments, implementation, and MSDN docs into alignment with it. I suspect there may be discrepancies for the register preservation requirements on other architectures, but I'm happy to start with Linux x64.
Thanks! -Noah
category:documentation theme:prolog-epilog skill-level:intermediate cost:medium impact:small