dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Clarify calling conventions for profiler Enter callback #10727

Open noahfalk opened 5 years ago

noahfalk commented 5 years ago

@dotnet/jit-contrib @sywhang

While investigating dotnet/runtime#10706 I'm seeing a number of things that look inconsistent and probably need to be fixed or better documented. Jit folks, can you let me know what you think?

1) The FunctionEnter3/FunctionLeave3/FunctionTailcall3 methods are publicly exposed and have a documented ABI. On Linux x64 we pass FunctionIDOrClientID in R14. The MSDN documentation doesn't mention a custom calling convention, so developers would expect RDI. I believe we picked R14 for good reason, so I propose we change MSDN to match.

2) The runtime sometimes provides the implementation of the ProfileEnter call as an intermediary between the jitted code and other forms of the profiler callback. On Linux x64 that gives us 4 non-agreeing definitions of the register preservation requirements:

I don't have a good sense of exactly what the JIT expects to be preserved across this call for the code to run correctly, but whatever it is I'd like to bring our own comments, implementation, and MSDN docs into alignment with it. I suspect there may be discrepancies for the register preservation requirements on other architectures, but I'm happy to start with Linux x64.
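For readers unfamiliar with the hook shape under discussion, here is a minimal C++ sketch of the documented one-argument enter callback. The types are redeclared locally as stand-ins for the corprof.h declarations (their exact shapes are assumed here), and `OnEnter`, `SimulateEnter`, `g_lastEntered`, and `g_enterCount` are hypothetical names used only for illustration. A developer reading only this signature would expect the argument in RDI under the System V x64 convention, which is the mismatch with R14 described above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the corprof.h declarations (assumed shapes, redeclared
// locally so this sketch compiles without the real header).
using UINT_PTR = std::uintptr_t;
using FunctionID = UINT_PTR;
union FunctionIDOrClientID {
    FunctionID functionID;
    UINT_PTR clientID;
};

// Shape of the documented enter hook: one by-value argument, no return value.
using FunctionEnter3 = void(FunctionIDOrClientID);

// Hypothetical profiler state updated by the hook.
static UINT_PTR g_lastEntered = 0;
static unsigned g_enterCount = 0;

// Hypothetical hook body written as ordinary C++. On Linux x64 the jitted
// code places the ID in R14, so a real hook would need a small assembly stub
// that moves R14 into RDI before calling a function like this one.
void OnEnter(FunctionIDOrClientID id) {
    g_lastEntered = id.clientID;
    ++g_enterCount;
}

// Stands in for the caller side, using the platform's normal convention.
void SimulateEnter(FunctionEnter3* hook, UINT_PTR id) {
    assert(hook != nullptr);
    FunctionIDOrClientID arg;
    arg.clientID = id;
    hook(arg);
}
```

Calling `SimulateEnter(&OnEnter, someId)` exercises the hook through the ordinary C++ calling convention only; it does not model the R14 hand-off that the JIT actually performs.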

Thanks! -Noah

category:documentation theme:prolog-epilog skill-level:intermediate cost:medium impact:small

BruceForstall commented 5 years ago

Note that the document the JIT depends on most for ABI-related questions is the "CLR ABI". It has a section on the profiler hooks: https://github.com/dotnet/coreclr/blob/master/Documentation/botr/clr-abi.md#profiler-hooks. It could certainly be expanded to be clearer, and to answer more questions like the ones you have here.

In the JIT, the most interesting parts of the implementation are genProfilingEnterCallback and genProfilingLeaveCallback.

Generally, documentation probably was originally written for x86 -- the first architecture -- and not updated very much to handle the other architectures (x64, Linux x64, arm32, arm64, Linux x86).

It looks to me that for Linux x64:

  1. for the enter hook we pass R14 = ProfilerMethodHnd (I guess this is FunctionIDOrClientID?), R15 = caller's SP. (For Windows x64, it's the normal first 2 argument registers, RCX/RDX). It looks like we don't document the 2nd argument? Or maybe that's what FunctionEnter3WithInfo (and friends) are, and the JIT just always generates the same code.

There is no documentation in the code or "CLR ABI" to explain why R14/R15 were picked. Presumably it is because Linux x64, unlike Windows x64, has no caller-provided "home" space for the argument registers, so we don't want to trash the incoming registers. On Windows, we first home all the register arguments, and then we can trash them.

Regarding register preservation:

The asmhelper.S comment that says rax/rdx/xmm0/xmm1 need to be preserved should, I believe, only apply to the "leave" helper, which needs to preserve the function return value.

These statements should really be backed up by testing! And extended to other platforms.

noahfalk commented 5 years ago

Thanks for looking into this Bruce! I agree on the testing. My thinking here is we could write a trivial profiler that registers ELT callbacks in order to deliberately trash every register we believe we can. If we can have this profiler loaded and pass all the CoreCLR tests then it would be good evidence the analysis was accurate.
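A rough sketch of what the register-trashing part of such a test profiler might look like, assuming GCC or Clang extended inline assembly on x64. `TrashScratchRegisters` and `g_trashCalls` are hypothetical names, and the register list (R10, R11, XMM14, XMM15) is only a placeholder: which registers are actually safe to trash is the open question in this issue, so the list would need to be adjusted to match whatever the analysis concludes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter so a test can confirm the helper ran.
static unsigned g_trashCalls = 0;

// Hypothetical helper a test profiler could call from its ELT hooks.
// It overwrites a placeholder set of registers with junk values. The
// clobber list tells the compiler these registers are modified, so the
// helper itself stays correct while its caller sees the new values.
void TrashScratchRegisters() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    asm volatile(
        "movq $0xBAD, %%r10\n\t"        // junk in integer scratch registers
        "movq $0xBAD, %%r11\n\t"
        "pcmpeqd %%xmm14, %%xmm14\n\t"  // all-ones in the XMM registers
        "pcmpeqd %%xmm15, %%xmm15\n\t"
        :
        :
        : "r10", "r11", "xmm14", "xmm15");
#endif
    // On other targets this is a no-op apart from the counter.
    ++g_trashCalls;
}
```

Registers that must hold return values on the leave path (RAX/RDX/XMM0/XMM1, per the discussion above) are deliberately left out of this placeholder list.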

> for the enter hook we pass R14 = ProfilerMethodHnd (I guess this is FunctionIDOrClientID?), R15 = caller's SP. (For Windows x64, it's the normal first 2 argument registers, RCX/RDX). It looks like we don't document the 2nd argument?

That is intentional. The public contract is only on the 1st argument. The second argument is private contract between JIT and runtime so that the runtime can implement FunctionEnter3WithInfo.

noahfalk commented 5 years ago

@BruceForstall - I've been looking at this a bit more and it raised a few (hopefully quick) additional questions:

1) Are there any scenarios where the JIT needs the upper 64 bits of the XMM arguments preserved? As far as I know the largest floating point type that could be passed as an argument is 8 bytes, and the profiler is only designed to expose 8 byte arguments. I am guessing save/restore on the low 8 bytes is sufficient.

2) All the callbacks currently preserve 16 bytes for XMM0/XMM1 return values. I wasn't planning to change this for Leave/Tailcall functions, but I'm curious, if you know: do we ever use larger return values?

BruceForstall commented 5 years ago

The questions are specific to x64, I believe.

We don't support __vectorcall convention, so:

  1. only the low 64 bits of XMM arguments need be preserved.
  2. I believe we also only support 64-bit return values in XMM0. For Linux/x64, it's a little more complicated: XMM0 and XMM1 can return two members of a struct of two doubles. I can't recall what happens for a struct of 2 floats in this case.

Maybe @CarolEidt can comment to verify.

CarolEidt commented 5 years ago

@BruceForstall is right about the handling of the upper bits of XMM arguments, though for anything that's not classified as a call, we expect them to be preserved.

On Linux/x64, I believe it's the case that a struct of 2 floats would be returned in XMM0, but a struct of 2 doubles or 3 or 4 floats would be returned in XMM0 and XMM1.

There's no support for using more than 2 registers for returns.
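The Linux x64 cases above can be summarized with a small sketch. It assumes the System V x64 rule that an aggregate of at most 16 bytes made up only of float/double fields is returned one eightbyte per XMM register, and that anything larger is returned through memory. `XmmReturnRegisters` and the struct names are hypothetical, used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Example aggregates made up only of floating-point fields.
struct Float2  { float a, b; };        // 8 bytes
struct Float3  { float a, b, c; };     // 12 bytes
struct Float4  { float a, b, c, d; };  // 16 bytes
struct Double2 { double a, b; };       // 16 bytes
struct Double3 { double a, b, c; };    // 24 bytes

// Hypothetical helper: number of XMM registers used to return an aggregate
// of `bytes` bytes that contains only float/double fields. A result of 0
// means the value does not fit in XMM0/XMM1 and is returned through memory.
constexpr int XmmReturnRegisters(std::size_t bytes) {
    return (bytes == 0 || bytes > 16) ? 0 : static_cast<int>((bytes + 7) / 8);
}

static_assert(XmmReturnRegisters(sizeof(Float2)) == 1, "2 floats: XMM0 only");
static_assert(XmmReturnRegisters(sizeof(Double2)) == 2, "2 doubles: XMM0 and XMM1");
```

For a leave or tailcall hook, this is why preserving XMM0 and XMM1 covers every floating-point return case mentioned here.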

BruceForstall commented 5 years ago

@noahfalk It doesn't seem like this is a 3.0 issue, so I'm moving it to Future.