Open am11 opened 1 day ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
@EgorBo Any chance you can take a look at this?
This is contributing to our larger runtime simplification and unification efforts.
Can we just introduce managed helpers? (with qcalls into native helpers for actual division, or maybe even fully managed).
That's what #109087 tried.
@EgorBo I was wondering if we can handle it in JIT itself, like we are checking the conditions here:
> That's what https://github.com/dotnet/runtime/pull/109087 tried.
Ah, short memory
> @EgorBo I was wondering if we can handle it in JIT itself, like we are checking the conditions here:
We can definitely do this in JIT, it just turns out not to be the single-line change I had hoped for 🙂
@jkotas @am11 it looks like we need to emit quite a bit of control flow to re-create what these conditions do: https://github.com/am11/runtime/blob/0e61d9d6ca5c312443edf175e7d7b6be72077c00/src/libraries/System.Private.CoreLib/src/System/Math.DivModInt.cs#L55-L78 (4 different helpers + extra helpers on arm32), and there is little we can reuse from the existing arm64 implementation.
After discussing this with @jakobbotsch, we wonder if we can:
1) Inline managed helpers in JIT (thus, some conditions could be folded, or the fast path (32bit division) could be performed without call overhead)
2) Replace FCall with a direct pinvoke (without gc transition) to libc/whatever native helper we end up using?
also, I've kicked off a benchmark on 32bit windows to see performance impact of am11 impl: https://github.com/dotnet/runtime/pull/109087#issuecomment-2506387343
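For readers following along, here is a rough C++ sketch of the kind of control flow being discussed for a signed 64-bit division helper: a zero check, an overflow check, and a 32-bit fast path. It is not the code from `Math.DivModInt.cs` or jithelpers.cpp; `checked_div_i64`, `fits_in_int32` and the two exception types are hypothetical names made up for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-ins for the managed DivideByZeroException / OverflowException.
struct DivideByZero : std::runtime_error {
    DivideByZero() : std::runtime_error("divide by zero") {}
};
struct Overflow : std::runtime_error {
    Overflow() : std::runtime_error("overflow") {}
};

// True when the 64-bit value is representable as a signed 32-bit integer.
inline bool fits_in_int32(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max();
}

// Sketch of a checked signed 64-bit division with a 32-bit fast path.
int64_t checked_div_i64(int64_t dividend, int64_t divisor) {
    if (divisor == 0) {
        throw DivideByZero();
    }
    if (divisor == -1) {
        // INT64_MIN / -1 is not representable in 64 bits.
        if (dividend == std::numeric_limits<int64_t>::min()) {
            throw Overflow();
        }
        return -dividend;
    }
    if (fits_in_int32(dividend) && fits_in_int32(divisor)) {
        // Fast path: a plain 32-bit division. The divisor is neither 0 nor -1
        // here, so this cannot trap or overflow.
        return static_cast<int32_t>(dividend) / static_cast<int32_t>(divisor);
    }
    // Slow path: full 64-bit division (a software routine on 32-bit targets).
    return dividend / divisor;
}
```

Even in this simplified form there are four branches before the actual division, which gives a feel for why emitting the equivalent in the JIT is more than a one-line change.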
> Inline managed helpers in JIT (thus, some conditions could be folded or fast path (32bit division) can be performed without call overhead)
Do you mean to build a general-purpose feature to inline JIT helpers that happen to be written in C#?
> Replace FCall with a direct pinvoke (without gc transition) to libc/whatever native helper we end up using?
It would require depending on undocumented C runtime details, which is problematic. I do not think we need it to avoid regressions. We are paying for the FCall wrapper today. We do not need to be doing extra work to improve x86.
> also, I've kicked off a benchmark on 32bit windows to see performance impact of am11 impl
It shows 1.7x regression. It is more than the earlier measurements that showed about 30% regression.
> It shows 1.7x regression. It is more than the earlier measurements (https://github.com/dotnet/runtime/pull/109087#issuecomment-2436079739) that showed about 30% regression.
It looks like the codegen for the C# version is not perfect (e.g. Is32bitSigned is not inlined). Not sure we want to invest in peepholes for 32bit, though.
> Do you mean to build a general-purpose feature to inline JIT helpers that happen to be written in C#?
Yep, I think we wanted that elsewhere too
Overall, I don't have a strong opinion on which path to choose, but it looks like all of them involve quite a bit of work:
1) Emit jumps/conditions in the JIT emitter. Unlike arm64, these involve a bit more code, e.g. here is the codegen for x86 in C++: https://godbolt.org/z/zEzGKjfEr The downside of this approach is that we need to repeat the logic for arm32 as well + not sure, perhaps RISC-V/LA64 use these helpers too?
2) Emit conditions in some cross-platform manner in the JIT, e.g. similar to static-ctor-expansion (this actually involves even more effort)
3) Optimize managed implementations + 32bit-specific peepholes.
Probably, the 1st one indeed is the simplest (if we also avoid smart optimizations with "is it actually 32bit value")
> perhaps RISC-V/LA64 use these helpers too?
Div/Mod helpers in jithelpers.cpp are only called on 32-bit platforms. RV64 and LA64 seem to have slightly different morphing: https://github.com/dotnet/runtime/blob/c6bebf08c6e94a88a6bf76e80bfc361daa295be4/src/coreclr/jit/morph.cpp#L8582
From @jkotas https://github.com/dotnet/runtime/pull/109087#issuecomment-2436388291
On 32 bit platforms such as x86 and arm32, JIT uses software fallback `CORINFO_HELP_{,U,L,UL}{DIV,MOD}` to handle `DivideByZero` and `Overflow` exceptions in div/mod arithmetic ops via `FCThrow` in jithelpers.cpp. `FCThrow` uses `HELPER_METHOD_FRAME` which we are trying to remove from runtime (https://github.com/dotnet/runtime/issues/95695).

JIT can handle inserting the software fallback using the existing `CORINFO_HELP_OVERFLOW` and `CORINFO_HELP_THROWDIVZERO` helpers on 32 bit platforms. This will remove 8 (out of 9) remaining `FCThrow` calls from jithelpers.cpp.
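As a rough illustration of the proposal, the sketch below shows in C++ what a check-then-divide expansion could look like if the exceptional cases are routed to dedicated throw helpers and the division itself never throws. `throw_div_zero` and `throw_overflow` are hypothetical stand-ins for `CORINFO_HELP_THROWDIVZERO` and `CORINFO_HELP_OVERFLOW`, and the `expanded_*` function names are made up. Treating `INT64_MIN % -1` as an overflow is an assumption of this sketch, not something the issue text states.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-ins for CORINFO_HELP_THROWDIVZERO and CORINFO_HELP_OVERFLOW.
[[noreturn]] inline void throw_div_zero() { throw std::domain_error("DivideByZero"); }
[[noreturn]] inline void throw_overflow() { throw std::overflow_error("Overflow"); }

// Unsigned 64-bit division cannot overflow, so only the zero check is needed.
uint64_t expanded_udiv_u64(uint64_t dividend, uint64_t divisor) {
    if (divisor == 0) {
        throw_div_zero();
    }
    return dividend / divisor;
}

// Signed 64-bit remainder: zero check plus the INT64_MIN % -1 corner case.
int64_t expanded_mod_i64(int64_t dividend, int64_t divisor) {
    if (divisor == 0) {
        throw_div_zero();
    }
    if (divisor == -1) {
        // Assumption: INT64_MIN % -1 is reported as an overflow. In C++ the
        // expression itself is undefined behaviour, so it must be guarded.
        if (dividend == std::numeric_limits<int64_t>::min()) {
            throw_overflow();
        }
        return 0; // any other value % -1 is 0
    }
    return dividend % divisor;
}
```

With this shape, the arithmetic routine that is finally called has no exceptional paths of its own, so it would not need anything like `FCThrow`.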