Move arithmetic DivideByZero and Overflow exceptions to JIT on 32-bit platforms

am11 commented 1 day ago

From @jkotas https://github.com/dotnet/runtime/pull/109087#issuecomment-2436388291

The extra call frame introduced by the changes in this PR is where the overhead is.

main: managed method -> FCall (with argument checks that require HMF) -> internal C-runtime div helper

PR (notice the extra frame): managed method -> managed helper call (argument checks) -> FCall (without argument checks) -> internal C-runtime div helper

Inlined checks: managed method (with argument checks) -> FCall (without argument checks) -> internal C-runtime div helper

On 32 bit platforms such as x86 and arm32, JIT uses software fallback CORINFO_HELP_{,U,L,UL}{DIV,MOD} to handle DivideByZero and Overflow exceptions in div/mod arithmetic ops via FCThrow in jithelpers.cpp. FCThrow uses HELPER_METHOD_FRAME which we are trying to remove from runtime (https://github.com/dotnet/runtime/issues/95695).

JIT can handle inserting the software fallback using the existing CORINFO_HELP_OVERFLOW and CORINFO_HELP_THROWDIVZERO helpers on 32 bit platforms. This will remove 8 (out of 9) remaining FCThrow calls from jithelpers.cpp.

dotnet-policy-service[bot] commented 1 day ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

jkotas commented 1 day ago

@EgorBo Any chance you can take a look at this?

This is contributing to our larger runtime simplification and unifications efforts.

EgorBo commented 1 day ago

@EgorBo Any chance you can take a look at this?

This is contributing to our larger runtime simplification and unifications efforts.

Can we just introduce managed helpers? (with qcalls into native helpers for actual division, or maybe even fully managed).

MichalPetryka commented 1 day ago

@EgorBo Any chance you can take a look at this? This is contributing to our larger runtime simplification and unifications efforts.

Can we just introduce managed helpers? (with qcalls into native helpers for actual division, or maybe even fully managed).

That's what #109087 tried.

am11 commented 1 day ago

@EgorBo I was wondering if we can handle it in JIT itself, like we are checking the conditions here:

https://github.com/dotnet/runtime/blob/afdd68bf7cbd3128e1a86def2d5f9789591a1d2d/src/coreclr/jit/gentree.cpp#L16104-L16110

EgorBo commented 1 day ago

That's what https://github.com/dotnet/runtime/pull/109087 tried.

Ah, short memory

@EgorBo I was wondering if we can handle it in JIT itself, like we are checking the conditions here:

We can definitely do this in JIT, just turns out to be not entirely single-line change as I hoped 🙂

EgorBo commented 6 hours ago

@jkotas @am11 it looks like we need to emit quite a bit of control-flow to re-create what these conditions do https://github.com/am11/runtime/blob/0e61d9d6ca5c312443edf175e7d7b6be72077c00/src/libraries/System.Private.CoreLib/src/System/Math.DivModInt.cs#L55-L78 (4 different helpers + extra helpers on arm32) and there is little we can re-use from existing arm64 impl.

After discussing this with @jakobbotsch, we wonder if we can: 1) Inline managed helpers in JIT (thus, some conditions could be folded or fast path (32bit divison) can be performed without call overhead) 2) Replace FCall with a direct pinvoke (without gc transition) to libc/whatever native helper we end up using?

also, I've kicked off a benchmark on 32bit windows to see performance impact of am11 impl: https://github.com/dotnet/runtime/pull/109087#issuecomment-2506387343

jkotas commented 6 hours ago

Inline managed helpers in JIT (thus, some conditions could be folded or fast path (32bit divison) can be performed without call overhead)

Do you mean to build a general-purpose feature to inline JIT helpers that happen to be written in C#?

Replace FCall with a direct pinvoke (without gc transition) to libc/whatever native helper we end up using?

It would require depending on undocumented C runtime details that's problematic. I do not think we need it to avoid regressions. We are paying for the FCall wrapper today. We do not need to be doing extra work to improve x86.

also, I've kicked off a benchmark on 32bit windows to see performance impact of am11 impl

It shows 1.7x regression. It is more than the earlier measurements that showed about 30% regression.

EgorBo commented 5 hours ago

It shows 1.7x regression. It is more than the https://github.com/dotnet/runtime/pull/109087#issuecomment-2436079739 that showed about 30% regression.

It looks like the codegen for the C# version is not perfect (and e.g. Is32bitSigned is not inlined). Not sure we want to invest into peepholes for 32bit, though.

Do you mean to build a general-purpose feature to inline JIT helpers that happen to be written in C#?

Yep, I think we wanted that elsewhere too

Overall, I don't have a strong opinion on which path to choose, but looks like all of them involve quite a bit of work: 1) Emit jumps/conditions in jit emitter - unlike arm64, these involve a bit more code, e.g. here is codegen for x86 in C++: https://godbolt.org/z/zEzGKjfEr The downside of this approach is that we need to repeat the logic for arm32 as well + not sure, perphas RISC-V/LA64 use these helpers too? 2) Emit conditions in some crossplatform manner in JIT, e.g. similar to static-ctor-expansion (involves even more efforts actually) 3) Optimize managed implementations + 32bit-specific peepholes.

Probably, the 1st one indeed is the simplest (if we also avoid smart optimizations with "is it actually 32bit value")

am11 commented 5 hours ago

perphas RISC-V/LA64 use these helpers too?

Div/Mod helpers in jithelpers.cpp are only called on 32-bit platforms. RV64 and LA64 seem to have a little bit different morphing: https://github.com/dotnet/runtime/blob/c6bebf08c6e94a88a6bf76e80bfc361daa295be4/src/coreclr/jit/morph.cpp#L8582

dotnet / runtime

Move arithmetic DivideByZero and Overflow exceptions to JIT on 32-bit platforms #110226