dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.06k stars 4.69k forks source link

JIT: Model `Int64/UInt64 -> Single` casts without intermediate cast to `Double` #106646

Open amanasifkhalid opened 1 month ago

amanasifkhalid commented 1 month ago

Follow-up to #106419 (discussion). When casting, for example, a ulong to a float, Roslyn emits the following IL:

conv.r.un
conv.r4

Which is then imported by the JIT as a ulong -> float -> double cast sequence. Casting in two steps can produce slightly different results from casting directly to float, which is also why it isn't always correct to just morph the above IR into ulong -> float. Ideally, we would differentiate between ulong -> float and ulong -> double -> float during importation, and create IR that accurately models each pattern.

Similarly, constant folding should model each pattern correctly. In other words, to model ulong -> float, we should cast the constant directly to float, and rely on MSVC/Clang/GCC/etc. to emit the correct sequence.

cc @dotnet/jit-contrib

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

jakobbotsch commented 1 month ago

Ideally, we would differentiate between ulong -> float and ulong -> double -> float during importation, and create IR that accurately models each pattern.

In my opinion, as I mentioned in the discussion, the ideal fix is that Roslyn produces IL that implements ulong -> float conversions correctly. If we try to fix this in the JIT we end up with a bunch of odd possible behavior around how IL is handled.

amanasifkhalid commented 1 month ago

In my opinion, as I mentioned in the discussion, the ideal fix is that Roslyn produces IL that implements ulong -> float conversions correctly. If we try to fix this in the JIT we end up with a bunch of odd possible behavior around how IL is handled.

I agree that that would simplify the JIT side. I'm guessing this will have to be a .NET 10 item if we want a fix in Roslyn, right? Should we open an issue in their repo?

tannergooding commented 1 month ago

the ideal fix is that Roslyn produces IL that implements ulong -> float conversions correctly

I don't think any scenario where Roslyn emits new IL patterns would be viable and will likely cause large performance regressions.

Exposing a new intrinsic API (RuntimeHelpers.ConvertToDouble(ulong) and RuntimeHelpers.ConvertToSingle(ulong) for example) would be viable and ensure the JIT can still recognize the intent and optimize, but that still causes issues for existing code/libraries not matching the intent the user authored.

Since we are already in a world where we have oddities here based on what IL is present, whether optimizations are enabled or disabled, and what the underlying hardware supports instruction set wise. I don't think normalizing that to ensure that conv.r.un; conv.r4 does the "right" thing is going to hurt that. I rather think that it will fix a bug for the vast majority of code out there (certainly anything currently being produced by Roslyn or F#) and only leave potential ambiguities for what is effectively manually written IL (which can be left as UB, which is functionally how it is already today). We could normalize such manually written IL as well, if that were desired (such as handling conv.r.un; ...; conv.r4 with a gap in the IL, but no extra stores/uses could be handled for example as we could recognize the tree shape CONV_R4(CONV_R_UN) rather than only recognizing the explicit sequence).

jakobbotsch commented 1 month ago

cc @jaredpar @dsyme @jkotas

The gist of the problem is that currently, Roslyn and fsc do not implement ulong -> float and ulong -> double conversions accurately. The compilers are emitting conv.r.un; conv.r4, which is not equivalent when the internal representation used for type F is larger than float/double. So there is a question of what the best way to fix this is; whether we should recognize conv.r.un; conv.r4/conv.r8 sequences specially and redefine their behavior, whether we should try providing intrinsics that the compilers can use, or maybe something entirely different.

tannergooding commented 1 month ago

The compilers are emitting conv.r.un; conv.r4, which is not equivalent when the internal representation used for type F is larger than float/double

@jakobbotsch, this notably isn't about the internal representation being larger than float/double, that's never the case for RyuJIT for Arm64 or x64 (and shouldn't be the case for x86 since we use SSE/SSE2 and directly configure the x87 FPU for 64-bit rounding mode).

This is simply that ulong -> double -> float produces different from ulong -> float for some inputs (such as 16105307124325679103) due to double rounding. There exists no singular IL instruction for doing ulong -> float (nor technically one for ulong -> double), hence the need for either recognizing conv.r.un; conv.r4/conv.r8 as meaning ulong -> float/double rather than ulong -> F -> float/double or some new intrinssic API (such as RuntimeHelpers.ConvertUInt64ToSingle/Double) or some other suggested alternative.

jakobbotsch commented 1 month ago

@jakobbotsch, this notably isn't about the internal representation being larger than float/double, that's never the case for RyuJIT for Arm64 or x64 (and shouldn't be the case for x86 since we use SSE/SSE2 and directly configure the x87 FPU for 64-bit rounding mode).

This is simply that ulong -> double -> float produces different from ulong -> float for some inputs (such as 16105307124325679103) due to double rounding. There exists no singular IL instruction for doing ulong -> float (nor technically one for ulong -> double), hence the need for either recognizing conv.r.un; conv.r4/conv.r8 as meaning ulong -> float/double rather than ulong -> F -> float/double or some new intrinssic API (such as RuntimeHelpers.ConvertUInt64ToSingle/Double) or some other suggested alternative.

I meant that conv.r.un; conv.r8 is also not guaranteed to be equivalent to a ulong -> double conversion since the target IL compiler may use a larger representation than double for type F. I know that this is not the case for RyuJIT, so conv.r.un; conv.r8 will be a proper ulong -> double conversion if RyuJIT happens to be compiling the IL.

jaredpar commented 1 month ago

whether we should recognize conv.r.un; conv.r4/conv.r8 sequences specially and redefine their behavior, whether we should try providing intrinsics that the compilers can use, or maybe something entirely different.

In general I prefer having well defined intrinsics the compiler can use vs. trying to infer meaning from a series of op codes that somewhat deviates from the literal meaning of the op codes.

dsyme commented 1 month ago

cc @vzarytovskii @0101 @T-Gro

vzarytovskii commented 1 month ago

cc @vzarytovskii @0101 @T-Gro

Thanks. I'll see what do we emit after I'm back.

Update I see Jakob already looked at our codegen, thanks for that.

vzarytovskii commented 1 month ago

@jakobbotsch out of curiosity, what's the F# code you were looking at? If its something along the lines of let convert (a: uint64) : float = float a, then it should technically be straightforward for us to change, if we decide its the right thing to do.

T-Gro commented 1 month ago

@jakobbotsch out of curiosity, what's the F# code you were looking at? If its something along the lines of let convert (a: uint64) : float = float a, then it should technically be straightforward for us to change, if we decide its the right thing to do.

The instruction sequence is what gets emitted for the FSharp.Core's float32 function. The same pattern is used for more input types:

             when ^T : uint64    = (# "conv.r.un conv.r4" value  : float32 #)
             when ^T : uint32    = (# "conv.r.un conv.r4" value  : float32 #)
             when ^T : uint16    = (# "conv.r.un conv.r4" value  : float32 #)
             when ^T : char      = (# "conv.r.un conv.r4" value  : float32 #)
             when ^T : unativeint = (# "conv.r.un conv.r4" value  : float32 #)
             when ^T : byte     = (# "conv.r.un conv.r4" value  : float32 #)

https://github.com/dotnet/fsharp/blob/bd9e200b5c0d177cc4a4155b59b8f52fbd4487c5/src/FSharp.Core/prim-types.fs#L5057-L5062