Open amanasifkhalid opened 1 month ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Ideally, we would differentiate between
ulong -> float
andulong -> double -> float
during importation, and create IR that accurately models each pattern.
In my opinion, as I mentioned in the discussion, the ideal fix is that Roslyn produces IL that implements ulong -> float
conversions correctly. If we try to fix this in the JIT we end up with a bunch of odd possible behavior around how IL is handled.
In my opinion, as I mentioned in the discussion, the ideal fix is that Roslyn produces IL that implements ulong -> float conversions correctly. If we try to fix this in the JIT we end up with a bunch of odd possible behavior around how IL is handled.
I agree that that would simplify the JIT side. I'm guessing this will have to be a .NET 10 item if we want a fix in Roslyn, right? Should we open an issue in their repo?
the ideal fix is that Roslyn produces IL that implements ulong -> float conversions correctly
I don't think any scenario where Roslyn emits new IL patterns would be viable and will likely cause large performance regressions.
Exposing a new intrinsic API (RuntimeHelpers.ConvertToDouble(ulong)
and RuntimeHelpers.ConvertToSingle(ulong)
for example) would be viable and ensure the JIT can still recognize the intent and optimize, but that still causes issues for existing code/libraries not matching the intent the user authored.
Since we are already in a world where we have oddities here based on what IL is present, whether optimizations are enabled or disabled, and what the underlying hardware supports instruction set wise. I don't think normalizing that to ensure that conv.r.un; conv.r4
does the "right" thing is going to hurt that. I rather think that it will fix a bug for the vast majority of code out there (certainly anything currently being produced by Roslyn or F#) and only leave potential ambiguities for what is effectively manually written IL (which can be left as UB, which is functionally how it is already today). We could normalize such manually written IL as well, if that were desired (such as handling conv.r.un; ...; conv.r4
with a gap in the IL, but no extra stores/uses could be handled for example as we could recognize the tree shape CONV_R4(CONV_R_UN)
rather than only recognizing the explicit sequence).
cc @jaredpar @dsyme @jkotas
The gist of the problem is that currently, Roslyn and fsc do not implement ulong -> float
and ulong -> double
conversions accurately. The compilers are emitting conv.r.un; conv.r4
, which is not equivalent when the internal representation used for type F
is larger than float/double
. So there is a question of what the best way to fix this is; whether we should recognize conv.r.un; conv.r4/conv.r8
sequences specially and redefine their behavior, whether we should try providing intrinsics that the compilers can use, or maybe something entirely different.
The compilers are emitting conv.r.un; conv.r4, which is not equivalent when the internal representation used for type F is larger than float/double
@jakobbotsch, this notably isn't about the internal representation being larger than float/double, that's never the case for RyuJIT for Arm64 or x64 (and shouldn't be the case for x86 since we use SSE/SSE2 and directly configure the x87 FPU for 64-bit rounding mode).
This is simply that ulong -> double -> float
produces different from ulong -> float
for some inputs (such as 16105307124325679103
) due to double rounding. There exists no singular IL instruction for doing ulong -> float
(nor technically one for ulong -> double
), hence the need for either recognizing conv.r.un; conv.r4/conv.r8
as meaning ulong -> float/double
rather than ulong -> F -> float/double
or some new intrinssic API (such as RuntimeHelpers.ConvertUInt64ToSingle/Double
) or some other suggested alternative.
@jakobbotsch, this notably isn't about the internal representation being larger than float/double, that's never the case for RyuJIT for Arm64 or x64 (and shouldn't be the case for x86 since we use SSE/SSE2 and directly configure the x87 FPU for 64-bit rounding mode).
This is simply that
ulong -> double -> float
produces different fromulong -> float
for some inputs (such as16105307124325679103
) due to double rounding. There exists no singular IL instruction for doingulong -> float
(nor technically one forulong -> double
), hence the need for either recognizingconv.r.un; conv.r4/conv.r8
as meaningulong -> float/double
rather thanulong -> F -> float/double
or some new intrinssic API (such asRuntimeHelpers.ConvertUInt64ToSingle/Double
) or some other suggested alternative.
I meant that conv.r.un; conv.r8
is also not guaranteed to be equivalent to a ulong -> double
conversion since the target IL compiler may use a larger representation than double
for type F
. I know that this is not the case for RyuJIT, so conv.r.un; conv.r8
will be a proper ulong -> double
conversion if RyuJIT happens to be compiling the IL.
whether we should recognize conv.r.un; conv.r4/conv.r8 sequences specially and redefine their behavior, whether we should try providing intrinsics that the compilers can use, or maybe something entirely different.
In general I prefer having well defined intrinsics the compiler can use vs. trying to infer meaning from a series of op codes that somewhat deviates from the literal meaning of the op codes.
cc @vzarytovskii @0101 @T-Gro
cc @vzarytovskii @0101 @T-Gro
Thanks. I'll see what do we emit after I'm back.
Update I see Jakob already looked at our codegen, thanks for that.
@jakobbotsch out of curiosity, what's the F# code you were looking at? If its something along the lines of let convert (a: uint64) : float = float a
, then it should technically be straightforward for us to change, if we decide its the right thing to do.
@jakobbotsch out of curiosity, what's the F# code you were looking at? If its something along the lines of
let convert (a: uint64) : float = float a
, then it should technically be straightforward for us to change, if we decide its the right thing to do.
The instruction sequence is what gets emitted for the FSharp.Core's float32
function.
The same pattern is used for more input types:
when ^T : uint64 = (# "conv.r.un conv.r4" value : float32 #)
when ^T : uint32 = (# "conv.r.un conv.r4" value : float32 #)
when ^T : uint16 = (# "conv.r.un conv.r4" value : float32 #)
when ^T : char = (# "conv.r.un conv.r4" value : float32 #)
when ^T : unativeint = (# "conv.r.un conv.r4" value : float32 #)
when ^T : byte = (# "conv.r.un conv.r4" value : float32 #)
Follow-up to #106419 (discussion). When casting, for example, a
ulong
to afloat
, Roslyn emits the following IL:Which is then imported by the JIT as a
ulong -> float -> double
cast sequence. Casting in two steps can produce slightly different results from casting directly tofloat
, which is also why it isn't always correct to just morph the above IR intoulong -> float
. Ideally, we would differentiate betweenulong -> float
andulong -> double -> float
during importation, and create IR that accurately models each pattern.Similarly, constant folding should model each pattern correctly. In other words, to model
ulong -> float
, we should cast the constant directly tofloat
, and rely on MSVC/Clang/GCC/etc. to emit the correct sequence.cc @dotnet/jit-contrib