hopperpl opened 2 years ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

| | |
|---|---|
| Author | hopperpl |
| Assignees | - |
| Labels | `tenet-performance`, `area-CodeGen-coreclr` |
| Milestone | - |
Should be pretty straightforward to fix, marking as help-wanted
Noting this can only be done on 64-bit. On 32-bit the required REX.W bit to force a 64-bit input isn't available.
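In source-level terms, the 64-bit conversion amounts to widening the unsigned value to a signed 64-bit integer first; a minimal C# sketch (the helper name is mine):

```csharp
// Every uint fits in a non-negative long, so the widening is exact; on a
// 64-bit target the long -> float conversion is then a single cvtsi2ss
// with a 64-bit source register (the REX.W form mentioned above).
static float UIntToFloat(uint u) => (float)(long)u;
```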
Beware: that code is not the same. The multiplication factor has to be 255.0 / 31.0 = 8.2258..., not truncated down to 8. You have to change the code to `return color * (255 / (float)31);` and make sure there is no temporary double math. The term in parentheses avoids an expensive floating-point division.
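As a concrete sketch of the corrected method (my reconstruction, using the `GetComp5` name that shows up in the JIT dump later in the thread):

```csharp
// 255 / (float)31 is a constant expression, folded by the C# compiler to
// the single float constant 8.225806f (visible as ldc.r4 in the IL below),
// so the generated code multiplies instead of dividing and never touches
// double precision.
static float GetComp5(uint color)
{
    return color * (255 / (float)31);
}
```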
@hopperpl @EgorBo
I see the ryujit overview as a general getting-started guide for learning about the JIT and the workflow, and viewing jit dumps seems relevant as a "JIT-specific" workflow doc.
Any other docs you think are worth mentioning for a first-timer?
(not taking this issue yet) edit: missed that the issue was self-assigned
@TIHan are you working on this? If not, want to give it to a first timer? (I'll probably need a little help though)
@YoniFeng sorry for the delayed response, I'm on a vacation.
I'd start from https://github.com/dotnet/runtime/blob/main/docs/workflow/debugging/coreclr/debugging-runtime.md
Build the repo, build the VS solution for the runtime, then configure it to run a simple hello-world app and try to set a breakpoint in the JIT in some random place, e.g. the `fgOptimizeCast` function.
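For instance, a minimal test app along these lines (my own sketch) gives the JIT something to compile and break on:

```csharp
using System;

class Program
{
    // The method from the issue. With a Checked JIT build, setting
    // DOTNET_JitDump=GetComp5 (COMPlus_JitDump on older runtimes)
    // limits the dump output to just this method.
    static float GetComp5(uint color) => color * (255 / (float)31);

    static void Main() => Console.WriteLine(GetComp5(17));
}
```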
Thanks, I got a working setup. It isn't immediately obvious to me how to navigate to / set a breakpoint in exactly the method I'm interested in (even though my example program is just a `Main` and the example method), but I've only played around for a few minutes.
Now I'm trying to figure out what we even want to do here (I realize it's super obvious to you). I'm sharing my learning/thoughts here "live" so I can be corrected on anything I've misunderstood (helping you help me). For TL;DR and order's sake, questions are enumerated and in bold.
The IL for the example code given is:
```
IL_0000: ldarg.0
IL_0001: conv.r.un
IL_0002: conv.r4
IL_0003: ldc.r4 8.225806
IL_0008: mul
IL_0009: ret
```
The short main description for `conv.r.un` and `conv.r4` makes it seem like there's a wasteful double conversion here, but on reading the longer description it's necessary, because `conv.r.un` might return a float64. (Does the short main description for `conv.r.un` require correction?)
Looking at the JIT dump, we get the `uint -> long -> double -> float` chain right off the bat:
```
Importing BB01 (PC=000) of 'ConsoleApplication.Program:GetComp5(uint):float'
[ 0] 0 (0x000) ldarg.0
[ 1] 1 (0x001) conv.r.un
[ 1] 2 (0x002) conv.r4
[ 1] 3 (0x003) ldc.r4 8.2258062362670898
[ 2] 8 (0x008) mul
[ 1] 9 (0x009) ret

STMT00000 ( 0x000[E-] ... ??? )
[000005] ----------- * RETURN float
[000004] ----------- \--* MUL float
[000002] ----------- +--* CAST float <- double
[000001] ---------U- | \--* CAST double <- uint
[000000] ----------- | \--* LCL_VAR int V00 arg0
[000003] ----------- \--* CNS_DBL float 8.2258062362670898
```
This seems a little odd to me.
While it's true there's no direct `uint -> float32` x86 instruction (I had to google that; I'm not even vaguely familiar with x86/SSE/AVX etc.), I find it a little odd that the conversion isn't represented as such here, and is only converted at a later phase (lowering? maybe that's too late) into a concrete chain of instructions that get the job done.
Naively, it feels like "implementation details" don't matter at this phase, so why the granularity?

**1. Is it OK/expected that the HIR is like this?**
The tree remains as-is until rationalization, so there are no "real" changes:
```
------------ BB01 [000..00A) (return), preds={} succs={}
[000007] ----------- IL_OFFSET void INLRT @ 0x000[E-]
N001 ( 3, 2) [000000] ----------- t0 = LCL_VAR int V00 arg0
/--* t0 int
N002 ( 4, 4) [000006] ---------U- t6 = * CAST long <- uint
/--* t6 long
N003 ( 10, 10) [000001] ----------- t1 = * CAST double <- long
/--* t1 double
N004 ( 16, 16) [000002] ----------- t2 = * CAST float <- double
N005 ( 3, 2) [000003] ----------- t3 = CNS_DBL float 8.2258062362670898
/--* t2 float
+--* t3 float
N006 ( 24, 22) [000004] ----------- t4 = * MUL float
/--* t4 float
N007 ( 25, 23) [000005] ----------- * RETURN float
-------------------------------------------------------------------------------------------------------------------
```
Although earlier I wondered whether the representation was "wrong"/"wasteful" and whether we could take care of things at lowering, from reading the JIT overview it seems that by the lowering phase things are almost "set in stone", i.e. we're going to emit machine code based on the LIR. So maybe it really is the HIR we need to change, at one of the earlier stages (morph?).
**2. Does the IR need to have been changed to `uint -> long -> f32` (if applicable) before lowering is reached?**
@tannergooding your comment seems generally applicable, but assuming what's needed here is a morph (removing the redundant `long -> double`), I'm missing if/how it's relevant.
In `CodeGen::genIntToFloatCast`, I see the following check:
```cpp
#if !defined(TARGET_64BIT)
// We expect morph to replace long to float/double casts with helper calls
noway_assert(!varTypeIsLong(srcType));
#endif // !defined(TARGET_64BIT)
```
Therefore, a morph optimization should be safe, right? (If I add the morph before the helper-call morph, the leftover `long -> float` will get replaced. If I add it after, we're fine, because if the `long -> double -> float` pattern is found, it implicitly means we're not on 32-bit.)
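One correctness detail worth spelling out (my own observation, not something stated in the thread): collapsing `x -> double -> float` into `x -> float` only preserves values when `x` converts to double exactly. That's always true for `uint` (every uint fits in a double), but not for arbitrary `long` values, where the extra intermediate rounding can change the result:

```csharp
using System;

class DoubleRoundingCheck
{
    static void Main()
    {
        // uint: the value fits exactly in a double, so both paths perform
        // a single rounding and always agree.
        uint u = uint.MaxValue;
        Console.WriteLine((float)(double)u == (float)(long)u); // True

        // long: 2^60 + 2^36 + 1 needs 61 significant bits. long -> double
        // already rounds (down to 2^60 + 2^36), and double -> float then
        // ties-to-even down to 2^60. Direct long -> float rounds up once,
        // to 2^60 + 2^37, so the two paths differ.
        long v = (1L << 60) + (1L << 36) + 1;
        Console.WriteLine((float)(double)v == (float)v); // False
    }
}
```

So the `uint -> long -> double -> float` chain in question is safe to collapse, while a general `long -> double -> float` pattern would not be.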
@EgorBo @TIHan would love to continue working on this, need some guidance :)
(@TIHan I'll reiterate that you can tell me to drop working on this, given you already self-assigned it)
@TIHan, @YoniFeng Is this still open and is anyone working on this right now? I'd be happy to work on it if a first-time Open Source contributor might be helpful.
converted to
there shouldn't be two conversions; converting the integer directly to single-precision is possible
for reference, clang does this (in C++):