dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

Get rid of suboptimal moves in a popular ML pattern. #13541

Open sandreenko opened 5 years ago

sandreenko commented 5 years ago

ML.NET code has several places where we do a = a * const_int;. For example, MurmurHash has 6 imul instructions in the final x64 asm (https://github.com/dotnet/machinelearning/blob/b861b5d64841cbe0f2c866ee7586872aac450a51/src/Microsoft.ML.Core/Utilities/Hashing.cs#L118), and in some cases we do them with one extra mov:

IN0054: 000127 mov      eax, r15d
IN0055: 00012A imul     eax, eax, 0xFFFFFFFFCC9E2D51 (2 instructions, 9 bytes)

instead of

IN0055: 00012A imul     eax, r15d, 0xFFFFFFFFCC9E2D51 (1 instruction, 7 bytes)

The non-optimal codegen happens when we inline MurmurRound and create a temp LCL_VAR for the hash argument: hash = MurmurRound(hash, (uint)len);
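For context, the round in question follows the standard MurmurHash3 mixing step; a minimal C++ sketch of the pattern (the ML.NET original is C#, and the helper names here are illustrative):

```cpp
#include <stdint.h>

static inline uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

// MurmurHash3-style round: each multiply by a constant is one of the imul
// sites discussed above; 0xCC9E2D51 is the constant visible in the asm dump.
uint32_t MurmurRound(uint32_t hash, uint32_t chunk)
{
    chunk *= 0xCC9E2D51u;
    chunk = rotl32(chunk, 15);
    chunk *= 0x1B873593u;
    hash ^= chunk;
    hash = rotl32(hash, 13);
    return hash * 5 + 0xE6546B64u;
}
```

Inlining this round into its caller is what creates the temp shown in the IR below.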

IR looks like:

***** BB02
STMT00068 (IL   ???...  ???)
[000363] -A------R---              *  ASG       long  
[000361] D------N----              +--*  LCL_VAR   long   V04 loc1         d:3
[000362] ------------              \--*  PHI       long  
[000385] ------------ pred BB04       +--*  PHI_ARG   long   V04 loc1         u:5
[000364] ------------ pred BB08       \--*  PHI_ARG   long   V04 loc1         u:2 $100

***** BB03
STMT00030 (IL 0x010...  ???)
[000125] -A------R---              *  ASG       int    $285
[000124] D------N----              +--*  LCL_VAR   int    V08 tmp1         d:2 $285
[000030] C-----------              \--*  LCL_VAR   int    V04 loc1         u:3 $285

***** BB03
STMT00020 (IL 0x010...  ???)
[000082] -A------R---              *  ASG       int    $286
[000081] D------N----              +--*  LCL_VAR   int    V08 tmp1         d:3 $286
[000079] ------------              \--*  MUL       int    $286
[000077] ------------                 +--*  LCL_VAR   int    V08 tmp1         u:2 (last use) $285
[000078] ------------                 \--*  CNS_INT   int    0xffffffffcc9e2d51 $49

We want to get rid of STMT00030 (the copy of V04 loc1 into V08 tmp1).

I have thought about three possible places where this could be done:

  1. copy propagation https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L131;
  2. lowering set contained for mul https://github.com/dotnet/coreclr/blob/0a00ee7fdfd113c8c2d47c85ed210de78cab4bdd/src/jit/lowerxarch.cpp#L1644;
  3. where we create a lclVar for the argument, Compiler::fgInlinePrependStatements, https://github.com/dotnet/coreclr/blob/master/src/jit/flowgraph.cpp#L23243.

I have tried all of them and did not get a good result.

3: fgInlinePrependStatements can already replace an argument that is used exactly once with the original tree. I was able to teach it to replace an argument that was originally a lcl_var with loads of that lcl_var (instead of creating a new temp), but only when the argument is not modified inside the inlined method (not our case). That means it supports cases like:

inline myMethod(lclVar0);
where myMethod(arg)
{
  multiple uses of arg, but no defs.
}

We could support defs if we knew that inline myMethod(lclVar0); is the last use of lclVar0 (our case, because we have hash = call(hash)), but inlining happens before we generate liveness information, so we don't know that call(lclVar0) is the last use of lclVar0.
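Concretely, the shape we would need to support looks like this (hypothetical sketch; function and variable names are illustrative):

```cpp
#include <stdint.h>

uint32_t MurmurRound(uint32_t hash, uint32_t chunk);   // as sketched above

// 'hash' dies at the call and is immediately redefined by the result, so
// the inliner could in principle reuse the caller's local (V04) for the
// parameter instead of creating the temp V08. But at inlining time there
// is no liveness information yet to prove that this is the last use.
uint32_t HashLen(uint32_t hash, int len)
{
    hash = MurmurRound(hash, (uint32_t)len);   // last use feeds the redefinition
    return hash;
}
```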

2: ContainCheckMul only sets operands contained when they satisfy IsContainableMemoryOp, so it doesn't handle plain register-to-register moves; forcing it to mark [000077] as contained gave me many asserts.
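For reference, the containment decision there looks roughly like this (a hedged paraphrase of ContainCheckMul in lowerxarch.cpp, assuming the JIT's Lowering context; not the exact code):

```cpp
// Fragment, paraphrased: containment is only attempted for operands that
// are memory loads, so a register-resident LCL_VAR like [000077] never
// becomes contained, and the extra mov from r15d survives.
GenTree* op1 = node->gtGetOp1();
if (IsContainableMemoryOp(op1) && IsSafeToContainMem(node, op1))
{
    MakeSrcContained(node, op1);   // fold a memory operand into the imul
}
// no path here marks a plain register-to-register LCL_VAR as contained
```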

1: Compiler::optCopyProp looks like the best candidate to handle this extra move, but currently it doesn't work, because:

1.1 [000361] D------N---- +--* LCL_VAR long V04 loc1 d:3 doesn't have a VNPair: it is a PHI definition, processed here: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/valuenum.cpp#L5885-L5890, where we don't set a VNPair for the tree, so copyProp bails out at: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L203-L207.

1.2 If we change that and assign a VNPair to [000361], it becomes a candidate to replace [000081] D------N---- +--* LCL_VAR int V08 tmp1 d:3 $286, but it is declined because 000361 is long and 000081 is int, so copyProp bails out at: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L208-L211.

1.3 If we fix that as well, the two nodes still have different VN values, so the propagation won't happen: https://github.com/dotnet/coreclr/blob/c8ad76dd8169238c085ee6e3f03d074aed4b76b2/src/jit/copyprop.cpp#L212-L215.
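Put together, the three bail-outs look roughly like this (a hedged paraphrase of the linked copyprop.cpp checks, not the exact code; it assumes the surrounding loop over candidate copies in Compiler::optCopyProp):

```cpp
// Fragment, paraphrased: 'op' is the candidate copy's def (here the PHI
// def [000361] of V04), 'tree' is the use we want to rewrite ([000081],
// V08). Each 'continue' matches one of the linked early-outs above.
if (op->gtVNPair.GetConservative() == ValueNumStore::NoVN)
{
    continue;   // 1.1: the PHI def never received a VN pair
}
if (op->TypeGet() != tree->TypeGet())
{
    continue;   // 1.2: V04 is long, V08 is int
}
if (op->gtVNPair.GetConservative() != tree->gtVNPair.GetConservative())
{
    continue;   // 1.3: different value numbers, so no propagation
}
```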

But if we somehow skip these checks (manually in a debugger, for example) and do the propagation, we get the asm that we want, with no asserts in later stages.

Note: the moves are cheap, but there are many of them, so I expect this to give at least a measurable code-size improvement.

category:cq theme:basic-cq skill-level:intermediate cost:medium impact:medium

sandreenko commented 5 years ago

I would like to continue working on this case. Could you give me some advice on how to move forward, @AndyAyersMS @CarolEidt @dotnet/jit-contrib?

mikedn commented 5 years ago

> Compiler::optCopyProp looks like the best candidate to handle this extra move, but currently, it doesn't work because:

It does, though this might be a special case that I haven't noticed before - the source of the copy is a PHI.

> but it would be declined, because 000361 is long and 000081 is int, so copyProp ends on:

Ah, that's weird. I suppose the variable type is actually long and the JIT retypes the LCL_VAR node to avoid inserting a cast. A bad idea IMO. Granted, adding back the cast won't actually fix the problem since this is no longer a copy. Though then the cast could perhaps be made contained before codegen (since such a narrowing cast is basically a no-op).
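To see why the narrowing is basically free on x64, here is a generic standalone example (not JIT code):

```cpp
#include <stdint.h>

// Truncating a 64-bit value to 32 bits just means reading the low 32 bits
// of the register (eax instead of rax); compilers typically emit a single
// register move, or nothing if the value is already in the right register.
int32_t Truncate(int64_t v)
{
    return (int32_t)v;
}
```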

mikedn commented 5 years ago

Though I suppose you may as well modify if (op->TypeGet() != tree->TypeGet()) to recognize this particular pattern of long->int reinterpretation. When in Rome...
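A hypothetical version of that check (an illustrative fragment, not a tested patch; it assumes the same optCopyProp loop as above):

```cpp
// Hypothetical tweak: besides exact type matches, also accept the specific
// long -> int reinterpretation the JIT produces here, where the int use
// simply reads the low 32 bits of the long local.
bool typesCompatible =
    (op->TypeGet() == tree->TypeGet()) ||
    ((op->TypeGet() == TYP_LONG) && (tree->TypeGet() == TYP_INT));

if (!typesCompatible)
{
    continue;
}
```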

AndyAyersMS commented 4 years ago

@sandreenko I take it you are no longer actively working on this? Seems like we should move it to future.

sandreenko commented 4 years ago

I agree with that.