RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero)

sivarv commented 7 years ago

PR https://github.com/dotnet/coreclr/pull/7943 fixed this issue for == and != against zero. Still the following cases can be optimized when op1 is known to set ZF and SF flags.

                    // TODO-CQ: We can optimize compare against zero in the
                    // following cases by generating the branch as indicated
                    // against each case.
                    //  1) unsigned compare
                    //        < 0  - always FALSE
                    //       <= 0  - ZF=1 and jne
                    //        > 0  - ZF=0 and je
                    //       >= 0  - always TRUE
                    //
                    // 2) signed compare
                    //        < 0  - SF=1 and js
                    //       >= 0  - SF=0 and jns

category:cq theme:basic-cq skill-level:intermediate cost:medium

mikedn commented 7 years ago

Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG. Only multi-bit shifts do not update OF but as seen elsewhere shifts have their own problems and solutions.

sivarv commented 7 years ago

Relop Peep-hole optimization.

mikedn commented 7 years ago

I played a bit with this and it looks like there aren't many opportunities in jitdiff fx:

Total bytes of diff: -399 (0.00 % of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file improvements by size (bytes):
        -189 : System.Private.CoreLib.dasm (-0.01 % of base)
        -123 : System.Text.RegularExpressions.dasm (-0.13 % of base)
         -29 : Microsoft.CSharp.dasm (-0.01 % of base)
         -22 : System.Collections.Immutable.dasm (-0.01 % of base)
         -13 : Microsoft.CodeAnalysis.CSharp.dasm (0.00 % of base)
11 total files with size differences (11 improved, 0 regressed).
Top method improvements by size (bytes):
         -30 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
         -23 : System.Private.CoreLib.dasm - List`1:FindLastIndex(int,int,ref):int:this (23 methods)
         -22 : System.Text.RegularExpressions.dasm - RegexParser:ScanCharClass(bool,bool):ref:this
         -20 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(ref,int,int,int) (18 methods)
         -18 : System.Private.CoreLib.dasm - Array:LastIndexOf(ref,struct,int,int):int (9 methods)
77 total methods with size differences (77 improved, 0 regressed).

But it's pretty cheap to implement so maybe one day...

mikedn commented 7 years ago

Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG.

And because they require the overflow flag the optimization would be incorrect due to integer overflow.

Which unfortunately means that this is not so useful and more complicated to implement. The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.

Besides, like all pattern matching the JIT does, this fails to work when the ADD/SUB operations are "hidden" behind a local var. So cases like

for (int i = x; i >= 0; i--) { ... }

where this optimization would be more useful won't be handled anyway.

mikedn commented 7 years ago

The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.

I have a change that implements this in https://github.com/mikedn/coreclr/commits/cmp-flags-2

There are some 70 hits in corelib and a few others in the rest of the framework. As such I'm not sure if it's worth it.

cc @sdmaclea

sdmaclea commented 7 years ago

@mikedn Are you referring to this change https://github.com/mikedn/coreclr/commit/049992a0c1ae49462ae11e9e68de80f8a18ecc19 ?

Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code. Seems it would keep the issue restricted to platform codegen.

mikedn commented 7 years ago

Are you referring to this change mikedn/coreclr@049992a ?

That change happens to add support for S and NS condition codes (MI an PL on ARM). The actual optimization is in the subsequent commit.

Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code.

The comparison is removed, perhaps you're referring to the JCC/SETCC nodes? We could certainly slap another GTF flag on it to indicate that we want S/NS instead of L/GE but personally I don't like using flags for this stuff, it feels like a hack. Anyway, the change in 49992a is something I happen to have around from other experiments (such as if conversion) so I just used that.

benaadams commented 4 years ago

Might be worth looking at this again; was looking if the test could be dropped from dec so inc/dec could fuse with the jmp as https://github.com/dotnet/coreclr/pull/27149 was introducing this sequence:

dec      r13d
test     r13d, r13d
jl       SHORT G_M59263_IG15

On that PR the following change https://github.com/dotnet/coreclr/compare/master...benaadams:peephole-opt-incdec?expand=1 produced

Summary:
(Lower is better)

Total bytes of diff: -700 (-0.02% of base)
    diff is an improvement.

Top file improvements by size (bytes):
        -700 : System.Private.CoreLib.dasm (-0.02% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
         -27 (-1.90% of base) : System.Private.CoreLib.dasm - DecCalc:ScaleResult(long,int,int):int
         -25 (-1.02% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref):bool:this (5 methods)
         -24 (-1.34% of base) : System.Private.CoreLib.dasm - ArraySortHelper`1:Heapsort(ref,int,int,ref) (8 methods)
         -20 (-0.88% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(ref,byref):bool:this (5 methods)
         -18 (-1.15% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref,byref):bool:this (3 methods)
         -18 (-1.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(long,byref):bool:this (3 methods)
         -15 (-1.16% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long):bool:this (3 methods)
         -15 (-1.07% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long,byref):bool:this (3 methods)
         -13 (-0.27% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(struct,struct,byref,struct):bool (5 methods)
         -12 (-0.54% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (3 methods)
         -12 (-0.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (2 methods)
         -12 (-3.96% of base) : System.Private.CoreLib.dasm - List`1:FindLast(ref):struct:this (2 methods)
         -11 (-1.33% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:Heapsort(ref,int,int) (5 methods)
         -10 (-1.50% of base) : System.Private.CoreLib.dasm - MulticastDelegate:RemoveImpl(ref):ref:this
         -10 (-0.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(struct):struct (4 methods)
          -9 (-1.20% of base) : System.Private.CoreLib.dasm - Enum:InternalFlagsFormat(ref,ref,long):ref
          -9 (-0.42% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(int,struct,byref,struct):bool (2 methods)
          -8 (-0.71% of base) : System.Private.CoreLib.dasm - Number:FormatFixed(byref,byref,int,ref,ref,ref)
          -8 (-0.98% of base) : System.Private.CoreLib.dasm - Number:TryNumberToDecimal(byref,byref):bool
          -8 (-0.39% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(long,struct,byref,struct):bool (2 methods)
          -8 (-1.60% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatUInt64MoreThanBillionMaxUInt(long,struct,byref):bool
          -7 (-1.34% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatInt64LessThanNegativeBillionMaxUInt(long,struct,byref):bool
          -6 (-5.88% of base) : System.Private.CoreLib.dasm - DecCalc:DivByConst(long,int,byref,byref,int):int
          -6 (-5.26% of base) : System.Private.CoreLib.dasm - Number:UInt32ToDecChars(long,int,int):long (2 methods)
          -6 (-0.46% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(byte,struct,byref,struct):bool

149 total methods with size differences (149 improved, 0 regressed), 18499 unchanged.

The impact would be larger since it would effecting a generic (i.e. Dictionary) as well as the new Utf8Formatter:TryFormat methods.

Don't think my peephole change is correct due to the difference with the overflow flag; and @mikedn's change looks far more comprehensive,

benaadams commented 4 years ago

Although maybe changing the test in the C# will trigger the != 0 optimization that already exists?

benaadams commented 4 years ago

Although maybe changing the test in the C# will trigger the != 0 optimization that already exists?

No doesn't seem to work 😢

dotnet / runtime

RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794