Open sivarv opened 7 years ago
Why would you limit signed compare to just <
and >=
? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG. Only multi-bit shifts do not update OF but as seen elsewhere shifts have their own problems and solutions.
Relop Peep-hole optimization.
I played a bit with this and it looks like there aren't many opportunities in jitdiff fx:
Total bytes of diff: -399 (0.00 % of base)
diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
Base had 0 unique methods, 0 unique bytes
Diff had 0 unique methods, 0 unique bytes
Top file improvements by size (bytes):
-189 : System.Private.CoreLib.dasm (-0.01 % of base)
-123 : System.Text.RegularExpressions.dasm (-0.13 % of base)
-29 : Microsoft.CSharp.dasm (-0.01 % of base)
-22 : System.Collections.Immutable.dasm (-0.01 % of base)
-13 : Microsoft.CodeAnalysis.CSharp.dasm (0.00 % of base)
11 total files with size differences (11 improved, 0 regressed).
Top method improvements by size (bytes):
-30 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
-23 : System.Private.CoreLib.dasm - List`1:FindLastIndex(int,int,ref):int:this (23 methods)
-22 : System.Text.RegularExpressions.dasm - RegexParser:ScanCharClass(bool,bool):ref:this
-20 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(ref,int,int,int) (18 methods)
-18 : System.Private.CoreLib.dasm - Array:LastIndexOf(ref,struct,int,int):int (9 methods)
77 total methods with size differences (77 improved, 0 regressed).
But it's pretty cheap to implement so maybe one day...
Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG.
And because they require the overflow flag the optimization would be incorrect due to integer overflow.
Which unfortunately means that this is not so useful and more complicated to implement. The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.
Besides, like all pattern matching the JIT does, this fails to work when the ADD/SUB operations are "hidden" behind a local var. So cases like
for (int i = x; i >= 0; i--) { ... }
where this optimization would be more useful won't be handled anyway.
The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.
I have a change that implements this in https://github.com/mikedn/coreclr/commits/cmp-flags-2
There are some 70 hits in corelib and a few others in the rest of the framework. As such I'm not sure if it's worth it.
cc @sdmaclea
@mikedn Are you referring to this change https://github.com/mikedn/coreclr/commit/049992a0c1ae49462ae11e9e68de80f8a18ecc19 ?
Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code. Seems it would keep the issue restricted to platform codegen.
Are you referring to this change mikedn/coreclr@049992a ?
That change happens to add support for S and NS condition codes (MI an PL on ARM). The actual optimization is in the subsequent commit.
Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code.
The comparison is removed, perhaps you're referring to the JCC/SETCC nodes? We could certainly slap another GTF flag on it to indicate that we want S/NS instead of L/GE but personally I don't like using flags for this stuff, it feels like a hack. Anyway, the change in 49992a is something I happen to have around from other experiments (such as if conversion) so I just used that.
Might be worth looking at this again; was looking if the test
could be dropped from dec
so inc
/dec
could fuse with the jmp as https://github.com/dotnet/coreclr/pull/27149 was introducing this sequence:
dec r13d
test r13d, r13d
jl SHORT G_M59263_IG15
On that PR the following change https://github.com/dotnet/coreclr/compare/master...benaadams:peephole-opt-incdec?expand=1 produced
Summary:
(Lower is better)
Total bytes of diff: -700 (-0.02% of base)
diff is an improvement.
Top file improvements by size (bytes):
-700 : System.Private.CoreLib.dasm (-0.02% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements by size (bytes):
-27 (-1.90% of base) : System.Private.CoreLib.dasm - DecCalc:ScaleResult(long,int,int):int
-25 (-1.02% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref):bool:this (5 methods)
-24 (-1.34% of base) : System.Private.CoreLib.dasm - ArraySortHelper`1:Heapsort(ref,int,int,ref) (8 methods)
-20 (-0.88% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(ref,byref):bool:this (5 methods)
-18 (-1.15% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref,byref):bool:this (3 methods)
-18 (-1.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(long,byref):bool:this (3 methods)
-15 (-1.16% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long):bool:this (3 methods)
-15 (-1.07% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long,byref):bool:this (3 methods)
-13 (-0.27% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(struct,struct,byref,struct):bool (5 methods)
-12 (-0.54% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (3 methods)
-12 (-0.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (2 methods)
-12 (-3.96% of base) : System.Private.CoreLib.dasm - List`1:FindLast(ref):struct:this (2 methods)
-11 (-1.33% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:Heapsort(ref,int,int) (5 methods)
-10 (-1.50% of base) : System.Private.CoreLib.dasm - MulticastDelegate:RemoveImpl(ref):ref:this
-10 (-0.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(struct):struct (4 methods)
-9 (-1.20% of base) : System.Private.CoreLib.dasm - Enum:InternalFlagsFormat(ref,ref,long):ref
-9 (-0.42% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(int,struct,byref,struct):bool (2 methods)
-8 (-0.71% of base) : System.Private.CoreLib.dasm - Number:FormatFixed(byref,byref,int,ref,ref,ref)
-8 (-0.98% of base) : System.Private.CoreLib.dasm - Number:TryNumberToDecimal(byref,byref):bool
-8 (-0.39% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(long,struct,byref,struct):bool (2 methods)
-8 (-1.60% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatUInt64MoreThanBillionMaxUInt(long,struct,byref):bool
-7 (-1.34% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatInt64LessThanNegativeBillionMaxUInt(long,struct,byref):bool
-6 (-5.88% of base) : System.Private.CoreLib.dasm - DecCalc:DivByConst(long,int,byref,byref,int):int
-6 (-5.26% of base) : System.Private.CoreLib.dasm - Number:UInt32ToDecChars(long,int,int):long (2 methods)
-6 (-0.46% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(byte,struct,byref,struct):bool
149 total methods with size differences (149 improved, 0 regressed), 18499 unchanged.
The impact would be larger since it would effecting a generic (i.e. Dictionary) as well as the new Utf8Formatter:TryFormat methods.
Don't think my peephole change is correct due to the difference with the overflow flag; and @mikedn's change looks far more comprehensive,
Although maybe changing the test in the C# will trigger the != 0
optimization that already exists?
Although maybe changing the test in the C# will trigger the
!= 0
optimization that already exists?
No doesn't seem to work 😢
PR https://github.com/dotnet/coreclr/pull/7943 fixed this issue for == and != against zero. Still the following cases can be optimized when op1 is known to set ZF and SF flags.
category:cq theme:basic-cq skill-level:intermediate cost:medium