IComparer<T> constraint leads to worse codegen than IComparisonOperators<T,T,bool> constraint

stephentoub commented 1 year ago

We introduced the new IComparisonOperators<,,> interface as part of the generic math APIs work in .NET 7, but for general-purpose APIs that need to compare and aren't constrained to numerical types, it's preferable from a design perspective to use the longer-standing IComparable<T>. Unfortunately, the codegen that's produced with IComparable<T> can end up being worse than that for IComparisonOperators<,,>, forcing a hard choice. Can we improve the JIT here?

Example: SharpLab

#nullable disable
using System;
using System.Numerics;
using SharpLab.Runtime;

public class C
{
    [JitGeneric(typeof(int))]
    public static bool M1<T>(T value, T other) where T : IComparable<T>, IComparisonOperators<T, T, bool> =>
        value.CompareTo(other) > 0;

    [JitGeneric(typeof(int))]
    public static bool M2<T>(T value, T other) where T : IComparable<T>, IComparisonOperators<T, T, bool> =>
        value > other;

    public static bool M3(int value, int other) =>
        value.CompareTo(other) > 0;

    public static bool M4(int value, int other) =>
        value > other;
}

Both M1 (generic) and M3 (non-generic) that use CompareTo end up producing:

    L0000: cmp ecx, edx
    L0002: jl short L0013
    L0004: cmp ecx, edx
    L0006: jg short L001a
    L0008: xor eax, eax
    L000a: test eax, eax
    L000c: setg al
    L000f: movzx eax, al
    L0012: ret
    L0013: mov eax, 0xffffffff
    L0018: jmp short L000a
    L001a: mov eax, 1
    L001f: jmp short L000a

whereas both M2 (generic) and M4 (non-generic) that use IComparisonOperators<T,T,bool> end up producing:

    L0000: xor eax, eax
    L0002: cmp ecx, edx
    L0004: setg al
    L0007: ret

Related to https://github.com/dotnet/runtime/pull/78222 cc: @egorbo, @tannergooding

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

We introduced the new `IComparisonOperators<,,>` interface as part of the generic math APIs work in .NET 7, but for general-purpose APIs that need to compare and aren't constrained to numerical types, it's preferable from a design perspective to use the longer-standing `IComparable`. Unfortunately, the codegen that's produced with `IComparable` can end up being worse than that for `IComparisonOperators<,,>`, forcing a hard choice. Can we improve the JIT here? Example: [SharpLab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8BiAOwFcAbS7YSmAAgBMBLXW+gWACgABAJgCMPXgAYGvQQDoAcuXwwoLMLgDcI8QGUAFtigAHADK0pAJXKkMLBeu4iAzBP4MAwjwDePBt4YBtAFIsGADiMKSKygAUGACe+jAQAGaRLJYAlGkAul4+vI6SSAzAEBCUDACyggA8ACoAfJE1DABu2JTkMGgMTRAY2oppDADu/bDdDCAMAJIuEPj6ehwwtXVdM3MLSrgQpADy8VDYGNC4tV01XcWldQwAvHU5Pj6t7TBSs/N6MDUQkb2jgxuolsT0eEnygkKVzK5X4KUsLTaHS6qQwDH+AzuD24TyeLw6DBuGKgIJ8YLyEkhRRKMPs8LR+M6DFR6L6mPuYLxSLeH0231+xMBDGBYLBASCoXCSjA0TiCWSqIy2RxuQhUJpFRQK0aiNe51ZAOGo0YTUm60+hzoy3qa15ejYO32iiOJzO3UuNJuHJVuN1BKJbJJPAAvkA=) ```C# #nullable disable using System; using System.Numerics; using SharpLab.Runtime; public class C { [JitGeneric(typeof(int))] public static bool M1(T value, T other) where T : IComparable, IComparisonOperators => value.CompareTo(other) > 0; [JitGeneric(typeof(int))] public static bool M2(T value, T other) where T : IComparable, IComparisonOperators => value > other; public static bool M3(int value, int other) => value.CompareTo(other) > 0; public static bool M4(int value, int other) => value > other; } ``` Both M1 (generic) and M3 (non-generic) that use `CompareTo` end up producing: ```asm L0000: cmp ecx, edx L0002: jl short L0013 L0004: cmp ecx, edx L0006: jg short L001a L0008: xor eax, eax L000a: test eax, eax L000c: setg al L000f: movzx eax, al L0012: ret L0013: mov eax, 0xffffffff L0018: jmp short L000a L001a: mov eax, 1 L001f: jmp short L000a ``` whereas both M2 (generic) and M4 (non-generic) that use `IComparisonOperators` end up producing: ```asm L0000: xor eax, eax L0002: cmp ecx, edx L0004: setg al L0007: ret ``` Related to https://github.com/dotnet/runtime/pull/78222 cc: @egorbo, @tannergooding

Author:	stephentoub
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

jakobbotsch commented 1 year ago

Looks like the exact case of #70616. I'm not sure what @EgorBo's conclusion ended up being around this (there is the classic ? true : false workaround in the C# source that we reverted to in #70631)

EgorBo commented 1 year ago

Likely the same as https://github.com/dotnet/runtime/issues/78383, assigning Andy who was looking at it in https://github.com/dotnet/runtime/pull/83859

AndyAyersMS commented 1 year ago

This still feels somewhat beyond our grasp.For M3, during RBO, we have

image (31)

And we want to recognize that the predicate value returned in BB06 is also a function of the two predicate operands in BB01, so that the entire graph collapses to a single compare.

RBO could handle it something like the following:

At BB06, realize that we have a return value that depends on PHI inputs
At BB06, realize that each PHI input resolves the return value to a constant
- path (a) T1 = -1 ==> rv false
- path (b) T1 = 0 ==> rv false
- path (c) T1 = 1 ==> rv true
Discover that there is a control-equivalent branch (dominating BB06 and post-dominated by BB06) in BB01
Determine that BB01's operands can be compared in way that establishes the return value.
- we can do this by computing the conditions under which RV is true (path c):
- AND(GE(V00,V01), NOT(LE(V00,V01)) ==> GT(V00,V01)
Determine that there are no important side effects in between BB01's compare and BB06's return
Create a return at end of BB01 replacing the branch, with comparison above, and delete the other blocks.

The prototype changes for #81220 have a parts of this logic, but that analysis won't get triggered by returns, and does not have a sufficiently powerful side-effect analysis (currently tripped up by the assignments to T1). We should be able to argue that the only uses of T1 are in this subgraph (or perhaps, the only uses of the defs of T1 in this subgraph are also in this subgraph) and the only side effects are assignments to T1, so these assignment side effects can be disregarded as we will be removing all the uses too.

AndyAyersMS commented 1 year ago

This is going to take more work so it's not going to happen in .NET 8.

There is a decent draft PR with some of the necessary bits here: https://github.com/dotnet/runtime/pull/88527. It doesn't actually find that many cases and introduces a fair amount of new logic, so I'm going to hold off merging this until after .NET 8 is done.

AndyAyersMS commented 2 months ago

Still work needed on the approaches taken above -- would be nice if the overall opt was less pattern matchy somehow.

For the return cases perhaps postdominance can play a role? Starting at a return (relop), if we walk up the postdominator tree and find a postdominated BBJ_COND, we can do the same sort of symbolic analysis to see if the decision made there determines (or could be modified to determine) the value that will be returned, and if so, perhaps we can short-circuit all the logic in between.

For the more general case we may need something like collective postdominance... say two blocks collectively postdominate some nest of computation, we can do similar analysis with the collective postdominator tree, to see if some decision made earlier can be modified to bypass all the nest and arrive at the right partial postdominator.

The other missing bit is the ability to recognize when the within-nest side effects' effects are contained to the nest, so if the nest goes away nothing of importance is lost.

I'm going to move this out of .NET 9 as it needs more time and thought.

dotnet / runtime

IComparer<T> constraint leads to worse codegen than IComparisonOperators<T,T,bool> constraint #78383