The performance of Comparing and Ordering things

manofstick commented 4 years ago

This story began half a decade ago with an adventurous implementation, and received a more reasonable, albeit more limited, implementation a mere two years ago yet as the tides of time go neither of these is (trivially) merge-able into current master. So rather than continuing to hit myself in the head with a hammer making an updated version, I'd rather put this out to the people who actually pull the strings of Pull Requests in order to determine if this effort will yield any fruit (or just a completely mashed head).

So, why's this a problem anyway? Here's a little use of structural comparison within a dictionary setting with NodaTime's Instant as the key...

module Program

(*
<PackageReference Include="BenchmarkDotNet" Version="0.12.1" />
<PackageReference Include="NodaTime" Version="3.0.0" />
*)

open System
open System.Collections.Generic
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open NodaTime

[<MemoryDiagnoser>]
type ComparisonComparison () =
    let mutable instants : array<Instant> = Unchecked.defaultof<_> // some bunch of instants

    let checkContains (dict:IDictionary<Instant,_>) =
        for i = 0 to instants.Length-1 do
            if i % 2 = 0 <> (dict.ContainsKey instants.[i]) then
                failwith "unexpected"

    let mutable dictOperators                 = Unchecked.defaultof<IDictionary<Instant, obj>>
    let mutable dictEqualityComparerDefault   = Unchecked.defaultof<IDictionary<Instant, obj>>
    let mutable dictHashIdentityStructural    = Unchecked.defaultof<IDictionary<Instant, obj>>
    let mutable dictHashIdentityNonStructural = Unchecked.defaultof<IDictionary<Instant, obj>>

    [<Params (1, 100, 10000)>] 
    member val public DictSize = 0 with get, set

    [<GlobalSetup>]
    member this.SetupData() =
        instants <-
            let rng = Random 42
            let range lower upper = rng.Next (lower, upper+1)
            Array.init (this.DictSize*2) (fun _ ->
                Instant.FromUtc (range 1900 2100, range 1 12, range 1 28, range 0 23, range 0 59, range 0 59))

        let tmpDictHashIdentityStructural    = Dictionary HashIdentity.Structural
        let tmpDictHashIdentityNonStructural = Dictionary HashIdentity.NonStructural
        let tmpDictEqualityComparerDefault   = Dictionary EqualityComparer.Default
        dictOperators <-
            instants
            |> Seq.mapi (fun idx instant -> (idx, instant))
            |> Seq.filter (fun (idx,_) -> idx % 2 = 0)
            |> Seq.map (fun (_,instant) ->
                tmpDictEqualityComparerDefault.[instant] <- null
                tmpDictHashIdentityStructural.[instant] <- null
                tmpDictHashIdentityNonStructural.[instant] <- null
                instant, null)
            |> dict
        dictEqualityComparerDefault   <- tmpDictEqualityComparerDefault
        dictHashIdentityStructural    <- tmpDictHashIdentityStructural   
        dictHashIdentityNonStructural <- tmpDictHashIdentityNonStructural

    [<Benchmark (Baseline=true)>]
    member _.Operators () = checkContains dictOperators

    [<Benchmark>]
    member _.EqualityComparerDefault () = checkContains dictEqualityComparerDefault

    [<Benchmark>]
    member _.HashIdentityStructural () = checkContains dictHashIdentityStructural

    [<Benchmark>]
    member _.HashIdentityNonStructural () = checkContains dictHashIdentityNonStructural

[<EntryPoint>]
let Main args =
    let switch = BenchmarkSwitcher [| typeof<ComparisonComparison>  |]

    switch.Run args |> ignore

    0

Method	DictSize	Mean	Error	StdDev	Ratio	Gen 0	Gen 1	Gen 2	Allocated
Operators	1	1,412.23 ns	5.008 ns	4.439 ns	1.00	0.0420	-	-	176 B
EqualityComparerDefault	1	37.43 ns	0.271 ns	0.240 ns	0.03	-	-	-	-
HashIdentityStructural	1	1,362.13 ns	8.848 ns	8.276 ns	0.96	0.0305	-	-	128 B
HashIdentityNonStructural	1	45.30 ns	0.159 ns	0.141 ns	0.03	-	-	-	-

Operators	100	139,269.09 ns	477.875 ns	423.623 ns	1.00	4.1504	-	-	17600 B
EqualityComparerDefault	100	4,995.58 ns	2.543 ns	1.985 ns	0.04	-	-	-	-
HashIdentityStructural	100	134,801.94 ns	411.938 ns	343.987 ns	0.97	2.9297	-	-	12800 B
HashIdentityNonStructural	100	5,993.03 ns	4.839 ns	4.290 ns	0.04	-	-	-	-

Operators	10000	14,260,777.88 ns	33,679.950 ns	28,124.284 ns	1.00	406.2500	-	-	1760021 B
EqualityComparerDefault	10000	596,535.22 ns	429.688 ns	401.930 ns	0.04	-	-	-	1 B
HashIdentityStructural	10000	13,830,610.78 ns	67,093.408 ns	62,759.215 ns	0.97	296.8750	-	-	1280232 B
HashIdentityNonStructural	10000	708,332.98 ns	643.601 ns	502.482 ns	0.05	-	-	-	1 B

So I'm not sure if you notice there the slight difference in performance...

?? Huh what ??

That not a slight difference, that's a difference of ~20-25x slower using Structural comparison.

So why is this a problem and what can we do about it (well besides going back in a time machine and slurp in those PRs when they were ready...)

Well first of all we need to know why the decision was made to do this, why it doesn't cause pain with built in types, and then decide what we can do about it.

In original .net land, by default, reference types were equal only if ReferenceEquals is true and value types were only equal if, in a bitwise comparison, they were true (which meant if they had embedded reference types they were only considered equal if those ReferenceEquals were equal), but this didn't provide the type of experience that fsharp wanted to provide. Containers (array, ResizeArray, Dictionary, etc) were just seen as ordinary reference types.

In f-sharp land, by default, records and discriminated unions (of both the reference and value type flavours) implement a series of methods/interfaces so that their compare ever element, and if the elements are all equal, then the object is declared equal. Containers (array, list, map, etc.) also implemented these rules so they too could do comparisons.

To facilitate the f-sharp rules, new interfaces IStructuralComparable and IStructuralEquatable were introduced. The first thing here to recognise is that these interface were not generic, which meant that in order to use the comparisons for value types boxing must occur. But they had to be non-generic, because the one comparer was passed through chains of objects of different types to do the comparison. Rock and hard place land.

There was one more rule that f-sharp comparisons had that wasn't part of "normal" .net equality, and this was for floating point numbers NaN <> NaN.

So yeah that is basically the state of play. So what did I do about it? In my first attempt I created who comparison objects using lots of runtime magic to try to create efficient comparison objects that implemented all of the fsharp rules and was then dynamically created using Type.MakeGenericType. At the time this came across two obstacles. The first was that whilst the old 64-bit JIT did lots of runtime optimizations to make this very efficient, the new Ryu JIT didn't (fair enough, I really was bending, albeit not breaking, the rules). The second was that, at the time, AOT compilation was gaining traction and MakeGenericType was an obstacle to that path.

My second attempt recognised that the same magic that I was performing is basically performed by (Equality)?Comparer.Default (and AOT catered for that) that in most cases the fsharp compilers implementation of IStructural(Comparable|Equatable) was the same as it's implementation of I(IComparable|Equatable)<>. And so with a bit of RTTI I could filter out problematic cases (i.e. those that used floating point numbers due to the NaN case) and then just return the default comparer in most cases. In fact this is basically what the standard f-sharp library is doing, as it just contains a list of known types (i.e intrinsic types) where it returns an efficient, non-boxing comparer. But they have no mechanism to make is available for non-built in types.

So the crux of my implementation relies on the following recursive type check:

let canUseDefaultEqualityComparer er (rootType:Type) =
    let processed = System.Collections.Generic.HashSet ()

    let rec checkType idx (types:Type[]) =
        if idx = types.Length then true
        else
            let ty = get types idx
            if not (processed.Add ty) then
                checkType (idx+1) types
            else
                let isValidGenericType ifNotType fullname =
                    if not (ty.IsGenericType && ty.GetGenericTypeDefinition().FullName.Equals fullname)
                    then ifNotType
                    else checkType 0 (ty.GetGenericArguments ())
                let isTypeAndGenericArgumentsOK fullname            = isValidGenericType false fullname
                let isNotTypeOrIsTypeAndGenericArgumentsOK fullname = isValidGenericType true  fullname

                // avoid any types that need special handling in GenericEqualityObj
                // GenericEqualityObj handles string as a special cases, but internally routes to same equality

                ty.IsSealed  // covers enum and value types
                             // ref types need to be sealed as derived class might implement  IStructuralEquatable
                && not (isArray ty)
                && (er || (not (ty.Equals typeof<float>)))
                && (er || (not (ty.Equals typeof<float32>)))
                && isNotTypeOrIsTypeAndGenericArgumentsOK "System.Nullable`1"
                && not (isStructuralEquatable ty
                        // we accept ValueTuple even though it supports IStructuralEquatable 
                        // if all generic arguements pass check
                        && not (   isTypeAndGenericArgumentsOK "System.ValueTuple`1"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`2"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`3"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`4"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`5"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`6"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`7"
                                || isTypeAndGenericArgumentsOK "System.ValueTuple`8"
                                || isTypeAndGenericArgumentsOK "Microsoft.FSharp.Collections.FSharpList`1"
                                || isTypeAndGenericArgumentsOK "Microsoft.FSharp.Core.FSharpOption`1"
                                || isTypeAndGenericArgumentsOK "Microsoft.FSharp.Core.FSharpValueOption`1"
                                || isTypeAndGenericArgumentsOK "Microsoft.FSharp.Core.FSharpResult`2"
                               )
                       )
                && checkType (idx+1) types

    checkType 0 [|rootType|]

So yeah, that's about it really. The rest of the PR was just tying this in so that it was efficiently used (i.e. only called once for any type) and attaching the various operators etc.

Anyway, this has come way, way, way too late to save my work code-base (where I have just ended up never using Structural Equality basically, not using default implementations of data structures, etc. sigh... maybe I'm the only one who cares about efficiency... why bother eh when you can just spin up another x number of cloud machines, he says, shaking his head...) but if people are going to take this as a PR seriously then I'll update it and submit a new request...

But I ain't getting any younger!

(So yeah, if the answer is no, just close this Issue...)

manofstick commented 4 years ago

@KevinRansom @abelbraaksma well there you go, I have thrown it out to the Emperor for the thumbs up or thumbs down!

KevinRansom commented 4 years ago

@manofstick, so your algorithm, basically consists of figuring out which types can use the default comparer? What are the circumstances where the behavior of existing code may change, can it be detected?

I would say that we can't leave a 20-25 * speed up on the table, we just added a new library level special case for DateTime, and @TIHan is eager to improve the performance of structural equality. So if you will consider re-submitting the work, I will commit to fighting for it to be included, I can't guarantee that we will win the argument, but we can durned well try.

How does that sound?

Kevin

manofstick commented 4 years ago

@KevinRansom

OK; I'll see if I can make some time over the next week or so... But I would like confirmation that calling a bit of refection code (i.e. the above snippet contains IsGenericType and GetGenericTypeDefinition) is OK. It is only called once per execution (per Type), so for me that's fine, but that's because I'm usually writing long running processes... In the schema of things, even for short running ones, I really don't think it's an issue, but I would hate it to be a stumbling block for final acceptance, given that we know it's there at the start...

abelbraaksma commented 4 years ago

This looks surprisingly much like a piece of code I have in my XPath implementation, where I've tried to get native type performance, and needed to cater for float semantics with NaN and INF. However, not completely comparable with my choice, nor reusable, as it allows mixing types in operators (float can be compared with int etc), which required some extra magic to get right.

@manofstick If I understand this correctly, the biggest slowdown is with struct types that are not native types, or that are not recognized specially (like now DateTime). And mixed struct (part is known, part isn't known).

I don't yet totally know what you do to get a comparer,and how that comparer looks, but I was thinking, if the type isn't special, would it be sufficient to do bitwise comparison? Isn't that what the default is in .NET land?

And another way forward: can we detect, at compile time, if there's a genetic compare/equate interface present, and use that?

Would it be helpful to set up a little brainstorm session on Slack or a call? It'll help to get our heads around this and in the same direction :)

abelbraaksma commented 4 years ago

&& checkType (idx+1) types

Little side note: your rec function is not tail-recursive. That may not matter much if struct types are expected to be small and contain few members. But if we choose this approach, and since this is a runtime check, we might as well optimize it to allow TCO.

KevinRansom commented 4 years ago

@manofstick , if the measurements support the approach, then it wins. Otherwise not ... of course there may be issues of reliability, reproducibility that impact the outcome. But at the end of the day, numbers are the most convincing rhetoric I know.

Thanks

Kevin

charlesroddie commented 4 years ago

Very illuminating post by @manofstick . F# equality has a lot of technical debt:

Non-generic StructuralComparable and IStructuralEquatable. We need to get to an architecture without these.
Going out of it's way to have NaN <> NaN. There is no need to force NaN <> NaN more than .Net does. Just use whatever .Net gives and that will be more than enough deference to IEEE. x <> x is fundamentally illogical, however many engineers say otherwise.
Very complex equality rules.

I believe we need to take a breaking change to remove all the special cases. If it's possible:

Use a generic compare/equate interface when available as @abelbraaksma suggets.
Otherwise use the non-generic one.

cartermp commented 4 years ago

The route we would likely take for quality is this one by @TIHan - https://github.com/dotnet/fsharp/pull/6175

Likely a similar approach for comparison. Although the reflection-based approach can speed things up, we should avoid reflection if we can.

abelbraaksma commented 4 years ago

Just use whatever .Net gives and that will be more than enough deference to IEEE. x <> x is fundamentally illogical, however many engineers say otherwise.

The most basic comparison instruction dealing with equality is the ceq opcode. Quote from the docs:

For floating-point number, ceq will return 0 if the numbers are unordered (either or both are NaN). The infinite values are equal to themselves.

I know from our discussion on Slack that you disagree with this point. And I've also come to understand that in C# == translates to ceq, while Double.Equals specifically returns true if both are NaN. Just saying that "whatever .NET does" doesn't have a simple answer, as it does both in surprisingly subtle ways (apart from the sentiment to adhere to standards like IEEE-754).

(unfortunately I don't have a "one size fits all" solution either)

manofstick commented 4 years ago

@charlesroddie

In regards to IStructuralComparable and IStructuralEquatable you can't have a generic version of these (by usage, not just current design). The key is in the name Structural... i.e. the comparer is intended to be passed through a range of objects not just one top level one. As in if you're comparing a tuple, then you need to know all the underlying types - not just the top level type, which is all a generic version would help you with! So really you need some magical factory - which is effectively what System.Collections.Generic.Comparer.Default (and hence it's IEquatable<> implementation uses) is. I.e. it generates a new Type on the fly - the comparer - where there is some underlying code generation going on. This was really the path I took with my first attempt, where I originally used the Emit library (via Linq.Expressions) but managed to whittle that away to purely used MakeGenericType with lots of generic magic. This was frowned up.

But yeah, if it was to exist it would have to have something like:

type IComparerFactory =
    abstract Get<'a> : unit -> IStructuralComparable<'a>

and IStructuralComparable<'a> =
    abstract CompareTo : item:'a -> IComparerFactory -> int

Given that, if anything, there has been a push to reduce generics reliance in the BCL, I highly doubt that such a interface, requiring such magic, will ever exist.

In regards to NaN. Meh, it is what it is.

...and I highly, highly, doubt any breaking change would ever happen...

manofstick commented 4 years ago

@abelbraaksma

I don't yet totally know what you do to get a comparer,and how that comparer looks, but I was thinking, if the type isn't special, would it be sufficient to do bitwise comparison? Isn't that what the default is in .NET land?

All I do is use (Equality)?Comparer<'a>.Default where I can - i.e. following the rules based on type analysis. (i.e. I can use it on a struct tuple as the struct tuples implementation uses the default comparer for it's comparison, so if all it's elements satisfy use, then it's good)

Now really my code should only be used when the type isn't known at runtime (i.e. in libraries, such as the use of dict) The change that @TIHan did in Devirtualize Equality should be used (with the same set of rules that I use) at compile time. I haven't tried to merge in his change with this one. There may be some issues, as I do a little bit of work with the optimizer (to narrow call paths basically - as although they should just get inlined by the JIT it is better because a) it doesn't need the extra work and b) F# shoves tail instructions in there which cause the JIT to balk at this so you can get terrible performance for > 64 bit structures...)

manofstick commented 4 years ago

@cartermp

The route we would likely take for quality is this one by @TIHan - #6175

Hahah, you're not inspiring me with confidence given that that's still open after ~ a year and a half!

manofstick commented 4 years ago

@KevinRansom over to you. Feel free to close this issue; the code is all there in #9404 - let's see if can make it past the gatekeepers this time!!

thinkbeforecoding commented 3 years ago

Things would already not be as bad if the (>) operator could compile as a call to IEquatable<'T>.CompareTo() without going through the Microsoft.FSharp.Core.LanguagePrimitives+HashCompare.GenericGreaterThanIntrinsic function that is very expensive... There would be 2 main ways out of this...

Expand the static contraint inlining to be able to add interface constraints (like "when 'T : IEquatable<'T> and struct") or member constraints ( when 'T : static member op_GreaterThan : 'T * 'T -> bool) to extand inlining cases in GenericGreaterThanFast
Detect call to F# (>) operator, and rewrite the AST to a call to IEquatable<'T>.CompareTo or op_GreaterThan if available.

|                    Method |        Mean |     Error |    StdDev | Ratio |
|-------------------------- |------------:|----------:|----------:|------:|
|   cmp_with_op_GreaterThan | 2,560.14 ns | 50.841 ns | 66.108 ns |  1.00 |
| cmp_with_CompareTo_of_obj |   594.50 ns | 11.893 ns | 19.205 ns |  0.23 |
|   cmp_with_CompareTo_of_T |    58.17 ns |  1.156 ns |  1.331 ns |  0.02 |
| cmp_custom_op_GreaterThan |    57.37 ns |  1.180 ns |  1.212 ns |  0.02 |
|                cmp_values |    57.73 ns |  1.106 ns |  1.034 ns |  0.02 |

This is the benchmark on 100 iteration to compare 2 instances of a struct containing an int.

cmp_with_op_GreaterThan : compare structs using F# (>) : x > y
cmp_with_CompareTo_of_obj : compare struct using IComparable: (x :> IComparable).CompareTo(y) > 0
cmp_with_CompareTo_of_T : compare struct using IComparable<'t>: (x :> IComparable<'t>).CompareTo(y) > 0
cmp_custom_op_GreaterThan : the struct implements a op_GreaterThan method that does x.Value > y.Value. Calls Struct.op_GreaterThan(x,y)
cmp_values : just does x.Value > y.Value

Some remarks for progressive enhancement... F# (>) seams to call CompareTo (leading to boxing) even when the struct implements CompareTo<_>. Calling the generic version of CompareTo<_> would already eliminate a jump and a boxing.

thinkbeforecoding commented 3 years ago

the call to op_GreaterThan is conceptualy faster than passing through IComparable<'T> because op_GreaterThan must return a bool, and can directly implement it a a "cgt" il instrustion, while IComparable<'T> must return an int and has to test both for greater and lower with a conditional jum... and then the result is tested for > 0... In fact, the timing is quite close.

A problem is IComparable<'t> is that you cannot call it on null value. This is not a problem on structs, and should not be a problem on F# types that are not Null.

This problem doesn't exist with op_GreaterThan: 't *'t -> bool .

The code in GenericComparisonWithComparerFast for primitive types is:

if (# "clt" x y : bool #) then (-1) else (# "cgt" x y : int #)

which is jitted with a conditional jump:

Comp.cmp1(Int32, Int32)
L0000: cmp ecx, edx
L0002: jge short L000a
L0004: mov eax, 0xffffffff
L0009: ret
L000a: cmp ecx, edx
L000c: setg al
L000f: movzx eax, al
L0012: ret

There is a non conditional version that is exactly as long:

(# "cgt" x y : int #) - (# "clt" x y : int #)

jitted:

Comp.cmp2(Int32, Int32)
L0000: cmp ecx, edx
L0002: setg al
L0005: movzx eax, al
L0008: cmp ecx, edx
L000a: setl dl
L000d: movzx edx, dl
L0010: sub eax, edx
L0012: ret

thinkbeforecoding commented 3 years ago

For reference, here is the code for a structure:

[<Struct>]
type Data(value: int) =
    member this.Value = value

    static member op_GreaterThan(x: Data, y: Data) =
        x.Value > y.Value

a comparison through IComparable<'T>:

let cmp_IComparable_T(x: Data, y: Data) =
    (x:> System.IComparable<Data>).CompareTo(y) > 0

is jitted as:

Comp.cmp_IComparable_T(Data, Data)
L0000: cmp ecx, edx               // compares x to y
L0002: jge short L000b            // jump to L000b if greater or equal
L0004: mov eax, 0xffffffff        // set result as -1
L0009: jmp short L0013            // jump to L0013
L000b: cmp ecx, edx               // if was greater greater, compare x to y
L000d: setg al                    // set result to 1 if greater, 0 else (if equal)
L0010: movzx eax, al              // extend result on larger register size
L0013: test eax, eax              // test result sign
L0015: setg al                    // set result to 1  if > 0
L0018: movzx eax, al              // extend result
L001b: ret

while the version with op_GreaterThan:

let cmp_static_greaterThan(x: Data, y: Data) =
    Data.op_GreaterThan(x,y)

is doing nothing more than necessary:

Comp.cmp_static_greaterThan(Data, Data)
L0000: cmp ecx, edx          // compare x and y
L0002: setg al               // set result to 1 if greater
L0005: movzx eax, al         // extend result
L0008: ret

dotnet / fsharp

The performance of Comparing and Ordering things #9348