Open manofstick opened 4 years ago
@KevinRansom @abelbraaksma well there you go, I have thrown it out to the Emperor for the thumbs up or thumbs down!
@manofstick, so your algorithm, basically consists of figuring out which types can use the default comparer? What are the circumstances where the behavior of existing code may change, can it be detected?
I would say that we can't leave a 20-25 * speed up on the table, we just added a new library level special case for DateTime, and @TIHan is eager to improve the performance of structural equality. So if you will consider re-submitting the work, I will commit to fighting for it to be included, I can't guarantee that we will win the argument, but we can durned well try.
How does that sound?
Kevin
@KevinRansom
OK; I'll see if I can make some time over the next week or so... But I would like confirmation that calling a bit of refection code (i.e. the above snippet contains IsGenericType
and GetGenericTypeDefinition
) is OK. It is only called once per execution (per Type), so for me that's fine, but that's because I'm usually writing long running processes... In the schema of things, even for short running ones, I really don't think it's an issue, but I would hate it to be a stumbling block for final acceptance, given that we know it's there at the start...
This looks surprisingly much like a piece of code I have in my XPath implementation, where I've tried to get native type performance, and needed to cater for float
semantics with NaN and INF. However, not completely comparable with my choice, nor reusable, as it allows mixing types in operators (float can be compared with int etc), which required some extra magic to get right.
@manofstick If I understand this correctly, the biggest slowdown is with struct types that are not native types, or that are not recognized specially (like now DateTime
). And mixed struct (part is known, part isn't known).
I don't yet totally know what you do to get a comparer,and how that comparer looks, but I was thinking, if the type isn't special, would it be sufficient to do bitwise comparison? Isn't that what the default is in .NET land?
And another way forward: can we detect, at compile time, if there's a genetic compare/equate interface present, and use that?
Would it be helpful to set up a little brainstorm session on Slack or a call? It'll help to get our heads around this and in the same direction :)
&& checkType (idx+1) types
Little side note: your rec
function is not tail-recursive. That may not matter much if struct types are expected to be small and contain few members. But if we choose this approach, and since this is a runtime check, we might as well optimize it to allow TCO.
@manofstick , if the measurements support the approach, then it wins. Otherwise not ... of course there may be issues of reliability, reproducibility that impact the outcome. But at the end of the day, numbers are the most convincing rhetoric I know.
Thanks
Kevin
Very illuminating post by @manofstick . F# equality has a lot of technical debt:
StructuralComparable
and IStructuralEquatable
. We need to get to an architecture without these.NaN <> NaN
. There is no need to force NaN <> NaN
more than .Net does. Just use whatever .Net gives and that will be more than enough deference to IEEE. x <> x
is fundamentally illogical, however many engineers say otherwise. I believe we need to take a breaking change to remove all the special cases. If it's possible:
The route we would likely take for quality is this one by @TIHan - https://github.com/dotnet/fsharp/pull/6175
Likely a similar approach for comparison. Although the reflection-based approach can speed things up, we should avoid reflection if we can.
Just use whatever .Net gives and that will be more than enough deference to IEEE. x <> x is fundamentally illogical, however many engineers say otherwise.
The most basic comparison instruction dealing with equality is the ceq
opcode. Quote from the docs:
For floating-point number, ceq will return 0 if the numbers are unordered (either or both are NaN). The infinite values are equal to themselves.
I know from our discussion on Slack that you disagree with this point. And I've also come to understand that in C# ==
translates to ceq
, while Double.Equals
specifically returns true
if both are NaN
. Just saying that "whatever .NET does" doesn't have a simple answer, as it does both in surprisingly subtle ways (apart from the sentiment to adhere to standards like IEEE-754).
(unfortunately I don't have a "one size fits all" solution either)
@charlesroddie
In regards to IStructuralComparable
and IStructuralEquatable
you can't have a generic version of these (by usage, not just current design). The key is in the name Structural... i.e. the comparer is intended to be passed through a range of objects not just one top level one. As in if you're comparing a tuple, then you need to know all the underlying types - not just the top level type, which is all a generic version would help you with! So really you need some magical factory - which is effectively what System.Collections.Generic.Comparer.Default
(and hence it's IEquatable<>
implementation uses) is. I.e. it generates a new Type on the fly - the comparer - where there is some underlying code generation going on. This was really the path I took with my first attempt, where I originally used the Emit library (via Linq.Expressions) but managed to whittle that away to purely used MakeGenericType
with lots of generic magic. This was frowned up.
But yeah, if it was to exist it would have to have something like:
type IComparerFactory =
abstract Get<'a> : unit -> IStructuralComparable<'a>
and IStructuralComparable<'a> =
abstract CompareTo : item:'a -> IComparerFactory -> int
Given that, if anything, there has been a push to reduce generics reliance in the BCL, I highly doubt that such a interface, requiring such magic, will ever exist.
In regards to NaN. Meh, it is what it is.
...and I highly, highly, doubt any breaking change would ever happen...
@abelbraaksma
I don't yet totally know what you do to get a comparer,and how that comparer looks, but I was thinking, if the type isn't special, would it be sufficient to do bitwise comparison? Isn't that what the default is in .NET land?
All I do is use (Equality)?Comparer<'a>.Default
where I can - i.e. following the rules based on type analysis. (i.e. I can use it on a struct tuple as the struct tuples implementation uses the default comparer for it's comparison, so if all it's elements satisfy use, then it's good)
Now really my code should only be used when the type isn't known at runtime (i.e. in libraries, such as the use of dict
) The change that @TIHan did in Devirtualize Equality should be used (with the same set of rules that I use) at compile time. I haven't tried to merge in his change with this one. There may be some issues, as I do a little bit of work with the optimizer (to narrow call paths basically - as although they should just get inlined by the JIT it is better because a) it doesn't need the extra work and b) F# shoves tail
instructions in there which cause the JIT to balk at this so you can get terrible performance for > 64 bit structures...)
@cartermp
The route we would likely take for quality is this one by @TIHan - #6175
Hahah, you're not inspiring me with confidence given that that's still open after ~ a year and a half!
@KevinRansom over to you. Feel free to close this issue; the code is all there in #9404 - let's see if can make it past the gatekeepers this time!!
Things would already not be as bad if the (>) operator could compile as a call to IEquatable<'T>.CompareTo() without going through the Microsoft.FSharp.Core.LanguagePrimitives+HashCompare.GenericGreaterThanIntrinsic function that is very expensive... There would be 2 main ways out of this...
| Method | Mean | Error | StdDev | Ratio |
|-------------------------- |------------:|----------:|----------:|------:|
| cmp_with_op_GreaterThan | 2,560.14 ns | 50.841 ns | 66.108 ns | 1.00 |
| cmp_with_CompareTo_of_obj | 594.50 ns | 11.893 ns | 19.205 ns | 0.23 |
| cmp_with_CompareTo_of_T | 58.17 ns | 1.156 ns | 1.331 ns | 0.02 |
| cmp_custom_op_GreaterThan | 57.37 ns | 1.180 ns | 1.212 ns | 0.02 |
| cmp_values | 57.73 ns | 1.106 ns | 1.034 ns | 0.02 |
This is the benchmark on 100 iteration to compare 2 instances of a struct containing an int.
x > y
(x :> IComparable).CompareTo(y) > 0
(x :> IComparable<'t>).CompareTo(y) > 0
x.Value > y.Value
. Calls Struct.op_GreaterThan(x,y)
x.Value > y.Value
Some remarks for progressive enhancement... F# (>) seams to call CompareTo (leading to boxing) even when the struct implements CompareTo<_>. Calling the generic version of CompareTo<_> would already eliminate a jump and a boxing.
the call to op_GreaterThan is conceptualy faster than passing through IComparable<'T> because op_GreaterThan must return a bool, and can directly implement it a a "cgt" il instrustion, while IComparable<'T> must return an int and has to test both for greater and lower with a conditional jum... and then the result is tested for > 0... In fact, the timing is quite close.
A problem is IComparable<'t> is that you cannot call it on null value. This is not a problem on structs, and should not be a problem on F# types that are not Null.
This problem doesn't exist with op_GreaterThan: 't *'t -> bool .
The code in GenericComparisonWithComparerFast for primitive types is:
if (# "clt" x y : bool #) then (-1) else (# "cgt" x y : int #)
which is jitted with a conditional jump:
Comp.cmp1(Int32, Int32)
L0000: cmp ecx, edx
L0002: jge short L000a
L0004: mov eax, 0xffffffff
L0009: ret
L000a: cmp ecx, edx
L000c: setg al
L000f: movzx eax, al
L0012: ret
There is a non conditional version that is exactly as long:
(# "cgt" x y : int #) - (# "clt" x y : int #)
jitted:
Comp.cmp2(Int32, Int32)
L0000: cmp ecx, edx
L0002: setg al
L0005: movzx eax, al
L0008: cmp ecx, edx
L000a: setl dl
L000d: movzx edx, dl
L0010: sub eax, edx
L0012: ret
For reference, here is the code for a structure:
[<Struct>]
type Data(value: int) =
member this.Value = value
static member op_GreaterThan(x: Data, y: Data) =
x.Value > y.Value
a comparison through IComparable<'T>:
let cmp_IComparable_T(x: Data, y: Data) =
(x:> System.IComparable<Data>).CompareTo(y) > 0
is jitted as:
Comp.cmp_IComparable_T(Data, Data)
L0000: cmp ecx, edx // compares x to y
L0002: jge short L000b // jump to L000b if greater or equal
L0004: mov eax, 0xffffffff // set result as -1
L0009: jmp short L0013 // jump to L0013
L000b: cmp ecx, edx // if was greater greater, compare x to y
L000d: setg al // set result to 1 if greater, 0 else (if equal)
L0010: movzx eax, al // extend result on larger register size
L0013: test eax, eax // test result sign
L0015: setg al // set result to 1 if > 0
L0018: movzx eax, al // extend result
L001b: ret
while the version with op_GreaterThan:
let cmp_static_greaterThan(x: Data, y: Data) =
Data.op_GreaterThan(x,y)
is doing nothing more than necessary:
Comp.cmp_static_greaterThan(Data, Data)
L0000: cmp ecx, edx // compare x and y
L0002: setg al // set result to 1 if greater
L0005: movzx eax, al // extend result
L0008: ret
This story began half a decade ago with an adventurous implementation, and received a more reasonable, albeit more limited, implementation a mere two years ago yet as the tides of time go neither of these is (trivially) merge-able into current master. So rather than continuing to hit myself in the head with a hammer making an updated version, I'd rather put this out to the people who actually pull the strings of Pull Requests in order to determine if this effort will yield any fruit (or just a completely mashed head).
So, why's this a problem anyway? Here's a little use of structural comparison within a dictionary setting with NodaTime's
Instant
as the key...So I'm not sure if you notice there the slight difference in performance...
?? Huh what ??
That not a slight difference, that's a difference of ~20-25x slower using Structural comparison.
So why is this a problem and what can we do about it (well besides going back in a time machine and slurp in those PRs when they were ready...)
Well first of all we need to know why the decision was made to do this, why it doesn't cause pain with built in types, and then decide what we can do about it.
In original .net land, by default, reference types were equal only if
ReferenceEquals
is true and value types were only equal if, in a bitwise comparison, they were true (which meant if they had embedded reference types they were only considered equal if those ReferenceEquals were equal), but this didn't provide the type of experience that fsharp wanted to provide. Containers (array, ResizeArray, Dictionary, etc) were just seen as ordinary reference types.In f-sharp land, by default, records and discriminated unions (of both the reference and value type flavours) implement a series of methods/interfaces so that their compare ever element, and if the elements are all equal, then the object is declared equal. Containers (array, list, map, etc.) also implemented these rules so they too could do comparisons.
To facilitate the f-sharp rules, new interfaces
IStructuralComparable
andIStructuralEquatable
were introduced. The first thing here to recognise is that these interface were not generic, which meant that in order to use the comparisons for value types boxing must occur. But they had to be non-generic, because the one comparer was passed through chains of objects of different types to do the comparison. Rock and hard place land.There was one more rule that f-sharp comparisons had that wasn't part of "normal" .net equality, and this was for floating point numbers NaN <> NaN.
So yeah that is basically the state of play. So what did I do about it? In my first attempt I created who comparison objects using lots of runtime magic to try to create efficient comparison objects that implemented all of the fsharp rules and was then dynamically created using
Type.MakeGenericType
. At the time this came across two obstacles. The first was that whilst the old 64-bit JIT did lots of runtime optimizations to make this very efficient, the new Ryu JIT didn't (fair enough, I really was bending, albeit not breaking, the rules). The second was that, at the time, AOT compilation was gaining traction andMakeGenericType
was an obstacle to that path.My second attempt recognised that the same magic that I was performing is basically performed by
(Equality)?Comparer.Default
(and AOT catered for that) that in most cases the fsharp compilers implementation ofIStructural(Comparable|Equatable)
was the same as it's implementation ofI(IComparable|Equatable)<>
. And so with a bit of RTTI I could filter out problematic cases (i.e. those that used floating point numbers due to the NaN case) and then just return the default comparer in most cases. In fact this is basically what the standard f-sharp library is doing, as it just contains a list of known types (i.e intrinsic types) where it returns an efficient, non-boxing comparer. But they have no mechanism to make is available for non-built in types.So the crux of my implementation relies on the following recursive type check:
So yeah, that's about it really. The rest of the PR was just tying this in so that it was efficiently used (i.e. only called once for any type) and attaching the various operators etc.
Anyway, this has come way, way, way too late to save my work code-base (where I have just ended up never using Structural Equality basically, not using default implementations of data structures, etc. sigh... maybe I'm the only one who cares about efficiency... why bother eh when you can just spin up another x number of cloud machines, he says, shaking his head...) but if people are going to take this as a PR seriously then I'll update it and submit a new request...
But I ain't getting any younger!
(So yeah, if the answer is no, just close this Issue...)