Code Generation for Record.CompareTo(object, IComparer) lacks type-safety checks, and casts before checking nulls

dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio

https://dotnet.microsoft.com/languages/fsharp

MIT License

3.92k stars 786 forks source link

Code Generation for Record.CompareTo(object, IComparer) lacks type-safety checks, and casts before checking nulls #5380

Open EBrown8534 opened 6 years ago

EBrown8534 commented 6 years ago

When generating the CompareTo(object, IComparer) for a record type, there is no type-safety for casting. Instead, the CompareTo(object, IComparer) simply does a direct-cast to the record type. This means callers from C#/VB.NET to CompareTo(object, IComparer) will get an InvalidCastException if they send an object that is not the correct type.

Repro steps

F# Code:

type TestType =
    { Name : string }

C# Code:

var testType = new FsLibrary.TestType("Test");
Console.WriteLine(testType.CompareTo("string", null));

The generated ILASM for the TestType::CompareTo:

.method public hidebysig virtual final instance int32
        CompareTo(object obj,
                  class [mscorlib] System.Collections.IComparer comp) cil managed
{
  .custom instance void [mscorlib]
    System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) 
  // Code size       59 (0x3b)
  .maxstack  4
  .locals init ([0] class FsLibrary.TestType V_0)
  IL_0000:  ldarg.1
  IL_0001:  unbox.any FsLibrary.TestType
  IL_0006:  stloc.0
  IL_0007:  ldarg.0
  IL_0008:  ldnull
  IL_0009:  cgt.un
  IL_000b:  brfalse.s IL_002c
  IL_000d:  ldarg.1
  IL_000e:  unbox.any FsLibrary.TestType
  IL_0013:  ldnull
  IL_0014:  cgt.un
  IL_0016:  brfalse.s IL_002a
  IL_0018:  ldarg.0
  IL_0019:  ldfld      string FsLibrary.TestType::Name@
  IL_001e:  ldloc.0
  IL_001f:  ldfld      string FsLibrary.TestType::Name@
  IL_0024:  call int32 [mscorlib]System.String::CompareOrdinal(string,
                                                                     string)
  IL_0029:  ret
  IL_002a:  ldc.i4.1
  IL_002b:  ret
  IL_002c:  ldarg.1
  IL_002d:  unbox.any FsLibrary.TestType
  IL_0032:  ldnull
  IL_0033:  cgt.un
  IL_0035:  brfalse.s IL_0039
  IL_0037:  ldc.i4.m1
  IL_0038:  ret
  IL_0039:  ldc.i4.0
  IL_003a:  ret
} // end of method TestType::CompareTo

The obvious problem here is the unconditional unboxing of the argument to the FsLibrary.TestType. There is no attempt made to test the type for compatibility.

Additionally, this unboxing is always done, is done several times, and is not saved until the branch for which it is needed (IL_001e).

This can be more easily seen on sharplab.io:

https://sharplab.io/#v2:DYLgZgzgPgLgngBwKYAIAqSIzY1BeAWACgVSUBvFAOQEMBbVEFLAJwEsA7AcxQF9igA=

The generated C# for CompareTo(object, IComparer) is:

[CompilerGenerated]
public sealed override int CompareTo(object obj, IComparer comp)
{
    TestType testType = (TestType)obj;
    if (this != null)
    {
        if ((TestType)obj != null)
        {
            return string.CompareOrdinal(Name@, testType.Name@);
        }
        return 1;
    }
    if ((TestType)obj != null)
    {
        return -1;
    }
    return 0;
}

The culprit here can be easily identified as TestType testType = (TestType)obj, as well as the location of that cast, the fact that we cast obj to TestType in two other locations (rather than using the previously-casted version).

Expected behavior

The CompareTo(object, IComparer) should be generated with a type-check, then if the type is compatible it should be casted. It should also do the cast only when necessary, rather than as the first step.

Ideally, a more-proper C# version would look like:

[CompilerGenerated]
public sealed override int CompareTo(object obj, IComparer comp)
{
    if (this != null)
    {
        if (this is TestType) // Test for type here, can also use `as`
        {
            TestType testType = (TestType)obj; // Move this into the block where it's absolutely necessary
            if (testType != null)
            {
                return string.CompareOrdinal(Name@, testType.Name@);
            }
            return 1;
        }
    }
    if (obj != null) // No cast needed here, but not actively harmful to keep
    {
        return -1;
    }
    return 0;
}

Actual behavior

The generated CompareTo(object, IComparer) is not properly compatible with other .NET languages.

Known workarounds

None. The C# version must implement it's own comparison.

Related information

Windows 10 Pro for Workstations (10.0.16299.492)
Public Visual Studio 2017 Community
.NET 4.7.2
Visual Studio 2017 Community 15.7.2
Low/Medium, for someone like me who does a lot of C# <-> F# Interop it's far more severe

TIHan commented 6 years ago

I think this might be an improvement, it seems reasonable; but I don't know the full intentions of why we do a direct cast rather than check.

zpodlovics commented 6 years ago

I guess these code assume the F# usage only (somewhat reasonable for F# records): "If a type is defined in F#, null is not permitted as a regular value unless the AllowNullLiteral attribute is applied to the type. If a type is defined in some other .NET language, null is a possible value, and when you are interoperating with such types, your F# code might encounter null values." [1]

Also value types (struct record) will not require null checks as null is not a valid value for a value type (unless represented as Nullable<'T> but that's a different story)

In the future could these non-nullable requirement also be encoded in a more interoperable way too eg.: as a modreq?

[1] https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/dd233197(v=vs.110)

EBrown8534 commented 6 years ago

@TIHan That may be the case, but it's also worth noting that the end result of this code is the cast being performed 3 times in the function, when one will suffice. Granted, casting is usually quick, but there's no reason to do it that often, especially when I assume this function is used a lot by the compiler.

Looking at the docs, it looks like even the Microsoft support docs do a direct cast, not sure why, but it still feels wrong:

https://support.microsoft.com/en-us/help/320727/how-to-use-the-icomparable-and-icomparer-interfaces-in-visual-c

The role of IComparable is to provide a method of comparing two objects of a particular type. This is necessary if you want to provide any ordering capability for your object. Think of IComparable as providing a default sort order for your objects. For example, if you have an array of objects of your type, and you call the Sort method on that array, IComparable provides the comparison of objects during the sort. When you implement the IComparable interface, you must implement the CompareTo method, as follows:
// Implement IComparable CompareTo method - provide default sort order.
int IComparable.CompareTo(object obj)
{
   car c=(car)obj;
   return String.Compare(this.make,c.make);
}

Even their example has no type-safety check. I wonder if that's because IComparable is typically only used by collections.

I still don't like it, but I like it less now that I see that the support docs use it.

reinux commented 3 months ago

Is there a chance this can be addressed? You don't even need to call from another language for this to be a problem:

type Animal =
  abstract member Name: string
  inherit System.IComparable

type Dog =
  { numWoofs: int }
  interface Animal with
    member this.Name = "dogbert"

type Cat =
  { numMeows: int }
  interface Animal with
    member this.Name = "catsy"

let dog = { numWoofs = 3 }
let cat = { numMeows = 5 }

let map = Map<Animal, string> [ dog, "woof"; cat, "bahhh" ]

Results in:

System.InvalidCastException: Unable to cast object of type 'Dog' to type 'Cat'.

Granted, the key of a map is usually something simple, but I don't think this is necessarily all that uncomon a use case.

Martin521 commented 3 months ago

It seems to me the above is a bug different from the one in the OP. The map constructor is

    new(elements: seq<_>) =
        let comparer = LanguagePrimitives.FastGenericComparer<'Key>
        new Map<_, _>(comparer, MapTree.ofSeq comparer elements)

and 'Key is Animal. The code, however, throws in Cat.CompareTo(Object obj, IComparer comp). So, my conclusion is that there is something wrong in FastGenericComparer. But I may be completely off. I have a problem debugging right now, but will try to further look into this. Unless somebody tells me this is the wrong track.

brianrourkeboll commented 3 months ago

@Martin521 Is that not because Animal's IComparable.CompareTo implementation just delegates to the concrete type's implementation, i.e., when IComparable.CompareTo is ultimately called, it will walk up the inheritance tree to the implementation on Cat, whose body will throw if its obj argument is not in fact another Cat (and is instead a Dog upcast to Animal upcast to obj).

I.e., a call logically analogous to this is made

((IComparable)(Animal)cat).CompareTo((object)(Animal)dog)

which then reaches an implementation like this

int CompareTo(object obj, IComparer comp)
{
    // `obj` may actually be a boxed `Dog` or anything else that
    // implements `Animal` when invoked inside of a `Map<Animal, _>`.
    Cat cat = (Cat)obj;
    …
}

One thing to consider in @reinux's example is what exactly the compiler would do instead.

Manually overriding the implementation of IComparable helps to illustrate this point:

type Animal =
  abstract member Name: string
  inherit System.IComparable

[<CustomComparison; CustomEquality>]
type Dog =
  { numWoofs: int }

  override this.Equals obj =
    match obj with
    | :? Dog as other -> this.numWoofs = other.numWoofs
    | _ -> false

  override this.GetHashCode () = this.numWoofs

  interface System.IComparable with
    member this.CompareTo obj =
      match obj with
      | :? Dog as other -> compare this.numWoofs other.numWoofs
      | _ -> failwith "What should happen here?? Cat is hierarchically unrelated to this type, and it is not even defined yet, anyway."

  interface Animal with
    member this.Name = "dogbert"

[<CustomComparison; CustomEquality>]
type Cat =
  { numMeows: int }

  override this.Equals obj =
    match obj with
    | :? Cat as other -> this.numMeows = other.numMeows
    | _ -> false

  override this.GetHashCode () = this.numMeows

  interface System.IComparable with
    member this.CompareTo obj =
      match obj with
      | :? Cat as other -> compare this.numMeows other.numMeows
      | _ -> failwith "What should happen here??"

  interface Animal with
    member this.Name = "catsy"

let dog = { numWoofs = 3 }
let cat = { numMeows = 5 }

let map = Map<Animal, string> [ dog, "woof"; cat, "bahhh" ]

In both cases above, what is a meaningful comparison value for the compiler to return if the other object is of a completely unrelated type? 1? -1? Surely not 0?

How would that work if Cat always returned 1 when obj was not a Cat, and Dog always returned 1 when obj was not a Dog?

That would mean that a Cat value and a Dog value would have different relative orderings depending on which one you asked.

So yes, when the compiler is implementing CompareTo for a record type, it is definitely not considering unrelated types that happen to implement the same interfaces, and it doesn't seem like it would make much sense for it to do so.

brianrourkeboll commented 3 months ago

That said, the compiler does still seem to be emitting more unbox.anys than it needs to:

.method public final hidebysig virtual 
    instance int32 CompareTo (
        object obj,
        class [System.Runtime]System.Collections.IComparer comp
    ) cil managed 
{
    .custom instance void [System.Runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
        01 00 00 00
    )
    // Method begins at RVA 0x20e4
    // Code size 56 (0x38)
    .maxstack 5
    .locals init (
        [0] class _/Dog,
        [1] int32,
        [2] int32
    )

    IL_0000: ldarg.1
    IL_0001: unbox.any _/Dog
    IL_0006: stloc.0
    IL_0007: ldarg.0
    IL_0008: brfalse.s IL_002c

    IL_000a: ldarg.1
    IL_000b: unbox.any _/Dog
    IL_0010: brfalse.s IL_002a

    IL_0012: ldarg.0
    IL_0013: ldfld int32 _/Dog::numWoofs@
    IL_0018: stloc.1
    IL_0019: ldloc.0
    IL_001a: ldfld int32 _/Dog::numWoofs@
    IL_001f: stloc.2
    IL_0020: ldloc.1
    IL_0021: ldloc.2
    IL_0022: cgt
    IL_0024: ldloc.1
    IL_0025: ldloc.2
    IL_0026: clt
    IL_0028: sub
    IL_0029: ret

    IL_002a: ldc.i4.1
    IL_002b: ret

    IL_002c: ldarg.1
    IL_002d: unbox.any _/Dog
    IL_0032: brfalse.s IL_0036

    IL_0034: ldc.i4.m1
    IL_0035: ret

    IL_0036: ldc.i4.0
    IL_0037: ret
} // end of method Dog::CompareTo

[CompilerGenerated]
public sealed override int CompareTo(object obj, IComparer comp)
{
    Dog dog = (Dog)obj;
    if (this != null)
    {
        if ((Dog)obj != null)
        {
            int num = numWoofs@;
            int num2 = dog.numWoofs@;
            return ((num > num2) ? 1 : 0) - ((num < num2) ? 1 : 0);
        }
        return 1;
    }
    if ((Dog)obj != null)
    {
        return -1;
    }
    return 0;
}

Martin521 commented 3 months ago

@brianrourkeboll Thanks for looking into this.

@Martin521 Is that not because Animal's IComparable.CompareTo implementation just delegates to the concrete type's implementation, i.e., when IComparable.CompareTo is ultimately called, it will walk up the inheritance tree to the implementation on Cat, whose body will throw if its obj argument is not in fact another Cat (and is instead a Dog upcast to Animal upcast to obj).

Yes, that's what's happening. But what should we conclude? a) Do nothing. Accept that a runtime exception is thrown by correct F# code. b) Disallow (at compile time) Map<'Key, 'Value> when 'Key is an interface. c) Have FastGenericComparer return some unique ordering, not relying on delegated CompareTo. I don't know if b) or c) are possible, but a) seems wrong to me.

brianrourkeboll commented 3 months ago

a) Do nothing. Accept that a runtime exception is thrown by correct F# code.

Hmm, I'm not sure that the example is correct code, though.

For it to be meaningful, the types need to have some kind of specified relation to each other, e.g.,

type IAnimal = abstract Name : string

type Animal =
    | Dog of Dog
    | Cat of Cat

and Dog =
    { numWoofs : int }
    interface IAnimal with
        member _.Name = "dogbert"

and Cat =
    { numMeows : int }
    interface IAnimal with
        member _.Name = "catsy"

let dog = Dog { numWoofs = 3 }
let cat = Cat { numMeows = 5 }

// The key must be `Animal`, not `IAnimal`, because the types are not related to each other through `IAnimal`.
let map = Map<Animal, string> [ dog, "woof"; cat, "bahhh" ]

Otherwise, it's equivalent to doing something like this — which compiles, but which likewise fails at runtime, because each type's IComparable implementation has no sensible way to give a meaningful result when presented with a value of a completely unrelated type:

open System

let s = Set<IComparable> [1; "1"; DateOnly (2024, 07, 30)]
// System.ArgumentException: Object must be of type String.
//    at System.String.CompareTo(Object value)
//    at Microsoft.FSharp.Collections.SetTreeModule.add[T](IComparer`1 comparer, T k, SetTree`1 t) in D:\a\_work\1\s\src\FSharp.Core\set.fs:line 139
//    at Microsoft.FSharp.Collections.SetTreeModule.ofSeq[a](IComparer`1 comparer, IEnumerable`1 c) in D:\a\_work\1\s\src\FSharp.Core\set.fs:line 694
//    at Microsoft.FSharp.Collections.FSharpSet`1..ctor(IEnumerable`1 elements) in D:\a\_work\1\s\src\FSharp.Core\set.fs:line 970

brianrourkeboll commented 3 months ago

b) Disallow (at compile time) Map<'Key, 'Value> when 'Key is an interface.

I don't think that would make sense, because of course (using the same example) all of the objects implementing a given interface could be of the same type. It's just that unifying types at compile time to (non-generic) IComparable is inherently unsafe (unlike unifying two values of compatible types to IComparable at runtime, which is fine, and which is what happens if just Cat is used as a map key, etc.).

open System

// Safe, and the compiler will ensure that it is.
let s = Set<int> [1; 2; 3]

// Happens to be safe, but the compiler cannot and will not help us if it isn't.
let s' = Set<IComparable> [1; 2; 3]

// Not safe... But the compiler has no choice.
// We're literally telling it: "trust us;
// the values may be different types, but they
// implement the same interface, and we swear
// that the interface implementation for each
// individual type knows how to handle this situation
// in a sensible way."
// (It does not.)
let s'' = Set<IComparable> [1; 2; "3"]

I guess there are a couple things that make this less obvious in the animal example:

The compiler's implementation of IComparable for Cat and Dog is invisible.
The upcasting to IComparable inside of Map is invisible. The user is focused on the upcast to the Animal interface, not on the fact that there will then be another upcast to IComparable — even though the inherit IComparable declaration literally implies that any Animal may be treated directly as an IComparable.

reinux commented 3 months ago

Otherwise, it's equivalent to doing something like this — which compiles, but which likewise fails at runtime, because each type's IComparable implementation has no sensible way to give a meaningful result when presented with a value of a completely unrelated type

I came to that conclusion when I thought about implementing IComparable myself -- the only thing that stopped me was the length of the resulting code, as demonstrated by @brianrourkeboll in his comment.

My plan was to use System.Type.GUID, which does implement IComparable.

Semantically, this doesn't make all that much sense since it's just a random ID, but I do think it would enable using data structures that require comparison. A maliciously-crafted assembly with a colliding GUID type might be cause for concern, so that might warrant some investigation first.

For my current use case, I ended up just using List.find instead, since I don't anticipate O(n) being too much of an issue. I hope!

brianrourkeboll commented 3 months ago

@reinux

Semantically, this doesn't make all that much sense since it's just a random ID, but I do think it would enable using data structures that require comparison.

Since it doesn't sound like you actually need meaningful comparison for your use-case, what about just using a hash/equality-based dictionary? The compiler-generated implementation of equality works fine for this.

open System.Linq
open System.Collections.Generic

type Animal =
  abstract member Name: string

type Dog =
  { numWoofs: int }
  interface Animal with
    member this.Name = "dogbert"

type Cat =
  { numMeows: int }
  interface Animal with
    member this.Name = "catsy"

let dog = { numWoofs = 3 }
let cat = { numMeows = 5 }

let map : Dictionary<Animal, string> = [ struct (dog :> Animal, "woof"); struct (cat :> Animal, "bahhh") ].ToDictionary ()

Or, using a little library I made:

#r "nuget: FSharp.Collections.Builders"

open FSharp.Collections.Builders

type Animal =
  abstract member Name: string

type Dog =
  { numWoofs: int }
  interface Animal with
    member this.Name = "dogbert"

type Cat =
  { numMeows: int }
  interface Animal with
    member this.Name = "catsy"

let dog = { numWoofs = 3 }
let cat = { numMeows = 5 }

let map = dictionary<Animal, _> { dog, "woof"; cat, "bahhh" }

Martin521 commented 3 months ago

I am still not convinced.

Yes, looking from inside the system, one can explain why cats and dogs cannot be compared. But as a user, I think it is reasonable to expect that Set<Animal> (where Animal fulfills the required comparison constraint) is what is says: a set of Animals, and you should be able to put any animal into it.

I therefore still wonder if option c) of my comment above is such a bad idea. (Here is a very naive but working version.)

brianrourkeboll commented 3 months ago

(Here is a very naive but working version.)

Interesting. Some comments about that specific approach:

Compiler-generated comparison is currently purely structural. That particular approach would add an invisible nominal component.
Changing any part of a type's fully-qualified name does not currently affect the outcome of a comparison at runtime. With that approach, changing any part of a type's fully-qualified name could affect its runtime behavior, whether renaming the type or cutting some code out of one module and pasting it into another. Imagine someone trying to debug why their code changed behavior when all they did was move it from one spot to another!
It would be especially tricky when comparing one type defined in a regular module and namespace against another defined in a file without a top-level module or namespace declaration, where the implicit module name is the filename.^1 In that case, renaming the file could affect the type's runtime behavior!

I appreciate the outside-the-box thinking, but I think that the current behavior is the only correct behavior, in spite of the unfortunate unsafety it enables. Once the comparison is no longer based purely on the type's structural components, it is no longer really structural comparison.

compare ``🍎`` ``🍊``

Ultimately, I believe that comparisons like this must fail, because no meaningful comparison exists, unless the developer implements IComparable explicitly and either:

Specifies a comparison that is meaningful to their use-case.
Uses an arbitrary disambiguator like a GUID or fully-qualified name, which basically implies that the required semantics aren't really comparison but rather equality, as noted in https://github.com/dotnet/fsharp/issues/5380#issuecomment-2258320020.

To put it in more proverbial terms — is there really anything meaningful that the compiler could do to make the following code not throw?[^2]

open System

type IFruit = inherit IComparable
type Apple = Apple with interface IFruit
type Orange = Orange with interface IFruit

compare<IFruit> Apple Orange

> If a code file does not begin with a top-level module declaration or a namespace declaration, the whole contents of the file, including any local modules, becomes part of an implicitly created top-level module that has the same name as the file, without the extension, with the first letter converted to uppercase.

[^2]: Contrast that with the following, which will run and produce false. Just because we can say that two objects are not equal does not mean we can give them a meaningful relative ordering.

    box Apple = box Orange

Martin521 commented 3 months ago

Ok, I think I have to give in regarding c). Thanks for your persistance and all the details.

Still, the current behavior (a runtime exception when putting apples and oranges into a Set) feels wrong. If we cannot compare apples and oranges, then a Set\<IFruit> should not be allowed at compile time. Potential solutions

Make the comparison constraint stronger (requiring an implemented IComparable at the level of the type argument). (I think this would have been a good solution in F# 1.0)
Require, at compile time, for Set/Map an implemented IComparable of the type argument. Emit an error, or at least a warning, if the requirement is not met.

brianrourkeboll commented 3 months ago

@Martin521 Hmm, it's probably theoretically possible to write an analyzer to emit such a warning in at least some cases by inspecting the typed tree.

It may not be easy, though, especially if you want to avoid false positives.

For example, you'd probably want to allow this, since it's perfectly valid:

let xs : Set<IComparable> = set [1; 2; 3]

Whereas you'd want to warn on this:

let xs : Set<IComparable> = set [1; 2; "3"]

But what about these?

let f (xs : Set<IComparable>) = xs |> Set.add "3"
let xs : Set<IComparable> = set [1; 2]
f xs // This throws.
f (Set<IComparable> [1; 2]) // This throws.
f (Set<IComparable> ["1"; "2"]) // This is fine.

I guess it would probably be easier to just write an analyzer that always warns whenever a type parameter with the comparison constraint is instantiated to an interface type, and people could choose to enable it if they wanted the warning.

Martin521 commented 3 months ago

I guess it would probably be easier to just write an analyzer that always warns whenever a type parameter with the comparison constraint is instantiated to an interface type, and people could choose to enable it if they wanted the warning.

Yes, that's what I had in mind. Question is, of course, if the effort is well spent on this rare case. Anyway, I might tackle it at some point just to learn about analyzers.

NOTE TO READERS: all of the above discussion is about this comment. The original issue is still open.

reinux commented 3 months ago

I might be missing something, but I'm not sure why something like c) wouldn't be the best approach.

I think apples and oranges can be meaningfully compared -- just that all apples are infinitely distant from all oranges in any given dimension, which is a notion that's well-captured by comparing any comparable, unique identifier of the types themselves prior to comparing the actual values.

For my use case, a list was adequate. For others, ImmutableDictionary may work just as well. But for crank-the-world type applications, e.g. Elmish models, you really want to be able to do incremental updates, which means trees, which means comparisons.

With that approach, changing any part of a type's fully-qualified name could affect its runtime behavior, whether renaming the type or cutting some code out of one module and pasting it into another.

This is why I was considering System.Type.Guid, actually. Although, to be fair, I'm not sure what the lifetime of the GUID is either.

At the same time, I can't think of a practical consequence if it does turn out to be ephemeral. If it's semantically nonsensical to compare Dogs to Cats, then there almost certainly isn't any code that would depend on any persistence either.

Not that it's directly pertinent, but record comparisons in F#, if I understand correctly, are already nominal before they're structural. You can't compare, say, a value of type A = { x: int } to another of type B = { x: int }. So I'm not sure there's any damage done by deviating from theory somewhat. Compile-time warnings/errors would be one thing, but there isn't really anything theoretically sound about throwing an exception anyway.