dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License
3.86k stars 779 forks source link

Nested reference type record values and nullable constraints #13498

Open NinoFloris opened 2 years ago

NinoFloris commented 2 years ago

Given the following examples I can't quite figure out what the compiler is doing here.

type InnerRecord = { Id: int }

[<Struct>]
type StructRecord = { Value: obj }
// Ok
let nullable = Nullable({Value = 1 })

[<Struct>]
type StructRecord2 = { Value: InnerRecord }
// A generic construct requires that the type 'StructRecord2' have a public default constructor
let nullable = Nullable({Value = { Id = 1 }}) 

[<Struct>]
type StructType(value: obj) =
    member _.Value = value
// Ok
let nullable = Nullable(StructType(1))

[<Struct>]
type StructType2(value: InnerRecord) =
    member _.Value = value
// A generic construct requires that the type 'StructType2' have a public default constructor
let nullable = Nullable(StructType2({ Id = 1 })) 

[<Struct>]
type StructType3(value: StructType) =
    member _.Value = value
// Ok
let nullable = Nullable(StructType3(StructType()))

[<Struct>]
type InnerStructRecord = { Id: int }

[<Struct>]
type StructType4(value: InnerStructRecord) =
    member _.Value = value
// Ok
let nullable = Nullable(StructType4({ Id = 1 }))

Why specifically is it not allowed to store a struct that has a reference type record as one of its fields? (And then secondly, if it is expected behavior the error is supremely unhelpful in figuring this out)

This issue seems related to a previous issue https://github.com/dotnet/fsharp/issues/7946#issuecomment-687266672 but the linked explanation reads like it should only apply to generic structs.

NinoFloris commented 2 years ago

Workaround for now shows how busted the compiler behavior is here.

[<Struct>]
type Holder<'T> = Value of 'T
and 't holder = Holder<'t>

type InnerRecord = { Id: int }

[<Struct>]
type StructRecord2 = { Value: InnerRecord holder }
// Ok
let nullable = Nullable({Value = Holder.Value { Id = 1 }}) 
dsyme commented 2 years ago

Discussing this with @NinoFloris - the intention of the overall design is to rule out the use of the union case holder

The logic here should be recursively checking the fields of all the cases of a struct union for whether they support default values

NinoFloris commented 2 years ago

One of the places where this shows up is in asp.net core minimal apis

There is a pattern for binding request details into an object like so


type MyCommand =
    { 
        offset: int // etc
    }

[<Struct>]
type RequestInfo = 
    {
        Value: MyCommand
    }
    static member BindAsync (ctx: HttpContext): ValueTask<Nullable<RequestInfo>> = 
        if ctx.Request.Query.ContainsKey("offset") then 
            let offset = int ctx.Request.Query["offset"]
            ValueTask<_>(Nullable({ Value = { ))
        else 
            ValueTask<_>(Nullable())

BindAsync is a recognized pattern that has to return an instance of the type it is defined on. The return type has to be ValueTask and null is failed to bind for reference types, while for struct types Nullable is used. https://docs.microsoft.com/en-us/aspnet/core/fundamentals/minimal-apis?view=aspnetcore-6.0

Today this fails to work because MyCommand is a record that has no 'default value' (and for sake of argument this is a record type we don't control, or can't change to a POCO attributed with AllowNullLiteral)

Example by David Fowler that this is expected use https://twitter.com/davidfowl/status/1540874260612358144

NinoFloris commented 2 years ago

Alright so after taking some time to think about this, here are my thoughts.

Starting with the heart of the actual issue, specifying the 'struct' constraint in C# results in its compiler emitting 3 IL constraints, valuetype, T: ValueType, and .ctor. On the F# side of things its compiler has far stricter checks for types to match the .ctor (default constructor) constraint. The interaction of these two decisions is where things become problematic.

Before we dive deeper into it all there are some things to be stated explicitly:

Looking around today, VB.Net rests in peace (though its huge volume of existing code lives on) while C# has bettered it's ways and attempts to actually execute the default constructor for the following code (C# 10, see https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-10.0/parameterless-struct-constructors#type-parameter-constraints-new-and-struct):

T Do<T>() where T: struct 
    => new T();
    .method assembly hidebysig static 
        !!T '<<Main>$>g__Struct2|0_1'<valuetype .ctor ([System.Runtime]System.ValueType) T> () cil managed 
    {
        // Method begins at RVA 0x206e
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: call !!0 [System.Runtime]System.Activator::CreateInstance<!!T>()
        IL_0005: ret
    } // end of method '<Program>$'::'<<Main>$>g__Struct2|0_1'

Notably, replacing new T() with default(T) would produce the expected initobj !!T, just as writing new T() did before C# 10. The realization being that default(T) is fully analogous to Unchecked.defaultof<'T>, that it's permitted in that method is not so much a consequence of the default ctor constraint but an escape hatch in the type system present in both languages, valid in any value position. This is important to understand as it points to a problem with this implied new() constraint C# had to deal with since v10. Now that they emit code to call these constructors, and given users could always write new T() in struct constrained methods the compiler actually can't check whether a valid constructor exists, doing so would break existing code!

Degrading the compile time safety just to respect default struct constructors almost certainly implies that C# language designers are equally unhappy with the slew of implied constraints but have little choice than to defer obvious checks to runtime to do so, without making a breaking change to the language. Unfortunately this also means it's unlikely they're going to change anything more here.

And this is where we are today, C# pushes a bunch of generally unnecessary constraints that F# has stricter requirements for, and the developer is stuck in the middle, sometimes unable to pass perfectly valid types to these over-constrained type parameters (and to pick on 'valid' here, F# can’t stop uninitialized values for its own structs coming out of IL types just as much as it can’t stop null values from appearing for records in similar ways, so that should never be a goal).

A good example where all of this really hurts is System.Nullable, since it's so pervasive. The main api of the type consists of a case discriminator HasValue and a way to retrieve that value through Value. When a Nullable is produced with the right case constructor Value is available, while it throws an exception otherwise; Effectively a C# version of a struct DU. At no point in the actual implementation (or any other conceivable implementation for this type) does it need to call the default ctor. Consequently F# will enforce this constraint far beyond the intent of its authors.

Knowing all this, my conclusion would be that F# should not be validating ‘defaultvalueness’ for these 'struct' constrained type parameters on value types just as it doesn’t validate nullness in generic contexts outside of F# code having the null constraint. The difficulty of course lies in achieving this (which is much harder than with the F# only null constraint). Constraints are viral and flow across abstraction; We can hardly flow whether some constraint came from IL through F# typars. Doing so would create all sorts of fun interactions around unification, constraints that look the same but don't act the same and other things we don't want to entertain.

One thing that I've considered is whether these 'defaultvalueness' checks should be under a new F# only constraint, just like null, the issue being that it would break so much F# code:

Around this point I stopped thinking about it.

One other (unsatisfying) approach would be to change the compiler to import just valuetype (potentially T : ValueType as well) for IL references when it detects all 3 constraints C#'s ‘struct’ emits for a type parameter. There are some decent reasons why I think this has merit:

(F# should probably begin syntactically allowing, IL emitting, and calling the default constructor for struct types in the same situations as C# started doing as well, this would need to be another RFC.)

Doing this would effectively remove the checks for all IL type parameters following that specific pattern and limit the 'defaultvalueness' checks to F# and C# code where the default constructor constraint is specified explicitly, which is much much more reasonable.

Understandably taking that approach might be too risky. Another option would be to introduce an attribute like AllowNullLiteral for dismissing these checks on struct types (such that disabling at the typar head type would be enough). Though the limitation would remain that generic struct types not defined in F# (more specifically, those not having this F# attribute) would not be able to be used for default ctor constrained typars in F# code when their generic instantiations bring in e.g. records, while they can be instantiated in other languages. Mixed-language codebases like the one I own would hit that limitation without question.

@dsyme also mentioned a similar approach using non-nullable reference type annotations in F#, that would provide a way to mark an offending field (though it may live arbitrarily deep inside a graph) of a type as allowing null values.

Both of these alternative proposals lack a certain 'elegance' of tackling the issue at the source, which is the generic/constraint side, not the individual types. That is most clear in generic composition where it would not always be so obvious or desired why you would need to mark a field on some distant type referenced 3 nested fields inwards from the source as nullable to use it, or even to give the top level type an attribute.

Ideally C# just stops emitting .ctor if implied by struct. Doing so would even enable them to allow struct and new() to be used together to activate new compile time constructor checks. Even if the C# team would want to do anything here it would probably be similarly difficult for them to do something that would introduce no or minimal breakage. Doing so would be of no real gain to C#, as it's just work to become a better IL neighbor (and even that is arguable from reading the spec for .ctor). I don't think this has much of a chance but it might be worth a shot before potentially going through a one-way door solution?

As I hope is clear by now: there is no fantastic solution but I am convinced we should do something here.

Finally, and most immediately relevant to this issue, two things:

T-Gro commented 1 year ago

Please convert to suggestion if not covered already.

NinoFloris commented 1 year ago

@T-Gro there is still a bug here, not sure why this was closed. To solve it properly though we should zoom out a bit and consider what impact it has on certain code patterns and enable workarounds before we fix the bug.

@dsyme do you have some time to think about this with me? I don't mind implementing something here in the 8.0 timeframe.

dsyme commented 1 year ago

@NinoFloris Thanks, I'll take a look