dotnet / csharplang

The official repo for the design of the C# programming language
11.1k stars 1.01k forks source link

Proposal: Fast, Efficient Unions #7016

Open agocke opened 1 year ago

agocke commented 1 year ago

Fast, Efficient Unions

Many types of unions, including discriminated unions, have been proposed for C#. This proposal, unlike others that came before, is focused on building a new, efficient primitive for C# and .NET. As of now, C# and .NET have lacked a mechanism for expressing any kind of type union, meaning an "or" relationship between types.

While developers might reasonably prefer a wide variety of designs for union types depending on their domain, as a language and type system it should be a goal to build broad, useful primitives on which many separate complex structures could be built.

To that end, any union primitive we construct should flexible and as efficient in both space and computation.

Given this constraint, we know that we want a struct-based solution. Introducing allocations into a previously allocation-free code path is rarely acceptable, while introducing structs into reference-type code paths is generally fine. There are also two union designs which stand out as good candidates.

The first is what we might call a "pure" union -- a combination of multiple types in an "or" relationship. For this we could introduce a pure "or" type-level combinator, which for this proposal we will call | and use in combination with type variables in the form (A|B|C).

The second design is what is usually described as a "discriminated" or "tagged" union. In this design we would define a new type declaration form to introduce the union, and a series of cases which delimit precisely the set of types that comprise the state of the union when in that case. For instance,

enum struct E
{
    Case1(A),
    Case2(B),
    Case3(C),
}

It should be noted that, although only the second form is described as a "tagged" union, in fact the "pure" union is tagged as well, since all values carry type tags in the .NET runtime and can be queried through type tests. The usefulness and usability of each tagging system is covered in more detail later.

Semantics

To discuss the different designs usefully, we must assign semantics to the different designs. These semantics could be debated, but I will do my best to define what I see as the most "generous" capabilities in each design, without regard for implementation. When implementation is covered later, we will note problems and limitations.

To start we may note the similarities in both union designs. Both provide the ability to represent multiple values of different types in a single value and single type specification.

Pure unions

In the case of pure unions, the inline | combinator provides the ability to represent anonymous unions directly in type syntax. This is paired with a trivial introduction form -- a value of pure union type may be created simply by assigning a value of one of its cases to it. This makes pure unions particularly good at ad hoc modeling. In any type location a pure union could be provided, and in general, no additional code is required to create values of that type, either in the method return position, method argument position, or local variable assignment position.

On the other hand, while pure unions are simple for ad hoc modeling, they may not be as effective for larger scale programs. While it is useful that the | combinator does not require a type declaration, it can be problematic that it also doesn't allow for one. Consider that C# variable names are rarely pithy single characters like A or B and in fact are usually quite long. A|B looks good in examples but,

string|List<string>|Dictionary<string, string> M(string|List<string>|Dictionary<string, string> u)
{ ... }

isn't quite as attractive or readable. The inability to declare unions using the pure union syntax is made more complicated by the fact that C# doesn't have a particularly good way of declaring new type aliases. The best answer is the using directive, which cannot be public, cannot represent open generics, can only appear at the top of a file, and often requires specifying the fully qualified type name of any alias targets. While these limitations could be changed or lessened, these changes are not currently proposed -- partially because C# has historically not preferred this style of programming over direct declarations.

Aside from declaration simplicity, the union | combinator is also conceptually simple and familiar. Many C# programmers are already familiar with the | value combinator and its extension to the type level is mostly natural. The type-level combinator is commutative and associative, just like its value form. Subtyping also follows the same simple set theory rules that hold true of the set "union" combinator, which many developers will be familiar with, e.g. A|B is a subtype of A|B|C, as is C|A.

Lastly, the elimination form is also simple and familiar -- dynamic type checking through pattern matching. To deconstruct a union A|B we can use simple pattern matching to check for each of the cases, i.e.

void M<A, B>(A|B union)
{
    if (union is A a)
    {
        ...
    }
    if (union is B b)
    {
        ...
    }
}

or with a switch expression

void M<A, B>(A|B union)
{
    var x = union switch {
        A a => ...,
        B b => ...
    };
}

In the second case, we are even automatically provided exhaustiveness checking, since all cases of the union have been handled in the switch.

Unfortunately, the simplicity defined above does come with some significant downsides. In particular, the flexibility of |, including allowing generic type parameters as cases, prevents unions from guaranteeing disjointedness. That is, for any union of the form A|B, there is no guarantee that A and B do not have a subtype or identity relationship. For instance, if the method M in the previous example were instantiated as M<object, string>, the union of A|B would be substituted as object|string. This creates potential problems for uses of the elimination forms, as case B would never be executed in this situation. While the switch expression in the second example would continue to check for exhaustiveness in this case, it would not check for subsumption.

Tagged unions

In contrast with pure unions, tagged unions provide only a declaration form, not an inline form. The benefits and drawbacks are almost the mirror image of the pure union cases described above. A tagged union is declared via the syntax enum struct, deliberately reminiscent of simple C# enums. Inside the body of the enum struct is a series of cases. Each case begins with a case name and is followed by a comma-separated list of types, similar to a tuple type:

enum struct S
{
    Case1(A),
    Case2(A, B)
}

Unlike C# enums, the case names don't directly represent values, instead they are simply names used in the introduction and elimination forms. For the introduction form, each case name corresponds to a public static method on the enum struct type. This method is defined to have a parameter list with parameter types and arities that match the list following the case name in the enum struct declaration.

For instance, in the above example a valid expression would be S.Case1(a), where a is a variable of type A. The result of this expression would be a value of type S.

To use the values passed into case methods on S, there is a corresponding elimination form in pattern matching to retrieve the values, similar to deconstruction. The case name once again appears as a new language construct, also nested under the enum struct name scope, which we might simply call a "variant." Variants may appear only as patterns, in the same position as a type pattern. Following the variant, there may be a deconstruction pattern which matches the arity and types of the original parameter list to the enum struct case. For instance,

if (x is S.Case1(A a))
{

}
if (x is S.Case2(A a, B b))
{

}

or

var y = x switch {
    S.Case1(A a) => ...
    S.Case2(A a, B b) => ...
}

Like pure unions, the above elimination forms guarantee exhaustiveness. Unlike pure unions, however, tagged unions are guaranteed to be disjoint and therefore they also guarantee subsumption. Also unlike pure unions, they do not have a "transparent" form, which means that they cannot be eliminated without using the new, variant-case elimination form.

What may be most notable about the above formulation is that, while pure unions may be conceptually familiar, tagged unions should be particularly familiar to C# programmers, as they behave quite similarly to traditional C# enums (thus the choice of similar syntax). Like enums, tagged unions have only a declaration form. Like enums, they have a series of named cases. Like enums these cases can be switched on. The primary difference is that each enum struct case carries values of arbitrary different types, while all traditional enums are guaranteed to have a single value of one shared underlying primitive type.

Implementations

Pure unions

For pure unions, there is one obvious encoding to the current CLR type system. For any union type (T1|T2|T3|...Tn), this could be translated to an instantiation of a predefined struct type OneOf<T1, T2, T3, ... Tn>. This fits well because:

That said, our optimizations around conversions are limited. In particular, because CLR structs lack variance capabilities, we cannot cheaply implement the subtyping relationship that we described in the semantics above. Instead we would be forced to emit a series of type checks for each case, converting each one individually to the new type. This is a small, but non-zero cost. We would have to provide the same trick for case reordering, as the CLR would see OneOf<A, B> and OneOf<B, A> as different types, despite our desire that A|B and B|A represent the same type.

This is particularly unfortunate in places where the CLR forces nested type equivalence, like in assigning List<OneOf<A, B>> to List<OneOf<B, A>>. Here such a conversion would be particularly expensive, likely forcing the allocation of new backing memory.

Tagged unions

Tagged unions would preferably be implemented using standard struct types, with some runtime optimizations. Every enum struct would need to contain a tag, to enforce our guarantee of disjointedness. It would also need to contain fields for each type in each type list in each case. Lastly it would also need to contain methods for creating instances of the enum struct. A declaration like

enum struct S
{
    Case1(A),
    Case2(A, B)
}

may look like

readonly struct S
{
    public readonly byte Tag;
    public readonly A <>Case1_1;
    public readonly A <>Case2_1;
    public readonly B <>Case2_2;

    public static S Case1(A a) => new S(1, a, default, default);
    public static S Case2(A a, B b) => (2, default, a, b);

    private S(byte tag, A case1_1, A case2_1, B case2_2)
    {
        Tag = tag;
        <>Case1_1 = case1_1;
        <>Case2_1 = case2_1;
        <>Case2_2 = case2_2;
    }
}

and the elimination form x is Case2(A a, B b) would look like

x.Tag == 2 && (x.<>Case2_1, x.<>Case2_2 is (A a, B b))

In general, most operations should be simple operations on the larger struct, since cases cannot be represented directly, only held inside a value of the parent type, or deconstructed to constituents. Improvements on basic structs would likely be left to the runtime.

Space efficiency

First, let's consider the space efficiency of possible implementations of the options. For all unions, space efficiency is defined relative to a tuple type with each of the cases, since tuples can represent all possible values of all possible union cases simultaneously.

For pure unions, any space optimization comes from runtime optimization. Since each type parameter in OneOf may be an incompatible overlapping type, the runtime will have to be responsible for special-casing the OneOf type to provide better space utilization.

For tagged unions, this same optimization strategy could be employed, or the C# compiler could participate. Since certain types and members would be known to be overlap-compatible, the compiler could emit FieldOffset as appropriate to optimize space. Space usage of tagged unions may be uniformly higher since they may be forced to always include an extra tag field.

Computation efficiency

As mentioned previously, pure unions may be forced to do more individual type checks and assignment to provide conversion semantics.

Tagged unions likely will not need any such overhead. Most overhead will likely come from checking the tag. If tag checking is cheaper than type checking, tagged unions may be slightly more efficient, or vice versa.

Conclusion

Overall, it seems that both proposals have significant advantages and disadvantages.

However, tagged unions appear to initially fit better with C#. The syntax and semantic forms align more closely with the way C# has been designed and how it is widely used. As an analogy, we might note that tuples are to pure unions as structs/classes are to tagged unions. While tuples are a valuable language feature with unique advantages, their usage pales in comparison to structs and classes. C# is simply optimized for the kind of programming that favors structs and classes.

Indeed, one of the biggest weakness of pure unions appears to be that many of the "elegant" parts of their semantics would be difficult to efficiently implement in C#. With these disadvantages, the field is tilted even more towards tagged unions.

However, in the same way that tuples and structs/classes are both part of C# and provide unique advantages, it might be worthwhile to consider adding both designs.

If we must add only one, or prioritize one, then tagged unions should be preferred.

Edit:

I talked with the runtime team on potential optimizations for any union implementation we came up with. Unfortunately, I missed that there's a fundamental tearing problem with object and value overlap. Consider,

struct BuiltInUnion
{
   byte Tag;
   int Value1;
   string Value2;
}

This type represents a union of int and string. Ideally we would like to have the minimal possible space by overlapping Value1 and Value2, since BuiltInUnion will only use one of them at a time. But note that this is a multi-word-sized type. Even if Value1 and Value2 overlap, the tag exceeds word size, which means that it is not possible to atomically update this type during a copy. So if BuiltInUnion appears in a location shared by two threads, copying could produce tearing such that the tag is updated and the value isn't, or vice versa, which would produce memory unsafety and potentially crash the runtime.

There isn't a simple fix here, just a set of tradeoffs. Those tradeoffs are:

  1. Size
  2. RW time
  3. Allocation
  4. Ref struct
  5. Safety

(4) would mean making enums ref structs, which would remove the data race, but would be incredibly restrictive. Probably not a general solution. (5) would compromise memory safety, which is unacceptable in my opinion.

That leaves size, allocation, or RW time. Fixing the race through synchronization would mean including what is effectively a spin lock, which increases copy time by ~150x. Allocation has the standard tradeoffs. Size might be the best thing to compromise on.

HaloFour commented 1 year ago

I would agree that tagged unions, as implemented as structs with named cases, seems like the "easier" approach and I think would be a low cost abstraction, which I think is critical for common DUs like Option<T>/Result<T>.

I think there's also room for reference-based DUs. If they had a similar enum class syntax they could be implemented like DUs in F#, as an abstract class with nested case classes. Or C# could go the Java route and offer a way to describe what types are permitted to implement or extend from an interface or abstract class:

public sealed interface Foo permits Bar, Baz { ... }

You get no syntax shorthand but the compiler could make assumptions as to whether or not your matches are exhaustive.

FaustVX commented 1 year ago

Seeing enum struct means enum class is also valid (at least to me) And like @HaloFour said, enum class could mean

enum class Foo
{
  Bar,
  Baz(string a),
}

can be translated to

abstract sealed class Foo
{
  public sealed class Bar : Foo { }
  public sealed class Baz(string a) : Foo { public string A { get; } = a; }
}
FaustVX commented 1 year ago

@HaloFour

Or C# could go the Java route and offer a way to describe what types are permitted to implement or extend from an interface or abstract class:

public sealed interface Foo permits Bar, Baz { ... }

You get no syntax shorthand but the compiler could make assumptions as to whether or not your matches are exhaustive.

I don't really like that syntax, on Foo you specify Bar and Baz. And on Bar and Baz you specify they implements Foo It seems to be redundant.

HaloFour commented 1 year ago

@FaustVX

I don't really like that syntax, on Foo you specify Bar and Baz. And on Bar and Baz you specify they implements Foo It seems to be redundant.

Yes, it is redundant, but it's also explicit. I'm not suggesting that be the syntax that C# adopts, but it does offer a way to describe a closed hierarchy, which I think is suitable for a discriminated union of related types.

RikkiGibson commented 1 year ago

Sorry if I missed it but I wondered what the semantics of 'default' are for enum structs. If such a value makes its way into my system, what constructs will match it? Will it just be the first case of my enumeration with default values for all the case fields, for example?

HaloFour commented 1 year ago

@RikkiGibson

Sorry if I missed it but I wondered what the semantics of 'default' are for enum structs. If such a value makes its way into my system, what constructs will match it? Will it just be the first case of my enumeration with default values for all the case fields, for example?

I think there's a whole conversation that would have to happen there. In F# I think it's the first case. That may be safe in some contexts, e.g. Option<T>.None, but probably less so in others. But it's probably difficult (if not impossible) to avoid default so consideration should be made as to how it is handled.

FaustVX commented 1 year ago

@RikkiGibson

Sorry if I missed it but I wondered what the semantics of 'default' are for enum structs. If such a value makes its way into my system, what constructs will match it? Will it just be the first case of my enumeration with default values for all the case fields, for example?

Maybe put the Tag value to 0 which signify an invalid state and any other defined values will be 1 or greater. And maybe allow for

enum struct Option<T>
{
    None = 0,
    Some(T value) = 1,
}

and in this case default(Option<int>) == None

orthoxerox commented 1 year ago

I think there's also room for reference-based DUs. If they had a similar enum class syntax they could be implemented like DUs in F#, as an abstract class with nested case classes.

Or existing patterns around private constructors and nested classes can be recognized by the compiler.

enum class Foo
{
  Bar,
  Baz(string a),
}

isn't much shorter than

abstract record Foo
{
  private Foo() {}

  public sealed record Bar : Foo;
  public sealed record Baz(string a) : Foo;
}
RikkiGibson commented 1 year ago

Declaring a private constructor on abstract type seems like a pretty decent gesture for I intend this to be a closed set.

One place where this could fall over is if the type is from another assembly. I could update versions and now there's a new case at runtime which I didn't account for.

Of course this seems like a risk with all possible representations. What I think is missing is a gesture to say how that flow is supposed to work when new cases are added over time. When should the exhaustiveness warning be issued or not.

HaloFour commented 1 year ago

@RikkiGibson

One place where this could fall over is if the type is from another assembly. I could update versions and now there's a new case at runtime which I didn't account for.

That's true for DUs in general. I believe adding new cases is considered a breaking change. I think the only way to avoid that would be to indicate that the DU is non-exhaustive and that consumers should be forced to consider the unknown case. Otherwise, the compiler will end up throwing SwitchExpressionException exceptions.

RikkiGibson commented 1 year ago

We don't ask the user to handle null when the switch input is a non-nullable reference type.

#nullable enable
M(null!); // will throw at runtime

void M(object obj)
{
    Console.Write(obj switch { object obj1 => obj1.ToString() });
}

I think that kind of behavior is what we should have as a default for DUs, even from other assemblies.

// OtherAssembly
enum struct MyUnion
{
    Case1, Case2(int item)
}

// My code
void M(MyUnion u)
{
    Console.Write(u switch { Case1 => "a", Case2(_) => "b" }); // no warning.
}

From the semantic/diagnostics point of view, act like it is exhaustive, but still insert that throw branch if they don't use a pattern that matches "anything else" such as _.

Then, if we judge there is sufficient need for DUs which are not treated as exhaustive, introduce a mechanism for marking them as such, and give warnings for those if "anything else" is not handled.

agocke commented 1 year ago

Sorry if I missed it but I wondered what the semantics of 'default' are for enum structs. If such a value makes its way into my system, what constructs will match it? Will it just be the first case of my enumeration with default values for all the case fields, for example?

This is a great question which I don't really have a satisfying answer for. I suspect the answer will be "the first case with default values for each of the types" but there's no good reason why that's the case, it just is the case.

An answer which I would prefer is, "you can't take the default value of enum structs", but obviously that ship sailed long ago.

HaloFour commented 1 year ago

@agocke

I suspect the answer will be "the first case with default values for each of the types" but there's no good reason why that's the case, it just is the case.

I suspect that will be less error-prone than having a tag of 0/default indicate an invalid state which will likely result in exceptions thrown at runtime. I think it will be suggested to designers of DUs that they should order their tags so that it's "safer" for the first tag to be the default state. But maybe if the DU can explicitly specify the tag for each case, as suggested by @FaustVX , then that would enable DU authors to opt-in to having default be an invalid state:

public enum Option<T> {
    None = 0, // safe default
    Some(T value) // implied 1
}

public enum Either<TL, TR> {
    Left(TL value) = 1,
    Right(TR value) // implied 2
}
yaakov-h commented 1 year ago

I suspect the answer will be "the first case with default values for each of the types" but there's no good reason why that's the case, it just is the case.

I would have expected default(SomeUnionOfSeveralThings) to be none of the things in the union, but I can also see immediately that this won't appeal to Option<T> or similar types.

agocke commented 1 year ago

I suspect the answer will be "the first case with default values for each of the types" but there's no good reason why that's the case, it just is the case.

I would have expected default(SomeUnionOfSeveralThings) to be none of the things in the union, but I can also see immediately that this won't appeal to Option<T> or similar types.

I actually like this. Maybe all tag values start at 1, and if you get 0 then your program blows up at the first switch expression.

HaloFour commented 1 year ago

@agocke

I actually like this. Maybe all tag values start at 1, and if you get 0 then your program blows up at the first switch expression.

Maybe by default? I think there are cases (Option<T> specifically) where assuming the first case is safe.

Would be nice to get the perspective from the F# folks here.

agocke commented 1 year ago

default is always unsafe in F#. You should never use it. If you know you're returning an Option, just use None.

HaloFour commented 1 year ago

@agocke

default is always unsafe in F#. You should never use it. If you know you're returning an Option, just use None.

Agreed, and I think the C# compiler could warn here as well. I'd be more interested in cases where you end up with default from the runtime where neither language can prevent it from happening.

agocke commented 1 year ago

You're right, if you do default(Option<int>) in F#, you get None. https://sharplab.io/#v2:EYLgtghglgdgNAGxAMwM5wC4gE4FcYA+AsAFAICmGABAB5UC8VAqjAMYAW5rA1uQCYA6PuWQRcCDAHtkAHgDyABwxRJMGbAwA+TaUgYOtKgHcoGdqQJUAcqvJUAtJqoLsG5FQBEMWx4tUAypJgdlAOTi5ungCkfB5UUEA===

HaloFour commented 1 year ago

@agocke

Yep, same is true with custom struct DUs, the first case has a tag of 0:

open System

[<Struct>]
type Multicase =
    | Case1 of Case1 : int
    | Case2

let x = Unchecked.defaultof<Multicase>
match x with
| Case1 i -> printf "%d" i
| Case2 -> printf "none"

This prints 0

RikkiGibson commented 1 year ago

I would be disappointed if a mechanism were introduced for preventing "unsafe" default enum struct values, where regular old struct wouldn't be able to participate.

agocke commented 1 year ago

Yeah, that's fair. Tbh, I think the basic approach of, "it's the first one in the case of default" is both precise enough that things will work out, but also esoteric enough that people will generally avoid using default for enum structs. That's a good place to be.

orthoxerox commented 1 year ago

Non-defaultable value types would help a lot with designing proper struct tagged unions (Either<L, R> in language-ext has three tags to avoid mistakenly getting a default L or R from a default value of Either). However, I wouldn't want to see the DUs delayed by the massive amount of work needed to retrofit non-defaultable value types into C#. I'd rather get Option<T> and Result<T> sooner and deal with Result.Failure.Message being null. Or even better, it being a non-nullable property over the real nullable error message field that ??s the value with "someone tried to pass you a default value". Will the tags on DUs support properties or custom deconstructors?

YairHalberstadt commented 1 year ago

In #4628 I suggested using the syntax A or B instead of A|B for union types.

This is because you can already write if (x is A or B) today, and it would be natural to assume that once union types exist you should be able to write if (x is (A or B) aOrB). This works well if the union type is indeed named A or B.

RikkiGibson commented 1 year ago

I also am wondering if 'or' gives us an opportunity to unify these language concepts.

What concerned me was: doesn't that mean if I say if (x is (A or B) y), the type is different before and after this feature. I thought that pattern was already allowed--not at my PC to check.

YairHalberstadt commented 1 year ago

Unfortunately it does have a meaning, but not what you would expect. The following compiles:

public class C {
    public void M(X x) {
        if (x is (int or string) y) {}
    }
}

public class X {
    public void Deconstruct(out object z) { z = null; }
}

It deconstructs X, and then matches it against the pattern int or string.

It's exceedingly rare to have a type which deconstructs to a single variable, so I think it would be reasonable to either make a breaking change here, or check for the deconstruct, and otherwise have the intuitive meaning.

RikkiGibson commented 1 year ago

What I meant was for y in the example if (x is (A or B) y) changes from being whatever it was before (maybe the type of x or a common type of A and B?), to actually being a value of a union type.

333fred commented 1 year ago

@RikkiGibson y is X, because the input type is X, which has a Deconstruct method. It was deconstructed, the result was checked for int or string, then y was created with the type being deconstructed, or X.

agocke commented 1 year ago

Unfortunately it does have a meaning, but not what you would expect.

It deconstructs X, and then matches it against the pattern int or string.

Chalk this up as another source of friction with pure unions.

RikkiGibson commented 1 year ago

My feeling is that having to explicitly declare all the unions you use can be annoying if you just want to return this, that or the other thing, and the user checks which one it was. Yet the amount of orthogonality problems we sidestep by not blessing a particular OneOf type in the runtime seems very significant. (Maybe there's a term more fitting than orthogonality here. Don't know.)

I'm pretty convinced that even if you implement special boxing behavior for a OneOf type, you will still get into trouble when it comes to checking conversions involving 'object' or type parameters, expectations around subtyping relationships when one pure-union type is convertible to another, etc... I don't recall if Andy discussed these problems here or just in meeting.

Basically, it's a case where making it so the cliff is easy to spot may be more valuable than pushing the cliff out as far as possible. If you have to get into some total corner case before the seams of pure-unions start to show, you might be worse off than saying: hey, these are just structs, they box and compare like structs, operate accordingly.

agocke commented 1 year ago

Basically, it's a case where making it so the cliff is easy to spot may be more valuable than pushing the cliff out as far as possible. If you have to get into some total corner case before the seams of pure-unions start to show, you might be worse off than saying: hey, these are just structs, they box and compare like structs, operate accordingly.

I think the way I would put this is: it's really hard to build complex systems on top of other complex systems. Tagged unions may have some downsides in one-liners, but they're also really simple conceptually and implementation-wise. Since the goal of this proposal was to build a solid foundation for unions in dotnet, tagging feels like the right base to start from.

That's not to say that we shouldn't build pure unions. Just that the tag representation feels like a more predictable foundation to build on.

orthoxerox commented 1 year ago

From the list @Richiban posted in one of the older discussions, a couple of features are still missing before we can get that sweet tag union Option<T>:

  1. Covariance in classes. Necessary for an Option<Banana> to be assignable to Option<Fruit> etc.
  2. Free objects. See: None. An object declaration is both a type and a value, i.e. a singleton. Can participate in inheritance hierarchies, as in this case. if it's a tag and not an object, this doesn't matter.
  3. Sealed classes. Necessary for the compiler to be able to prove completeness. again, if the tags are not types, this doesn't matter
  4. Bottom type (Nothing). Allows the singleton None, in conjunction with (1), to be assignable to Option<T> for any T.

I wonder if the bottom type requirement can be worked around. We don't really have to reuse the same value of None for all unions, do we? And type inference plus #2926 should let us write None and get the compiler to infer Option<Foo>.None.

HaloFour commented 1 year ago

@orthoxerox

I don't think any of those are necessary for the implementation of tagged unions. They would have been more necessary for type unions, especially to avoid allocations and to allow singleton identities, e.g. None and avoid tons of allocations that all mean the same thing. Covariance would be nice in general but I think can be considered orthogonal.

Richiban commented 1 year ago

I agree that if DUs are structs (for now?) Then points 2 and 3 don't apply and 4 can be obviated with "Name lookup for simple name when target type known" #2926.

Number 1, while not mandatory of course, could be a pain in the backside to live without in certain domains. It's already a bit of a pain point with Task<T>, although it relatively easy to work around by just putting an await in there somewhere.

With DUs, though, that won't be an option, and I can imagine people quickly getting frustrated at having to write

Option<Fruit> y = x switch {
    None => None,
    Some(var banana) => Some(banana)
}
HaloFour commented 1 year ago

@Richiban

With DUs, though, that won't be an option, and I can imagine people quickly getting frustrated at having to write

😁

Option<Fruit> y = x.Select(banana => (Fruit) banana);
Richiban commented 1 year ago

@Richiban

With DUs, though, that won't be an option, and I can imagine people quickly getting frustrated at having to write

😁

Option<Fruit> y = x.Select(banana => (Fruit) banana);

Map please. :P

But no, I'd actually be very curious to see whether the BCL team considers providing Linq implementations for Option<T> out of the box. After using Louthy's lib for years I quickly grew to depend on Select and SelectMany; without them some things are unreasonably difficult / laborious:

Option<int> a = ...;
Option<int> b = ...;

Option<int> c = 
    from a1 in a
    from b1 in b
    select a1 + b1;

Granted, this becomes a bit easier if you can pattern match (which you couldn't back when LanguageExt was released), but I'd still define these methods if they didn't come OOTB.

YairHalberstadt commented 1 year ago

Note you wouldn't need covariance in classes, just structs. Covariance in structs is actually pretty trivial, since you copy the struct on casting from one to the other. Basically all the runtime has to do is to treat this copy of Option<Banana> as an Option<Fruit> and everything should just work like magic.

Richiban commented 1 year ago

Note you wouldn't need covariance in classes, just structs. Covariance in structs is actually pretty trivial, since you copy the struct on casting from one to the other. Basically all the runtime has to do is to treat this copy of Option<Banana> as an Option<Fruit> and everything should just work like magic.

That's interesting; I'd never heard before that struct made all the difference here! That's good news.

Richiban commented 1 year ago

@Richiban

With DUs, though, that won't be an option, and I can imagine people quickly getting frustrated at having to write

😁

Option<Fruit> y = x.Select(banana => (Fruit) banana);

Also, it just occurred to me that if you're going to implement Select for Option<T>, you might as well implement Cast as well:

Option<Fruit> = maybeBanana.Cast<Fruit>();
YairHalberstadt commented 1 year ago

Actually I think I might be wrong. Consider this:

struct X<T> {
    public List<T> List;
}

var x = new X<string>{List = new List<string>()};
List<object> y = ((X<object>)x).List;

But I think it should be true for any struct without fields which are themselves generic, but are not just the raw type parameter.

E.g. this should be fine:

struct X<T> {
   string s;
   T t;
}

But not this:


struct X<T> {
    Y<T> y;
}
Richiban commented 1 year ago

@YairHalberstadt Perhaps it requires that all fields that use the generic type parameter are also covariant?

YairHalberstadt commented 1 year ago

@richiban yes, that sounds right.

Pyrdacor commented 1 year ago

I can't really see any benefit from unions in C# at all. You can do all the stuff with records, generics, interfaces or nullables. For me this is just a niche feature which can be implemented with existing language features quite well.

In typescript it is handy but you can easily create types and type aliases on the fly there. It is also useful for low level stuff. But it doesn't fit to a language like C# imo.

HaloFour commented 1 year ago

@Pyrdacor

I think that feedback might be more specific to type unions? I believe those are the only kind of unions that TypeScript has. This proposal is more specifically about tagged unions, which are not built out of a type hierarchy because it is a goal to ensure that they are a very low cost abstraction and avoid allocations. Implementing those with existing language features is more of a challenge and lacks ergonomics that a dedicated syntax could provide.

Pyrdacor commented 1 year ago

My feedback included tagged unions as well. They are only needed when you need "something" that can be of totally unrelated types. And this is not a common use case imo. Otherwise you could just use a base interface. In case of "something or nothing" you can just use a nullable.

Afaik the advantage of a union is less memory consumption. For example if you need something which can be string or a float you can just use a record with two properties and a tag to identify which of them is used. The only advantage of an union would be that you use the same memory for those two properties. If not, you can just use the record.

But is this really a thing in a high level language like C#? In worst case you waste 8 bytes per instance if you opt for a reference when a type would be larger than this. Even if you have large arrays of that union type this isn't that much.

So the advantage of a union isn't really a thing in C# imo. And without it, I can just use a record, interface or nullable as mentioned.

Pyrdacor commented 1 year ago

Implementing those with existing language features is more of a challenge and lacks ergonomics that a dedicated syntax could provide.

Do you have an example? I mean you can just create a generic type with n generic arguments and a field/property for each of them. Then add a tag property (or make the props nullable). You can just re-use it everywhere without a language change.

CyrusNajmabadi commented 1 year ago

My feedback included tagged unions as well. They are only needed when you need "something" that can be of totally unrelated types. And this is not a common use case imo.

I definitely disagree with this. And our own Roslyn API is full of nastyness because of it. We codifed SyntaxNodeOrToken because of something so basic. And patterns like Success | Error are wonderful to use, but a total PITA to have to try to jam into the system while getting both efficiency and pleasant consumption/creation/operation patterns to work with them. :)

CyrusNajmabadi commented 1 year ago

Then add a tag property (or make the props nullable

You now need to know precisely what subsets of properties are available for a particular kind. There's also the lack of ergonomics of having to match a kind, then know that a particular nullable property is now available (and non-null). You also, as mentioned in a few chats, completely loss exhaustiveness on your tag, etc.

You might want to read https://github.com/dotnet/csharplang/discussions/7010 which goes into depth into the challenges and why existing options fall short (which is why we have so often gotten requests here ot make this space better).

Pyrdacor commented 1 year ago

My feedback included tagged unions as well. They are only needed when you need "something" that can be of totally unrelated types. And this is not a common use case imo.

I definitely disagree with this. And our own Roslyn API is full of nastyness because of it. We codifed SyntaxNodeOrToken because of something so basic. And patterns like Success | Error are wonderful to use, but a total PITA to have to try to jam into the system while getting both efficiency and pleasant consumption/creation/operation patterns to work with them. :)

SyntaxNodeOrToken share a common thing. They are both some part of the input so for me it is perfectly fine that they share a common base interface you can use instead of a union.

Success | Error also describe something related so they also share a common base interface / ancestor / type.

For me it makes much more sense to use an interface (base type) here instead of a union. Imo a union is only useful if the types are unrelated. And I don't see that here.

CyrusNajmabadi commented 1 year ago

For me it makes much more sense to use an interface (base type) here instead of a union.

"Fast", "Efficient" Unions :)