dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Proposal: Record Enum Types #6739

Closed: alrz closed this issue 5 years ago

alrz commented 8 years ago

Record Enum Types

Currently, enum types are not considered complete patterns, and they have no "record" syntax for use in pattern matching. This proposal tries to fill this gap.

Unlike #3704, this proposal won't affect regular enum types; rather, it suggests an enum-like syntax for declaring flat hierarchies of ADTs (both value and reference types).

Enum structs

Enum structs would behave much like Java enum types. For example,

public enum struct Color(int R, int G, int B) {
    Blue(0,0,255),
    Green(0,255,0),
    Red(255,0,0)
}

would translate to

public struct Color {
    public readonly static Color Blue = new Color(0, 0, 255);
    public readonly static Color Green = new Color(0, 255, 0);
    public readonly static Color Red = new Color(255, 0, 0);

    private Color(int R, int G, int B) {
        this.R = R;
        this.G = G;
        this.B = B;
    }

    public int R { get; }
    public int G { get; }
    public int B { get; }
}

Another example from Java docs:

public enum struct Planet(double Mass, double Radius) {
    Mercury (3.303e+23, 2.4397e6),
    Venus   (4.869e+24, 6.0518e6),
    Earth   (5.976e+24, 6.37814e6),
    Mars    (6.421e+23, 3.3972e6),
    Jupiter (1.9e+27,   7.1492e7),
    Saturn  (5.688e+26, 6.0268e7),
    Uranus  (8.686e+25, 2.5559e7),
    Neptune (1.024e+26, 2.4746e7),
    Pluto   (1.27e+22,  1.137e6);

    public const double G = 6.67300E-11;

    public double SurfaceGravity =>
        G * Mass / (Radius * Radius);

    public double SurfaceWeight(double otherMass) =>
        otherMass * SurfaceGravity;
}
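The Java original of this example compiles today, so it is a handy reference point for the semantics that the proposed enum struct borrows. Below is a rough Java 17 sketch using the same data; the lowercase accessor names (mass(), radius(), surfaceGravity(), surfaceWeight()) are my own choices, not part of the proposal. Note that, unlike the proposed C# version, a Java enum is a reference type.

```java
// Java analogue of the proposed `enum struct Planet(double Mass, double Radius)`.
// Each constant is a singleton carrying its own mass (kg) and radius (m).
enum Planet {
    MERCURY(3.303e+23, 2.4397e6),
    VENUS(4.869e+24, 6.0518e6),
    EARTH(5.976e+24, 6.37814e6),
    MARS(6.421e+23, 3.3972e6),
    JUPITER(1.9e+27, 7.1492e7),
    SATURN(5.688e+26, 6.0268e7),
    URANUS(8.686e+25, 2.5559e7),
    NEPTUNE(1.024e+26, 2.4746e7),
    PLUTO(1.27e+22, 1.137e6);

    // Universal gravitational constant in m^3 kg^-1 s^-2.
    static final double G = 6.67300E-11;

    private final double mass;
    private final double radius;

    // Enum constructors are implicitly private: no other Planet can be created.
    Planet(double mass, double radius) {
        this.mass = mass;
        this.radius = radius;
    }

    double mass() { return mass; }
    double radius() { return radius; }

    // Surface gravity g = G * M / r^2, in m/s^2.
    double surfaceGravity() {
        return G * mass / (radius * radius);
    }

    // Weight (in newtons) of an object of the given mass on this planet.
    double surfaceWeight(double otherMass) {
        return otherMass * surfaceGravity();
    }
}
```

For example, Planet.EARTH.surfaceGravity() comes out at roughly 9.80 m/s^2.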

This struct must not be instantiable outside its own declaration; otherwise the pattern could not be complete.
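For comparison, a minimal Java 17 sketch of the Color example shows what non-instantiability buys: enum constructors are implicitly private, so the three constants are the only non-null values, and a switch expression over them compiles without a default branch. The describe() method is a hypothetical addition for illustration only.

```java
// Java analogue of the proposed `enum struct Color(int R, int G, int B)`.
enum Color {
    BLUE(0, 0, 255),
    GREEN(0, 255, 0),
    RED(255, 0, 0);

    private final int r, g, b;

    // Implicitly private: `new Color(1, 2, 3)` does not compile, so BLUE, GREEN
    // and RED are the only Color values (apart from null).
    Color(int r, int g, int b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }

    int r() { return r; }
    int g() { return g; }
    int b() { return b; }

    // The set of values is closed, so no default branch is needed; adding a
    // fourth constant would make this switch expression fail to compile.
    String describe() {
        return switch (this) {
            case BLUE -> "blue";
            case GREEN -> "green";
            case RED -> "red";
        };
    }
}
```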

Enum classes

Enum classes are useful for declaring a flat hierarchy of ADTs (similar to F# discriminated unions). For example, the following

public sealed abstract class Option<T> {
    public sealed class Some(T Value) : Option<T>;
    public sealed class None() : Option<T>;
}

could be written as

public enum class Option<T> {
    Some(T value : Value),
    None;
}
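A rough Java 17 analogue of enum class Option<T>, assuming a sealed interface is an acceptable stand-in for the proposed syntax: the interface permits exactly two implementations, Some and None, both records. The getOrElse helper is hypothetical and only shows that code can rely on the closed set of cases.

```java
// Only Some and None may implement Option; records are implicitly final.
sealed interface Option<T> permits Some, None {
    // Hypothetical helper: returns the wrapped value, or `fallback` for None.
    default T getOrElse(T fallback) {
        if (this instanceof Some<T> some) {
            return some.value();
        }
        return fallback; // the only other permitted case is None
    }
}

record Some<T>(T value) implements Option<T> {}

record None<T>() implements Option<T> {}
```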

With enum classes, abstract sealed would become an advanced feature for declaring more complicated ADTs.

Remarks

public enum class Expr {
  Const(double Value)        { public override string ToString() => Value.ToString(); },
  Mul(Expr Left, Expr Right) { public override string ToString() => $"({Left} * {Right})"; },
  Add(Expr Left, Expr Right) { public override string ToString() => $"({Left} + {Right})"; },
  Div(Expr Left, Expr Right) { public override string ToString() => $"({Left} / {Right})"; },
  Sub(Expr Left, Expr Right) { public override string ToString() => $"({Left} - {Right})"; },
}
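The Expr example can be approximated in Java 17 with a sealed interface and one record per case, each overriding toString as in the proposal. The Eval class and its eval method are hypothetical additions showing how a consumer walks the closed set; Java 17 has no pattern-matching switch, so an instanceof chain stands in (Java 21's switch patterns would check exhaustiveness at compile time).

```java
// Closed set of expression cases.
sealed interface Expr permits Const, Mul, Add, Div, Sub {}

record Const(double value) implements Expr {
    @Override public String toString() { return Double.toString(value); }
}

record Mul(Expr left, Expr right) implements Expr {
    @Override public String toString() { return "(" + left + " * " + right + ")"; }
}

record Add(Expr left, Expr right) implements Expr {
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

record Div(Expr left, Expr right) implements Expr {
    @Override public String toString() { return "(" + left + " / " + right + ")"; }
}

record Sub(Expr left, Expr right) implements Expr {
    @Override public String toString() { return "(" + left + " - " + right + ")"; }
}

// Hypothetical evaluator over the closed hierarchy.
final class Eval {
    private Eval() {}

    static double eval(Expr e) {
        if (e instanceof Const c) return c.value();
        if (e instanceof Mul m) return eval(m.left()) * eval(m.right());
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        if (e instanceof Div d) return eval(d.left()) / eval(d.right());
        if (e instanceof Sub s) return eval(s.left()) - eval(s.right());
        // Unreachable for non-null input, since all permitted cases are handled.
        throw new IllegalArgumentException("unexpected Expr: " + e);
    }
}
```

For instance, new Add(new Const(1), new Mul(new Const(2), new Const(3))) prints as (1.0 + (2.0 * 3.0)) and evaluates to 7.0.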
HaloFour commented 8 years ago

I think there'd be a lot of potential in treating "class/struct enums" as ADTs. As a syntax, I think it would be intuitive to users already familiar with C# enums. In fact, I'd be quite happy if such "enums" were designed specifically around how they would benefit pattern matching and ADTs, rather than being just a copypasta of Java's implementation.

bbarry commented 8 years ago

A minor quibble: neither of these is "complete", as I can fairly easily make any sequence of 96 bits into a Color struct (the simplest being default(Color)), and null is a valid value for this Option<T> type.

That said, I can do the same thing with an int-backed enum today, so the struct issue isn't a big deal breaker. And if we simply accept the fact that null is a valid value for a reference type, the notion of complete reference types seems entirely solvable.

This sort of gets to the root of my problem with the let ... else deconstruction statement (mind you, not the let ... when ... else form) and with the notion of checking for completeness in switch and match structures. I'm not convinced that the subset of them whose completeness can be proven at compile time is worth using. Type completeness checking seems to be a goal post on a different field from the one .NET is playing on. It may work in F# sometimes, if you ignore interoperability (and enough of the BCL) and treat the program as a closed system, but in C#, when dealing with things like exposing a method that contains a switch statement to another assembly, the "problem" of avoiding null and proving completeness rapidly approaches intractability.
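bbarry's point about null can be seen in a small Java 17 sketch (the Shade enum and NullDemo class are hypothetical names): a switch expression covering every constant compiles without a default branch, yet passing null still type-checks and fails at run time with NullPointerException.

```java
// Hypothetical two-value enum used only for this demonstration.
enum Shade { LIGHT, DARK }

final class NullDemo {
    private NullDemo() {}

    // Covers every constant, so the compiler requires no default branch.
    static String describe(Shade shade) {
        return switch (shade) {
            case LIGHT -> "light";
            case DARK -> "dark";
        };
    }

    // Returns true when describe(null) throws NullPointerException,
    // i.e. null slips past the "complete" switch at run time.
    static boolean nullSlipsThrough() {
        try {
            describe(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```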

gafter commented 8 years ago

https://www.google.com/patents/US7263687

HaloFour commented 8 years ago

@gafter Well I guess there goes that idea (along with #3704). I assume that patent is Oracle's now.

MgSam commented 8 years ago

@gafter Have you guys considered it for C# and the patent is really a roadblock? I'm certain other languages besides Java have OO enums. Pretty ironic if a patent with your name on it now prevents you from doing something similar in your current role.

HaloFour commented 8 years ago

Actually it looks like Swift has gone this route for ADTs.

enum Barcode {
    case UPCA(Int, Int, Int, Int)
    case QRCode(String)
}

let productBarcode = Barcode.QRCode("ABCDEFGHIJKLMNOP")

switch productBarcode {
case .UPCA(let numberSystem, let manufacturer, let product, let check):
    print("UPC-A: \(numberSystem), \(manufacturer), \(product), \(check).")
case .QRCode(let productCode):
    print("QR code: \(productCode).")
}

I am certainly not a lawyer, but I suspect that such an implementation differs enough from the quoted patent. I assume that Apple did not pay to license that patent.

gafter commented 8 years ago

@HaloFour I don't see how you read any part of that patent in Swift's language feature. Swift's enums do not have a closed set of values, they have a closed set of types.

HaloFour commented 8 years ago

@gafter

I don't see how you read any part of that patent in Swift's language feature.

I'm not. My assumption is that the implementation is so different as to be unrelated and therefore not infringing. But I'm not a patent lawyer nor am I terribly versed in patent law so I don't want to make too many assumptions as to how wide that Java patent could be applied. I'll defer to your expertise in that matter.

Swift's enums do not have a closed set of values, they have a closed set of types.

It seems that Swift is trying to accomplish both, or at least make it syntactically simple enough to feel like values in the simpler cases. Either way I think it accomplishes closed-ADTs in a fairly intuitive syntax.

alrz commented 8 years ago

@gafter So that only concerns enum struct (a closed set of values), not enum class (a closed set of types)?

gafter commented 8 years ago

@alrz Yes, although Java's enums are actually reference types.

alrz commented 8 years ago

@gafter And that's because Java doesn't have user-defined value types yet :smile: By the way, I found the Java implementation confusing, because, as you said, it's a "closed set of values", but at the same time, you can override methods for every enum member. Then it becomes a closed set of types (or more precisely, a closed set of instances of various subclasses of the enclosing type).

This proposal clearly distinguishes these two concepts with enum struct and enum class. So you cannot override methods in the former, because there every enum member is solely a singleton instance of the enclosing type.
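Java makes the distinction alrz describes observable. In this Java 17 sketch (reusing the ProtocolState shape from the Kotlin example later in the thread), each constant with its own body is compiled as an instance of a separate anonymous subclass, so getClass() differs from the enum type, while getDeclaringClass() still returns it.

```java
enum ProtocolState {
    // Each constant has its own class body, so the compiler emits an
    // anonymous subclass of ProtocolState for it.
    WAITING {
        @Override
        ProtocolState signal() { return TALKING; }
    },
    TALKING {
        @Override
        ProtocolState signal() { return WAITING; }
    };

    // Every constant must implement this, as in the Kotlin version.
    abstract ProtocolState signal();
}
```

So a Java enum with constant-specific bodies is, in practice, a closed set of instances of different subclasses, which is exactly the blurring the enum struct / enum class split tries to avoid.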

gafter commented 8 years ago

@alrz In Java, the (static) types of the members are the type of the enum.

Using enum struct to mean a closed set of values and enum class to mean a closed set of types is not clear at all. Shouldn't the features of struct vs class (on one hand) and closed set of values vs types (on the other hand) be orthogonal?

alrz commented 8 years ago

@gafter I said it's "clear" because you cannot possibly have a closed set of subtypes with a struct. I don't know how to restrict this so that it isn't confusing, but I really like the enum class syntax for declaring flat ADTs, since they are more common; in these scenarios I think abstract sealed classes are too much.

gafter commented 8 years ago

@alrz Yes, I like the single keyword enum better than the pair of keywords abstract sealed too. However the nesting feels uncomfortable to me because of the way one has to name the members in clients (same issue as for abstract sealed).

alrz commented 8 years ago

@gafter Not just abstract sealed, but also all the other noise that comes with inner classes. I mean, look at this:

public sealed abstract class Option<T> {
    public sealed class Some(T Value) : Option<T>;
    public sealed class None() : Option<T>;
}

However, I do believe that abstract sealed classes will be useful in more complicated scenarios like when the base class has a record-parameter-list or you want to declare a tree of ADTs.

But about the nesting issue, I've been thinking about this before. There are some options and pitfalls:

I don't know which direction you guys are going to take.

alrz commented 8 years ago

@gafter I just had a thought. I wanted to open a new issue but I would rather post it here since it's related to the subject.

Parts of this have already been proposed, but I'm suggesting a unified syntax so one can get a similar effect in various contexts.

It has been proposed to use using to declare "strongly typed type aliases", so that they would support generics and all the goodies that come with types.

public using class EmailAddress = String;
public using struct Identity = Int32;

An optional type-parameter-list would be allowed in these declarations.

I propose using (together with enum) as a modifier that can be applied to a class or struct:

using enum class Option<T> {
    Some(T Value),
    None
}

This has the same effect that AutoOpenAttribute provides for modules in F#: the inner types of the class would be visible at the namespace level.

var some = Some<T>(value);
var none = None<T>();

This should also work for sealed abstract or any other classes. The catch is that if both the outer class and the inner class happen to be generic, all the type parameters would be specified in a single type-parameter-list, with a semicolon separating the parameters of each type. For example

public using abstract sealed class Foo<T> {
    public sealed class Bar<U>() : Foo<T>;
}

Then

var bar = Bar<T;U>();

would be equivalent to

var bar = Foo<T>.Bar<U>();

I think this would pretty much solve the problem with nesting. What do you think?

gafter commented 8 years ago

While I agree it would "pretty much solve the problem with nesting" it feels like an awful lot of new syntax and mechanism to address the problem. I think we'd have to be very careful what things are brought into scope by the implicit using.

alrz commented 8 years ago

@gafter I don't see an "implicit" using. If by "an awful lot of new syntax" you mean a using modifier and the semicolon in the parameter list, I disagree, but I do agree that there is some weird stuff going on here!

We already carelessly bring all of the types into scope with using directives, so I don't think that would be a problem. If there were an outer class with the same name in scope, that wouldn't be an error by itself, but the compiler would complain when you refer to it (as it does when types in different namespaces conflict because of using directives).

HaloFour commented 8 years ago

@alrz Not particularly related to these proposals but I always figured that Some and None patterns would be designed in such a way as to work with all existing reference/nullable types in C# rather than introducing a new Option<T> type. That's assuming that they'd even be necessary given type patterns.

gafter commented 8 years ago

@HaloFour Unfortunately, you can have a Some<string>(null).

In any case, this isn't just about Some/None. It is about more general Algebraic Data Types, of which Some/None is just one example.

HaloFour commented 8 years ago

@gafter

Sure, I understand that Option<T> is only serving as an example of ADTs.

I always thought that a Some of null in a functional language was one of those anomalies generally resulting from poor interop with non-functional languages or other hacky stuff. I don't think I've ever seen any Scala or F# code written to defend against Some being null. And if None/Some were under consideration for C# (and, as I mentioned, that seems redundant given type patterns), I think I'd prefer to see them designed to be compatible with existing reference/nullable types, implemented as custom is operators.

Anyway that's all a tangent to the proposal of using enum-like syntax to describe ADTs.

orthoxerox commented 8 years ago

@gafter there are two ways to solve it.

One is to throw in the constructor.

Another is to hide the constructors and use a factory method that returns a None instead of a Some when given a null value. Or even merge two types into one, just like there's no special Nullable for nulls.

Yes, this will not work that well with pattern matching (unless you can use is to privately construct an instance of Some or None), but will work wonderfully with do-notation, er, LINQ.
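Both of orthoxerox's fixes can be sketched in Java 17, assuming a sealed interface with two records stands in for Option<T>: the compact constructor of Some throws on null (the first option), and a static factory returns None for a null argument (the second). The factory name of is an assumption, not something from the thread.

```java
sealed interface Option<T> permits Some, None {
    // Second fix: a factory that maps null to None instead of building Some(null).
    static <T> Option<T> of(T value) {
        return value == null ? new None<>() : new Some<>(value);
    }
}

// First fix: the compact constructor throws, so Some(null) cannot exist.
record Some<T>(T value) implements Option<T> {
    Some {
        java.util.Objects.requireNonNull(value, "Some requires a non-null value");
    }
}

record None<T>() implements Option<T> {}
```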

alrz commented 8 years ago

@orthoxerox Nullability practically breaks the completeness of ADTs. While any of your solutions would solve the problem with Some<string>(null) I think #5032 is required to facilitate ADTs.

gafter commented 8 years ago

We are seriously considering enum class for ADTs. Along with it we would like to add additional name lookup rules (#952) for use in patterns so that you don't have to dot off the container. It would apply to normal enums too, so you could write

switch (color)
{
    case Blue: // shorthand for Color.Blue if the old name lookup rules would fail
        // etc
}

and

Option<Foo> o = ...;
switch (o)
{
    case Some(var foo): // shorthand for Option<Foo>.Some
        // etc
}

@agocke

alrz commented 8 years ago

@gafter This is just great, however I don't know if it works for generics within generics (assuming that o is an object). Doesn't it imply that type parameter list in the constructor is being inferred?

gafter commented 8 years ago

@alrz In my example o is not an object, it is statically of type Option<T>. This shortcut only works if you are switching or matching on a value of the ADT's enclosing type (or an enum type).

alrz commented 8 years ago

@gafter so, I think it wouldn't work for Bar<U> in the sealed abstract class Foo<T> that I've mentioned above.

gafter commented 8 years ago

@alrz That is correct; it would have no impact on your Foo<T>.Bar<U> example.

With what I'm proposing here, you would not normally need to specify the enclosing type's type arguments. It is taken from the value that is the argument of switch or is or whatever is being matched. If that doesn't work, then you have to specify it the hard way. For example

enum class Foo<T>
{
    Bar<U>(T t, U u);
}

    Foo<X> foo = ...;
    switch (foo)
    {
        case Bar<Y> bar: // shorthand for case Foo<X>.Bar<Y> bar
            ...
    }
alrz commented 8 years ago

@gafter That's nice, but the issue remains. You still have to write var o = Option<int>.Some(0), which is not intuitive at all. Isn't it possible to make this work for type invocation or object creation as well, e.g. var o = Some<int>(0)?

gafter commented 8 years ago

I see the enum class here as a nice solution to #188 (as long as you don't mind its constraints, like nesting and flat hierarchy).

bbarry commented 8 years ago

(disclaimer: I don't mind those constraints and would be happy with them)

Have you considered a spec such that you don't declare these types nested:

public enum class Root();
public class Foo(Root r) : Root();
public class Bar() : Root();
...

under a rule "all types in a type hierarchy rooted by an enum class must be in the same namespace and assembly"

public enum class Root
{
    Foo(Root r),
    Bar,
    ...
}

could be shorthand for defining this type hierarchy.

A deeper hierarchy could also be defined by nesting enums

public enum class Root
{
    enum Foo(Root r) 
    {
        Baz,
        Fez
    },
    Bar,
    ...
}
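Java's sealed types follow a rule close to the one bbarry describes: permitted subtypes need not be nested, but they must live in the same module, or in the same package when the code is in the unnamed module. Below is a rough Java 17 sketch of the deeper hierarchy, where Foo(Root r) is itself sealed over Baz and Fez; the Depth helper is hypothetical.

```java
// Root is closed over Foo and Bar, declared at top level (not nested).
sealed interface Root permits Foo, Bar {}

// Foo carries a Root and is itself closed over Baz and Fez.
abstract sealed class Foo implements Root permits Baz, Fez {
    private final Root r;

    Foo(Root r) { this.r = r; }

    Root r() { return r; }
}

final class Baz extends Foo {
    Baz(Root r) { super(r); }
}

final class Fez extends Foo {
    Fez(Root r) { super(r); }
}

record Bar() implements Root {}

// Hypothetical helper: counts how many Foo layers wrap a value.
final class Depth {
    private Depth() {}

    static int depth(Root root) {
        if (root instanceof Foo foo) {
            return 1 + depth(foo.r());
        }
        return 0; // Bar is the only other direct subtype of Root
    }
}
```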
gafter commented 8 years ago

@bbarry We've considered a number of possible rules for bounding the set of subtypes. The physical nesting within curly braces feels like the most proper of the options.

orthoxerox commented 8 years ago

@gafter Deep hierarchy is an important feature, at least for me.

alrz commented 8 years ago

I hope this doesn't replace abstract sealed classes. I'd like to see both in the language.

yaakov-h commented 8 years ago

@alrz What do you mean? You can't have an abstract sealed class; that yields CS0418: an abstract class cannot be sealed or static.

Joe4evr commented 8 years ago

@yaakov-h It's one of his other proposals, I believe: an abstract class that can only be derived from within the same assembly, with all derived classes sealed, so the compiler knows every implementation at compile time.

A neat idea in theory, but the majority of developers will probably be very confused about how the two contrary keywords will behave. I would rather see a new contextual keyword there instead (abstract closed perhaps).

alrz commented 8 years ago

@gafter This one from Swift docs looks like a closed set of values,

enum ASCIIControlCharacter: Character {
    case Tab = "\t"
    case LineFeed = "\n"
    case CarriageReturn = "\r"
}

or Kotlin:

enum class ProtocolState {
  WAITING {
    override fun signal() = TALKING
  },
  TALKING {
    override fun signal() = WAITING
  };
  abstract fun signal(): ProtocolState
}

Java:

enum ASCIIControlCharacter {
    Tab('\t'),
    LineFeed('\n'),
    CarriageReturn('\r');
    ...
}

Proposed C# syntax:

enum struct ASCIIControlCharacter(char Character) {
    Tab('\t'),
    LineFeed('\n'),
    CarriageReturn('\r');
}
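To show what a closed set of values with a raw value looks like in practice, here is a complete Java 17 enum along the lines of the Java snippet above, plus a reverse lookup similar in spirit to Swift's init?(rawValue:). The character() accessor and the fromCharacter method are hypothetical helpers, not part of the proposal.

```java
enum ASCIIControlCharacter {
    Tab('\t'),
    LineFeed('\n'),
    CarriageReturn('\r');

    private final char character;

    ASCIIControlCharacter(char character) {
        this.character = character;
    }

    char character() { return character; }

    // Reverse lookup: the constant whose raw value is c, or empty if none matches.
    static java.util.Optional<ASCIIControlCharacter> fromCharacter(char c) {
        for (ASCIIControlCharacter value : values()) {
            if (value.character == c) {
                return java.util.Optional.of(value);
            }
        }
        return java.util.Optional.empty();
    }
}
```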
nerdshark commented 8 years ago

What about having a look at Nemerle's implementations of ADTs? They are object-oriented by design; variant type declarations emit an abstract base class containing nested children, optionally with individual, specialized implementations, which inherit the parent, while enums come in C-like, C#-like, and SML/OCaml-like ADT variations (with each ADT enum member being a singleton value, possibly instantiated by a record-type implicit constructor). Here's a link to some of their documentation on the subject.

I particularly like their syntax, and the explicit discrimination between variants (discriminated unions, a closed set of contextually-related types) and enumerations (closed set of contextually-related values).

alrz commented 8 years ago

@nerdshark you mean something like #188?

nerdshark commented 8 years ago

@alrz Yeah, sure looks that way! Edit: That certainly would be workable as the mechanism, and I think that, at the very least, some variant syntax sugar would be useful to tell the compiler to ensure that each variant type member is sealed, meets proper visibility requirements, etc., and it would reduce the amount of work the developer has to do. I think all that would be required is adding the variant keyword, which tells the compiler to generate the correct type hierarchy from the variant definition; it would otherwise be possible to use enumeration-like syntax (a list of type/record declarations separated by commas). However, I think I'd prefer either the existing nested type declaration syntax or (similar to Nemerle and F#) a tuple/record-based declaration syntax.

nerdshark commented 8 years ago

And now I realize you've covered most of that already in #188. Maybe I should learn how to read. ¯\_(ツ)_/¯

alrz commented 8 years ago

@nerdshark and by the way, I didn't propose #188.

alrz commented 8 years ago

Actually, enum classes should be able to extend other classes, like

public enum class FooException : Exception {
    Reason1,
    Reason2(int value);
}

In this case, #6789 can be used to deconstruct this type.
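Java offers one way to approximate an enum class that extends another class: a sealed abstract exception with final subclasses. This Java 17 sketch assumes messages and a value() accessor that the proposal does not specify; a catch (FooException ex) block then sees exactly the two permitted cases.

```java
// Closed exception hierarchy: only Reason1 and Reason2 may extend FooException.
abstract sealed class FooException extends Exception permits Reason1, Reason2 {
    FooException(String message) {
        super(message);
    }
}

final class Reason1 extends FooException {
    Reason1() {
        super("reason 1");
    }
}

final class Reason2 extends FooException {
    private final int value;

    Reason2(int value) {
        super("reason 2: " + value);
        this.value = value;
    }

    int value() { return value; }
}
```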

gafter commented 8 years ago

It is possible that the patent does not read on this proposal. The enum class does not have a fixed number of constant values, as in the patent. The enum struct is not class-based, as in the patent.

alrz commented 8 years ago

@gafter If at least existing enums could be considered as complete patterns I wouldn't miss enum struct that much.

gafter commented 8 years ago

@alrz they cannot be considered complete. See, for example, [FlagsAttribute].

HaloFour commented 8 years ago

Not to mention you can explicitly cast any compatible integral value to the enum type and neither the compiler nor the CLR will do anything to stop you. Kind of unfortunate.

alrz commented 8 years ago

Considering that existing structs have some restrictions due to their direct memory layout, could we apply the same thinking to enum struct? For example,

enum struct Option<T> { None, Some(T) }

Could lead to a zero-overhead memory layout for a safe optional T (as long as it's non-nullable).

orthoxerox commented 8 years ago

@alrz could you explain how your proposed enum struct would work? Would there be a hidden tag field, like in F# ADTs?

alrz commented 8 years ago

@orthoxerox Yes; however, for the special case of Option<T> I believe there is no need for one, because it's either all empty or Some. F# ADTs translate to a class hierarchy but also have tag fields for performance. Structs, on the other hand, cannot have subtypes, but I think enum struct could be translated to a special struct layout similar to unions (using compiler-generated overlapping fields).

For example,

enum struct S { A(int), B(int) } 
enum struct S2 { C1, C2 }
var s = S.A(0);
var r = s switch(case A(var a): ... , case B(var b): ... );

// would roughly translate to

[StructLayout(LayoutKind.Explicit)]
struct S
{
  [FieldOffset(0)] internal int _a0;
  [FieldOffset(0)] internal int _b0;
  [FieldOffset(4)] internal int _tag;
}
struct S2 { internal int _tag; }

var s = new S { _a0 = 0, _tag = 0 };
var r = s._tag switch(case 0: (var a = s._a0;  ... ) , case 1: (var b = s._b0;  ... ));

Because A and B are not concrete types, some restrictions would apply compared to enum class.
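Java has neither user-defined value types nor explicit field layout, so it cannot reproduce the FieldOffset overlap in the translation above. The following Java 17 sketch only illustrates the tag-dispatch half of the idea: a hypothetical class S stores a tag plus one int payload slot shared by A and B, and a hypothetical match method switches on the tag the way the translated switch does.

```java
// Hypothetical lowering of `enum struct S { A(int), B(int) }` without subtypes.
final class S {
    static final int TAG_A = 0;
    static final int TAG_B = 1;

    final int tag;     // which case this value represents
    final int payload; // A's int or B's int: both cases reuse the same slot

    private S(int tag, int payload) {
        this.tag = tag;
        this.payload = payload;
    }

    static S a(int value) { return new S(TAG_A, value); }
    static S b(int value) { return new S(TAG_B, value); }

    // Dispatch on the tag, then read the payload slot for that case.
    <R> R match(java.util.function.IntFunction<R> onA, java.util.function.IntFunction<R> onB) {
        switch (tag) {
            case TAG_A: return onA.apply(payload);
            case TAG_B: return onB.apply(payload);
            default: throw new IllegalStateException("corrupt tag " + tag);
        }
    }
}
```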