dotnet / csharplang

The official repo for the design of the C# programming language
11.4k stars 1.02k forks source link

Proposal: "Closed" enum types #3179

Open gafter opened 4 years ago

gafter commented 4 years ago

Related to https://github.com/dotnet/csharplang/issues/485, it would be helpful to be able to declare an enumerated type that is known to the language to be closed. The syntax is an open question, but as a first proposal I'll suggest using closed as in https://github.com/dotnet/csharplang/issues/485.

For example, a closed enum

closed enum Bool { False, True }

Could be used like this

int M(Bool b)
{
    switch (b)
    {
        case Bool.False: return 0;
        case Bool.True: return 1;
    }
}

This code would be considered OK by flow analysis, as the method returns for every possible value of the parameter b. Similarly a switch expression that handles every possible declared enumeration value of its switch expression of a closed enum type is considered exhaustive.

To ensure safety:

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions

0xd4d commented 4 years ago

If you add [Flags] to it, is that an error?

There must be a way to convert an integer to the closed enum (unsafe code).

gafter commented 4 years ago

I imagine you could add [Flags] to it, but it wouldn't have any effect other than possibly generating a warning. That attribute has no language meaning.

alrz commented 4 years ago

Since enum class is going to be the syntax for discriminated-unions, I vote enum struct for this. When there is no sub-member, it is practically a closed enum,

// if you can do this
enum struct Option<T> { Some(T), None }
// there's nothing to stop you from doing this
enum struct Bool { False, True }

I think this is more of a lowering strategy question - whether or not this should be emitted as a proper enum.

If we go with proper enums, we can't manage external values, so I think we will need to emit it as a struct anyways.

(some examples from F#)

DavidArno commented 4 years ago

@0xd4d,

There must be a way to convert an integer to the closed enum (unsafe code).

Why? I see no reason for closed enums to have any sort of fixed underlying value. If you want to convert, then use a switch expression:

Bool b = i switch {
    0 => False,
    _ => True
};

@alrz, enum struct Option<T> { Some(T), None } is sort of how I envisioned DU's looking (and so agree that enum struct Bool { False, True } is a good choice for closed enums.

However, isn't there still an issue pattern matching on those as I'd imagined DU pattern matching would be type based, ie I'd write something like:

var x = option switch {
    Some v => v,
    None => default
};

But with enum struct Option<T> { Some(T), None }, Some isn't a subtype of Option.

I can invent some syntax here:

enum struct ShapeName { Point, Rectangle, Circle }

enum struct Shape 
{
    Point,  
    Rectangle(double Width, double Length),
    Circle(double Radius)
}

and I'm happy that this is consistent: a closed enum is a DU where the values have no parameters.

And I can imagine how I'd pattern match this stuff:

var area = shape switch {
    Point => 0,
    Rectangle { Width: var w, Length: var l } => w * l,
    Circle { Radius: var r } => Math.PI * r * r
};

But, I've no idea how this lot gets lowered by the compiler. For a struct, Shape, that contains a Circle "value", what does shape is Circle { Radius: var r } mean?

dsaf commented 4 years ago

Does it mean that existing enums will be updated to be somehow open? Otherwise the terminology is confusing.

DavidArno commented 4 years ago

@dsaf, existing enums are already open. For:

enum Shape { Point, Rectangle, Shape }

Shape can take any int value. Point, Rectangle, and Shape are little more than int constants.

But it would indeed be prudent to talk of them as being open enums to clarify the difference between existing enums and the closed variety.

alrz commented 4 years ago

@DavidArno

isn't there still an issue pattern matching on those as I'd imagined DU pattern matching would be type based

No they wouldn't be types (or members), if it's a struct DU, all parameters get emitted as flat fields into the struct, the rest is compiler magic. See what F# produces for your example as a value DU.

Though I can imagine C# could do a better job utilizing runtime support for FieldOffsetAttribute if possible (e.g. if all parameters are blittable) - minimizing the struct size.

0xd4d commented 4 years ago

@0xd4d,

There must be a way to convert an integer to the closed enum (unsafe code).

Why? I see no reason for closed enums to have any sort of fixed underlying value. If you want to convert, then use a switch expression:

Bool b = i switch {
    0 => False,
    _ => True
};

That's inefficient unless it's a trivial enum like the above Bool enum. Deserialization should be quick.

It should be as simple as it is today, except the programmer must verify that it's a valid value (or it's undefined behavior)

int value = 1;
if (InvalidValue(value)) throw ...;
unsafe { return (MyEnum)value; }
alrz commented 4 years ago

@0xd4d You can add it yourself, or have a generator to do it.

enum struct Bool { 
  False, True;
  public static explicit operator Bool(int v)
    => v switch { 0 => False, 1 => True, _ => throw InvalidCastEx };
}

I think unlike enums the corresponding "integer tag" is solely an implementation detail for DUs, so I don't think it's a good idea for the compiler to generate that out of the box.

0xd4d commented 4 years ago

@alrz Your code is the same as @DavidArno's code, a big switch statement which generates a lot of code and causes extra CPU usage at runtime. My solution to use a cast in an unsafe block has the same perf and code size as code we can already use today.

DavidArno commented 4 years ago

@alrz, "compiler magic" suits me. It's then @gafter's job to solve how to have struct-based DUs (including closed enums) work with a is X b, rather than mine 👍

@0xd4d, it's very unlikely that the solution to serialization of closed enums and other DUs is to convert them to ints. I guess it all depends on whether closed enums are implemented as a completely different solution to DUs or not. Doing so would be a mistake in my view. So I'd argue that any solution to serializing

enum struct Bool { False, True }

also has to be able to serialize

enum struct Option<T> { Some(T), None }
alrz commented 4 years ago

The only difference with the original proposal would be that False and True in that example wouldn't be compile-time constants so we can't use enum struct and enum interchangeably.

That is because an enum type as it exists today, is just a group of named constants with zero safety guarantees, I'm not sure if it makes sense to try to "fix" that or take the trade-off and favor struct DUs.

333fred commented 4 years ago

why not closed by default, and open as suggested syntax to open them?

Because open is the default today, and we can't change that.

AartBluestoke commented 4 years ago

@gafter would your intent to be that adding a new value to a "closed" enum is a breaking change? would this be binary compat breaking change, or just a "switch statement which must pick a value entered unknown territory by not finding a match" exceptional territory. eg: what if you wanted to add trinary logic "not provided by customer" to the bool enum (bad naming example)

DanFTRX commented 4 years ago

@AartBluestoke I believe its true, false and filenotfound.

gafter commented 4 years ago

@AartBluestoke I imagine that adding a new value to a closed enum would be a binary breaking change. Just like b switch { false => 0, true => 1 } is translated to b == false ? 0 : 1, when b is a bool, I would expect b switch { Bool.False => 0, Bool.True => 1 } to be translated to b == Bool.False ? 0 : 1 when b is a Bool. But this design point could be debated and changed.

popcatalin81 commented 4 years ago

I imagine that adding a new value to a closed enum would be a binary breaking change.

How would this be enforced without the C# compiler generating an under the hood default case? And also the user might add a default case himself in which case the compiler must generate code to check the range of values expected have not been modified.

Something like: In library A

public closed enum ObjectAccess
{
     Read,
     ReadWrite,
     Write
}

Consumer of Library A code in assembly B:


switch objectAccess
{
    case ObjectAccess.Read:
           return new ReadOnlyFooBar(...);
    default :
           return new ReadWriteFooBar(...);
}

I asume the C# generated default case would actually look like:

switch objectAccess
{
    case ObjectAccess.Read:
           return new ReadOnlyFooBar(...);
    default :
    {
          if (objectAccess != ObjectAccess.ReadWrite && objectAccess != ObjectAccess.Read)
                throw new InvalidOperationException($"Unexpected value: {objectAccess:g}");
          return new ReadWriteFooBar(...);
    }
}

When library Author adds a new value the the closed enum:

public closed enum ObjectAccess
{
     Read,
     ReadWrite,
     Write,
     ReadWriteExclusive
}

This will indeed break binary compatibility, however the problem is now that recompiling B will catch the new value in the default clause without the library author having the chance to have to treat the new value explicitly.

So this feature can only work if the default clause is not allowed for closed enums, which I think would make the feature impractical for any enums with more than a few members.

canton7 commented 4 years ago

@popcatalin81

Other languages seem to get by just fine. In e.g. Rust, the default case doesn't add any extra checks: if there's a default case, then that will catch all other unhandled values, now and in the future.

Currently, if I have a switch that I want to be exhaustive, I need to have an explicit case for every possible enum member and a default which throws an exception. The downside is that I don't get a compiler warning/error if a new enum member is added, only a runtime exception.

With closed enums, I still need to have a case for each enum member, but I can avoid having a default at all, and I can get a compiler warning/error if someone adds a new enum member.

AartBluestoke commented 4 years ago

I believe the following spec would match the desired behavior:

The compiler will enforce that: Switching on a closed enum must be exhaustive, A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception. If a default case is not specified the compiler will auto-generate a default path that throws an exception if accessed

No other compiler or runtime changes are needed.

HaloFour commented 4 years ago

@AartBluestoke

A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception.

That seems unnecessary. What if the closed enum has 20 values but I only care about 2 of them? Why can't I have a default that returns some value instead?

gafter commented 4 years ago

The intent was not to require that switches be exhaustive. We'd still permit a switch expression to be non-exhaustive, with a warning. And I don't think we'd add any requirements for switch statements, other than the fact that we can infer that a default case isn't needed (e.g. for definite assignment) if you have cases for every enumeration constant.

AartBluestoke commented 4 years ago

A switch on a closed enum must either not have a default case, or have a default case that guarantees an exception.

That seems unnecessary. What if the closed enum has 20 values but I only care about 2 of them? Why can't I have a default that returns some value instead?

if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this. If that is true then the enum shouldn't be closed. If it is true, you should handle the remaining values.

Perhaps the contextual keyword "default" becomes "and the rest of the values" within the compiler, with the compiler generating a true default branch for exceptions for if non-standard values get into the enum somehow?

My thoughts when i was writing that was if you add a new value to the closed enum, it has to explode, otherwise, "closing" has no meaning other than telling roslyn "don't generate a warning for this one specific situation", and preventing implicit cast from enum to int. and that doesn't seem like a necessary language feature; feels like a job of an analyzer.

YairHalberstadt commented 4 years ago

if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this. If that is true then the enum shouldn't be closed. If it is true, you should handle the remaining values.

This is really common in functional languages, and the reason is simple: in some cases you don't care about all the values, and other cases you do.

For example you might have a number of states a method can return. In some cases you want to deal with them all seperately. In others you only care if it was successful or not.

DavidArno commented 4 years ago

... if you are only using 2 out of 20 values, then you effectively have open use of a closed enum; you're already saying i don't care if new values get added to this ...

Sorry, @AartBluestoke, but that is so not true. If, within a particular context, I explicitly call out two of the twenty values, then that's because - in that context - I'm only interested in those two values. So I'll have a default handler for the rest. But in a situation where I explicitly list all of them (which is likely to occur elsewhere in the code) then I absolutely want a compiler error is a new value is added. And having it be a binary breaking change at that point too would be the icing on the cake.

popcatalin81 commented 4 years ago

So a default clause for a closed enum will mean:

No default clause:

It sounds about right for me as a library consumer. There's an escape hatch I can use.

However, I would still like to opt-out of binary breaking changes, if those are added. It may be important to make certain libraries forward compatible with closed enums through some mechanism or another. Example:

public closed enum Choice
{
        First,
        Second,
        Third
}

string AnalyzeChoice(Choice choice)
{
       switch choice
      {
            case Choice.First:
                  return "Good";
            case Choice.Second:
                  return "Average";
            case Choice.Third:
                  return "Bad";
            missing:
                return  "Unknown"; // Only when value not part of the enum is passed (like a version of the Enum in a future assembly). 
                //Exhaustiveness checking is still applied to all cases. Recompilation will give an error.
      }
}
declard commented 4 years ago

@DavidArno

And I can imagine how I'd pattern match this stuff:

var area = shape switch {
    Point => 0,
    Rectangle { Width: var w, Length: var l } => w * l,
    Circle { Radius: var r } => Math.PI * r * r
};

But, I've no idea how this lot gets lowered by the compiler. For a struct, Shape, that contains a Circle "value", what does shape is Circle { Radius: var r } mean?

This may do the job:

[StructLayout(LayoutKind.Explicit)]
public struct Shape
{
  public struct Point {}
  public struct Rectangle { public double Width; public double Length; }
  public struct Circle { public double Radius; }
  private closed enum ShapeType : short { Point, Rectangle, Circle }
  [FieldOffset(0)]
  private ShapeType shapeType;
  [FieldOffset(2)]
  private Point point;
  [FieldOffset(2)]
  private Rectangle rectangle;
  [FieldOffset(2)]
  private Circle circle;
  public bool Deconstruct(out Point p)
  {
    if (shapeType == ShapeType.Point) { p = point; return true; }
    else { p = default; return false; }
  }
.....
}

However, it won't work that easily with reference types inside

john-h-k commented 4 years ago

@alrz @DavidArno enum struct could conflict with value type discriminated unions, if they become a thing. Would be quite unintuitive to have enum class and enum struct be unrelated

DavidArno commented 4 years ago

@john-h-k,

enum struct could conflict with value type discriminated unions

If we view closed enums as just another type of DU (where the cases are all parameterless) then there is no conflict. Closed enums are just a subset of DUs.

KalleOlaviNiemitalo commented 4 years ago

Closed enums should require a name for the zero value, because one can easily acquire that value without unsafe code.

class C
{
    private closed enum Number { One = 1 }

    private Number? number;

    public override string ToString()
    {
        switch (this.number.GetValueOrDefault())
        {
            case Number.One: return "one";
        }
    }
}

Perhaps closed enums don't need to support '=' constant_expression in enum_member_declaration at all.

ericsampson commented 3 years ago

How would people envision these closed enums getting serialized to JSON (say by STJ) ? My first gut reaction would be as string versions of the declared names, with options on how to customize the transformation (lower, upper, camel, etc)

CyrusNajmabadi commented 3 years ago

How would people envision these closed enums getting serialized to JSON

That would be out of scope from the language. It would depend on whatever serialization library you were using.

ericsampson commented 3 years ago

Sure, yeah. I would just going off this original statement, which I hope will hold.

There is no conversion from some underlying type to the type Bool (or only in unsafe code) For example, there is no operator Bool +(Bool, int). Alternately, we might consider them defined but only usable in unsafe code.

RikkiGibson commented 3 years ago

That would be out of scope from the language. It would depend on whatever serialization library you were using.

I think the term "out of scope" is easy to misunderstand here. It seems to me that the language has a responsibility to make a conscious decision about how the feature design affects things like serialization and interop with other .NET languages. For example, if the design is likely to introduce a failure mode in existing serializers where values outside the "closed enum" domain can be produced, there should be a conscious decision about whether that is acceptable--maybe it is acceptable, maybe it's not.

ericsampson commented 3 years ago

@RikkiGibson , that's what I was trying to get at in mentioning this question in the original proposal when discussing serialization and interop: "There is no conversion from some underlying type, or arithmetic operators" (or only in unsafe code).

AartBluestoke commented 3 years ago

"There is no conversion from some underlying type, or arithmetic operators" doesn't mean that serializers won't round-trip them via their underlying type. In the same way, right now, there is nothing stopping people who serialize enums do so to and from a string. That what it means to be "out of scope" : the language won't say 'the correct json serialization of this c# type is as follows'

CyrusNajmabadi commented 3 years ago

It seems to me that the language has a responsibility to make a conscious decision about how the feature design affects things like serialization

I honestly disagree. Serialization is an entirely unrelated topic to me. There are likely to be dozens (if not hundreds) of varying serialization libraries. The choices they make are going to be all over hte map here and often extremely domain specified. I have no particular expectation or desire for their to be any connection between the concepts of language types and the serialization domains.

That what it means to be "out of scope" : the language won't say 'the correct json serialization of this c# type is as follows'

Yup, that encapsulates how i feel.

ericsampson commented 3 years ago

forget serializers - maybe the better question is 'will it be possible to access the underlying type outside of an unsafe context'.

TahirAhmadov commented 2 years ago

I'm a little confused about the adding options scenario with multiple assemblies:

// assembly A, v. 1
closed enum Foo { A, B, C, }

// assembly B, v.1; references A v. 1
int Bar(Foo foo)
{
  switch(foo)
  {
    case Foo.A: return 1;
    case Foo.B: return 2;
    case Foo.C: return 3;
    // no need for default because we accounted for all possibilities known to us now
  }
}

// assembly A, v. 1.1
closed enum Foo { A, B, C, D, }

// assembly C; references A v. 1.1 and B v. 1
Bar(Foo.D); // what happens in this case?
TahirAhmadov commented 2 years ago

Also, another aspect of this that I couldn't find a clear answer for in this thread: how are these proposed to be stored? Will there be a (usually integer) backing value (leaving aside the restrictions on casting etc.)? Or is this a completely different mechanism?

hez2010 commented 2 years ago

I think this can be covered by tagged unions (discriminated unions).

Eli-Black-Work commented 2 years ago

Closed enums should require a name for the zero value, because one can easily acquire that value without unsafe code.

class C
{
    private closed enum Number { One = 1 }

    private Number? number;

    public override string ToString()
    {
        switch (this.number.GetValueOrDefault())
        {
            case Number.One: return "one";
        }
    }
}

Perhaps closed enums don't need to support '=' constant_expression in enum_member_declaration at all.

I'd like to second this. My understanding is that if I declare an enum as a class field, it is automatically initialized to 0, whether or not that is a valid enum value, so closed enums should be forced to always have a value that's equal to 0:

// Should error, because the closed enum doesn't have a value that's equal to 0
closed enum UserType {
  Admin = 1,
  Normal = 2
};

class User {
    // Automatically initialized to 0
    private UserType _type;
}
wojciechsura commented 2 years ago

Since number conversions seems to be a problem for the closed enum, maybe it's worth considering this matter from the other perspective? I think that it may cause less issues to be resolved, while having identical effect of solving the mentioned problem.

6164

Nihlus commented 2 years ago

Just to pipe up about syntax, what's wrong with sealed enum instead of closed enum or enum struct? Saves either adding a new keyword or using very unintuitive keyword combinations.

TahirAhmadov commented 2 years ago

Just to pipe up about syntax, what's wrong with sealed enum instead of closed enum or enum struct? Saves either adding a new keyword or using very unintuitive keyword combinations.

sealed means "cannot be inherited", and in addition to being confusing because of the different meanings, it can also create a problem in the future if "enum inheritance" is introduced.

wojciechsura commented 2 years ago

I think I have bigger problem with closed enum's behavior more than the syntax itself. The sole purpose of introducing the closed enum seems to be to influence switch statement behavior, so why don't we work on switch, but instead on enum?

For once, strict switch allows for more flexibility, because it can guard literally anything, what is enumerable and finite (namely bool, bool?, all int and int? types and also enums). Also, sealed enum looks a little bit weird for me, because the place where it is declared and where it actually takes effect are different. You'd need to lookup the type to know what the behavior in the code is, whereas in case of strict switch it is obvious at first glance. Finally, you might want to be strict in switch sometimes and sometimes not. With strict switch notation you can do that freely, while with sealed enum you are forced to be strict every time.

svick commented 2 years ago

@wojciechsura

Because the switch expression is effectively already strict. For example, consider these two switch expressions:

integer switch {
    < 0 => -1,
    > 0 => 1
}
nullableBool switch {
    true => 1,
    false => 0
}

They produce these warnings:

warning CS8509: The switch expression does not handle all possible values of its input type (it is not exhaustive). For example, the pattern '0' is not covered. warning CS8655: The switch expression does not handle some null inputs (it is not exhaustive). For example, the pattern 'null' is not covered.

(Note that you need to enable nullable reference types to get the warning about a nullable value type, which is weird, but by design per https://github.com/dotnet/roslyn/issues/52714.)

The problem only happens with enums, because the language disagrees with most of their usages regarding which values are valid.

TahirAhmadov commented 2 years ago

I forgot about this issue and started a new duplicate discussion despite commenting here previously :) Can we please get this at some point? Not as some DU (like some have suggested), but as a good old enum, just with a runtime check for "closeness".

closed enum HorizontalAlignment
{
  Left = 1, Center = 2, Right = 3
}
var x = (HorizontalAlignment)0; // runtime error
// the above line is compiled to:
var x = Enum.ConvertToClosedEnum<HorizontalAlignment>(0); // this helper method would check and throw, if needed
x = (HorizontalAlignment)1; // OK

[Flags] can be supported by creating a pool of all possible values by bit-wise OR - it's just a set of numbers and not much different from the options' numbers themselves. Implementing this can be done in a few ways. One approach is to generate the set of possible values and the logic at runtime using IL emitting. Another would be generating a hidden static method at compile time and linking to said method with a hidden attribute on the enum type. There may be other hybrid approaches, as well.

I would also be OK with a blanket prohibition on converting backing values to closed enum type at compile time (as long as I can do it in reflection/IL emit). Thank you!

CyrusNajmabadi commented 2 years ago

Can we please get this at some point?

@TahirAhmadov THe issue is a championed and is something the LDM is discussing. There is no need to repeatedly ask :)

KalleOlaviNiemitalo commented 2 years ago

@TahirAhmadov, would you expect an InvalidCastException here:

closed enum HorizontalAlignment
{
  Left = 1, Center = 2, Right = 3
}

class C
{
  HorizontalAlignment M()
  {
    int[] ints = { 1, 2, 3, 4 };
    var alignments = (HorizontalAlignment[])(object)ints;
    return alignments[3];
  }
}

If the enum is not closed, then the CLR allows this cast at run time, even though the C# standard doesn't. If this must throw with a closed enum, then I think it can be implemented in several ways:

TahirAhmadov commented 2 years ago

@KalleOlaviNiemitalo Yes, I would expect an exception to be raised there. Minor point - it should be a new exception type, like InvalidEnumValueException. My original thought was an attribute like the one you mentioned - [ClosedEnum] - but now that you listed other approaches, I kind of like the ClosedEnum type solution more (if it's possible). It may even be that ClosedEnum does not inherit from Enum, or that Enum is changed to inherit from ClosedEnum; I can't say for sure what will work best. I don't think "transpiling the cast" approach is feasible - what if generics are used and T is HorizontalAlignment[]? PS. It would be cool if we could have a static abstract method on ClosedEnum type and it's then implemented by concrete closed enum types.