EnumType::values()

Crell commented 4 years ago

Do we auto-generate an EnumType::values() method when the enum has only Unit Cases?

Pro: It's a common-enough use case. It's probably not too hard to do.

Con: Possibly interesting error handling when an Enum Type has Associable Cases, since then we can't have the method. Or the method has to throw an exception. Or something. Is the order locked at Lexical order or do we leave it undefined? What exactly is returned, strings or objects?

Discuss.

bwoebi commented 4 years ago

Order should be definition order. And return the objects, just like (new ReflectionClass(EnumType::class))->getConstants() would do.

We may consider leaving the values() function off if the object has at least one associated case. (And I think, in that case it's also much less useful)

We may want to add an additional function returning the names of the classes of the enum, so that a code inspection tool could possibly say "cases MyEnum::A and MyEnum::C were not handled" for example.

iluuu1994 commented 4 years ago

> We may consider leaving the values() function off if the object has at least one associated case. (And I think, in that case it's also much less useful)

I think that makes most sense. Enumerating is only really useful for finite enums. When any case has an associated value they're not really finite anymore.

iluuu1994 commented 4 years ago

> We may want to add an additional function returning the names of the classes of the enum, so that a code inspection tool could possibly say "cases MyEnum::A and MyEnum::C were not handled" for example.

Would that actually be necessary? Psalm can already do it for constants: https://psalm.dev/r/82bcedc134

bwoebi commented 4 years ago

I was really more thinking about runtime messages than about static analysis. Was just an idea though, not sure how necessary it is...

iluuu1994 commented 4 years ago

> We may consider leaving the values() function off if the object has at least one associated case. (And I think, in that case it's also much less useful)

Side note: This is also what Swift does, seems to work well for them.

> I was really more thinking about runtime messages than about static analysis. Was just an idea though, not sure how necessary it is...

In terms of reflection we could potentially also create a new ReflectionEnum class where you can inspect the given cases in which case we could merge unit and associated cases. That could make more sense for end users.
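For reference, a rough sketch of what such an inspection could look like. It assumes PHP 8.1 or later, where a ReflectionEnum class with getCases() did end up shipping; the Direction enum is a made-up example, and enums there have unit cases only.

```php
<?php
// Illustrative enum; the name Direction is an assumption for this sketch.
enum Direction
{
    case North;
    case South;
}

$reflection = new ReflectionEnum(Direction::class);

$names = [];
$objects = [];
foreach ($reflection->getCases() as $case) {
    // Each entry is a ReflectionEnumUnitCase describing one case.
    $names[] = $case->getName();    // the case name as a string
    $objects[] = $case->getValue(); // the singleton case object itself
}
```

So reflection can answer both "which names exist" and "give me the objects", at the cost of more ceremony than a plain static method.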

bwoebi commented 4 years ago

Yeah, I guess listing the class names is best done as part of reflection. It shouldn't really be a method on the enum classes.

bwoebi commented 4 years ago

I think we should not have that under future scope, because otherwise people are going to implement their own static function values() and adding it in a later version would be quite the BC break.

iluuu1994 commented 4 years ago

Thinking about it, would values() be redundant if we have proper reflection? values() would be a lot more convenient for sure.

Crell commented 4 years ago

The cases where you'd want to use values() are cases where your brain is not really in reflection land. If we think people are going to want it, let's do it right.

So it sounds like:

- If an Enum Type has only Unit cases, generate Type::values() which returns an array of singleton Case objects, in lexical order.
- If there is at least one Associable Case, do not generate values() or have it throw an error if used. (I think the latter is better DX, but if the former is easier, meh.)
- Add a ReflectionEnum class, which would have getCases(), which returns... an array of strings, Clubs, Hearts, etc?
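The first point can be sketched with PHP as it later shipped. This assumes PHP 8.1 or later, where the generated method is named cases() rather than the values() discussed here; the Suit enum and its case names are only illustrative.

```php
<?php
// Illustrative enum with only unit cases.
enum Suit
{
    case Clubs;
    case Diamonds;
    case Hearts;
    case Spades;
}

// cases() is the built-in PHP 8.1 counterpart to the proposed values().
// It returns the singleton case objects in declaration order.
$cases = Suit::cases();

// If strings are wanted instead, the names can be derived from the objects.
$names = array_map(fn (Suit $case): string => $case->name, $cases);
```

Returning objects keeps the strings one array_map away, while the reverse would require a lookup by name.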

bwoebi commented 4 years ago

I think not generating it would be better, so that people who want to can define their own static function values() then, not just because "easier".

iluuu1994 commented 4 years ago

A more Rust-like approach could be adding an attribute to synthesize these methods explicitly?

// Internal interface
interface EnumCaseInspector { // Yes, the name is bad
    public static function getEnumCases(): iterable;
}

#[Synthesize(EnumCaseInspector::class)]
enum Foo {
    case Bar;
}

bwoebi commented 4 years ago

I do not see a particular gain in not just always providing values() if no associable cases - why would you do that explicitly?

iluuu1994 commented 4 years ago

@bwoebi

> I think not generating it would be better

> I do not see a particular gain in not just always providing values() if no associable cases

Well, now I'm confused ^^

bwoebi commented 4 years ago

I meant that in this case not generating it would be better, quoting Larry's post above:

> If there is at least one Associable Case, do not generate values() or have it throw an error if used. (I think the latter is better DX, but if the former is easier, meh.)

But if there are no associable cases, then it should be generated.

Sorry for the confusion :-)

frankdejonge commented 3 years ago

@Crell since there are units and non-units, could an EnumType::unitValues() be an option? In this case only the unit cases would be returned (an empty array if there are none). This explicit distinction could make it clearer what is and, more importantly, what is not returned.

iluuu1994 commented 3 years ago

@frankdejonge That would work, but I'm not sure it's valuable or even desirable to iterate over a subset of an infinite set. We should probably make it possible to check whether an enum class is finite, though (for example by implementing a marker interface that actually defines the values() method).
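A minimal sketch of such a check, assuming PHP 8.1 or later, where every enum implements the built-in UnitEnum interface that declares cases(). The Weekday enum and the isEnumerable() helper are hypothetical names for illustration; since PHP 8.1 has no associated values, every enum is finite there.

```php
<?php
// Illustrative unit-only enum.
enum Weekday
{
    case Monday;
    case Tuesday;
}

// Hypothetical helper: true when the class name refers to a type
// that implements the interface carrying the enumeration method.
function isEnumerable(string $class): bool
{
    return is_subclass_of($class, UnitEnum::class);
}

$enumIsFinite = isEnumerable(Weekday::class);
$plainClassIsFinite = isEnumerable(stdClass::class);
```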

Crell commented 3 years ago

Broadly speaking, there are two categories of enum use cases:

1. A finite list of unit values. Sometimes it makes sense to back these with a primitive, sometimes not. They will often, but not always, need to be serialized/deserialized to/from the database or request. Getting a list of them makes sense.
2. One or more associable values. This means the list of possible values is generally infinite (unless it's associated only with a finite enumerable value, but that's not worth trying to detect for). Associable values don't really have their own logical serialized format, or if they do it is likely associated-data-dependent. Getting a list of all of them is basically impossible.

Just about every language we surveyed that supports both features folds them into the same enum language structure. But... is that correct? I really hate to suggest it, but should we really be looking at two separate language structures? One that's "glorified constants" (with associated functionality like an EnumType::from() method) and one that's for ADTs/simplified sealed classes?

I don't know if I actually like that idea, and it feels like we're missing something if everyone else manages to make it a single structure, but it's worth discussing, I suppose.

frankdejonge commented 3 years ago

I'd be in favor of exploring this. When I look at the doctrine that can be applied to enums as finite options, it is impossible to apply to associable values. Enums with only predefined values can benefit from a lot of assistive tooling, such as the aforementioned EnumType::from() and EnumType::values() methods. In addition, I believe the introduction of the concept will have a higher chance of landing in the language. The more elaborate construct will have trade-offs on both sides (lack of assistive methods for either case) that may cause people who favor one side over the other to find parts they disagree with.

frankdejonge commented 3 years ago

Is there a standalone concept in another language that provides the functionality that an associable value (as defined in the current proposal) would bring to PHP?

iluuu1994 commented 3 years ago

> but should we really be looking at two separate language structures? One that's "glorified constants" (with associated functionality like an EnumType::from() method) and one that's for ADTs/simplified sealed classes?

Rowan has mentioned this before (actually, he was always of the opinion that it should be two separate features). I don't really agree. An enum would only contain cases with no associated values, whereas ADTs would contain at least one case with associated values. If you have an enum with 10 cases and you want to add one case with associated values to it you'll need to convert it to an ADT, which seems like an arbitrary distinction to me. ADTs can do everything an enum can, I don't see what we gain from creating two separate features with a large overlap, given that it will just increase the complexity of the language.

Crell commented 3 years ago

The big difference is in the primitive equivalent. If we have no primitive equivalents (as originally planned), then yes, ADTs are a strict superset of enums and implementing them in the same coherent data structure makes complete sense. But if unit values have primitive equivalents and associated values do not, does that make the single data structure too complicated? Either from a syntax POV or an implementation POV?

(I'm not sure myself; hence why I'm asking.)

iluuu1994 commented 3 years ago

> But if unit values have primitive equivalents and associated values do not, does that make the single data structure too complicated?

Enums don't inherently all have primitive equivalents. Even if they do, we might want to avoid implicit coercion and require explicit method calls to convert enum values to their primitive counterparts. If we wanted language support for primitives, that could look like this:

enum Foo {
    case Bar = 0;
    case Baz = 'baz';
    case Qux(int $quux);
    case Invalid(int $quux) = 42; // Compile error: Cases with associated values can't be backed by a primitive
}

We might also want to inhibit mixing primitive backed cases and non-primitive backed cases altogether to avoid confusion. This is pretty much what Swift does:

enum Foo {
    case bar = 0 // error: enum case cannot have a raw value if the enum does not have a raw type
    case baz(qux: Int)
}

enum Foo: Int {
    case bar = 0
    case baz(qux: Int) // error: enum with raw type cannot have cases with argument
}

I think these two cases are similar enough that we're better off not building two completely distinct language features.
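For comparison, a sketch of the explicit-conversion style in runnable PHP. It assumes PHP 8.1 or later, whose backed enums declare a single scalar type for the whole enum and have no associated values at all; the Status enum is illustrative only.

```php
<?php
// Illustrative backed enum: the backing type is declared once,
// and every case must then carry a value of that type.
enum Status: int
{
    case Draft = 0;
    case Published = 1;
}

// Conversion is always explicit, never implicit coercion.
$fromPrimitive = Status::from(1);     // throws ValueError for unknown values
$unknown = Status::tryFrom(99);       // yields null instead of throwing
$toPrimitive = Status::Draft->value;  // back to the primitive
```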

Crell commented 3 years ago

The big overlap for me would be methods; there are use cases for both unit and associated members to have methods, and doing that twice would be fugly. In fact... I can see a case for unit cases to have a primitive equivalent and methods. That syntax gets maybe weird, unless done via annotations.

So the question is, would doing what Swift does (forbidding primitivized and associated members in the same enum) be sufficiently understandable and implementable? And if a unit value in a unit-only enum doesn't have a primitive equivalent, do we auto-generate one or not? (Probably its name as a string.)

I know some people had talked about needing different string representations in different contexts, but I am very happy to say "that's what methods are for, go define one and leave us alone" on that one.
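A sketch of the "go define a method" answer, assuming PHP 8.1 or later, where enums may declare methods. The Signal enum and both method names are hypothetical and only show one string representation per context.

```php
<?php
// Illustrative enum with two context-specific string representations.
enum Signal
{
    case Stop;
    case Go;

    // Human-readable text, e.g. for a UI.
    public function label(): string
    {
        return match ($this) {
            self::Stop => 'Stop now',
            self::Go => 'Go ahead',
        };
    }

    // Compact form, e.g. for logs.
    public function code(): string
    {
        return strtoupper($this->name);
    }
}

$label = Signal::Stop->label();
$code = Signal::Go->code();
```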

iluuu1994 commented 3 years ago

I think this is fairly clear now.

Crell / enum-comparison

EnumType::values() #11