[How] Should range casts between enum types work?

bradcray commented 1 year ago

This issue asks how casts between enum and int ranges, or two distinct enum ranges, should work, or if they should be supported at all.

Background

enums can be thought of as having two (potential) associated integer values:
- their ordinal values within the enum, 0, 1, 2, 3, ...
- their associated integer numerical values, if provided by a user
For example, given the enums:
```
enum color {red, green, blue}
enum size {small=0, medium, large}
enum shape {triangle=300, square=400, pentagon=500}
```
- green only has its ordinal value of 1
- size has both the ordinal value of 1 and the numerical value of 1
- square has both the ordinal value of 1 and the numerical value of 400
ranges of enums are defined in terms of their ordinal values

For example, each of the ranges red..blue, small..large, and triangle..pentagon describe three indices
casts between ranges of ints are essentially computed by casting the bounds of the range to the new integer type.

For example, (1..10): range(uint) is the same as 1:uint..10:uint

Current Behavior

Currently, casting between int and enum ranges is sometimes done in terms of the numerical values and sometimes the ordinal values, depending on which way we're going. Specifically, when casting from enum ranges to int ranges, we get behavior similar to the int range casts:

(red..blue): range(int) is similar to red:int..blue:int, resulting in a compile-time error because colors can't be cast to integers
(small..large): range(int) is similar to small:int..large:int, resulting in 0..2
(triangle..pentagon): range(int) is similar to triangle:int..pentagon:int, resulting in 300..500

Meanwhile, when casting from int ranges to enum ranges, we seem to take the ordinal interpretation:

(300..500): range(shape) generates an execution-time bounds-style error rather than giving us triangle..pentagon back again
(0..2): range(shape) gives us triangle..pentagon
(0..2): range(size) gives us small..large
(0..2): range(color) gives us red..blue

Observations

The current behavior is definitely inconsistent, as shown by the triangle..pentagon and 300..500 ranges not casting back and forth between one another as you'd expect
The current behavior is also arguably a bit questionable, in the sense of "Does it make sense for the 3-index range, triangle..pentagon to turn into the 201-index range 300..500?" Maybe having it turn into the 3-index ordinal range 0..2 would make more sense?

Questions

Should we support casts between ranges of ints and ranges of enums (in either direction)? If so, which behavior should we adopt? (ordinal or numerical values)
Should we support casts between ranges of one enum type and ranges of another enum type? How should that behave?

bradcray commented 1 year ago

The current situation here is so squirrely and difficult to defend, I'm inclined to mark casts between ranges of enums and integers as being deprecated for now. That would also give us time to wrestle with this more after 2.0 if desired.

bradcray commented 1 year ago

Also, note that casts between ranges of bool indices are somewhat related to this issue, though also potentially somewhat simpler given that bools are more interchangeable with integers and always/only have one associated value.

bradcray commented 1 year ago

PR #22417 improves upon this behavior in the OP:

(300..500): range(shape) generates an execution-time bounds-style error rather than giving us triangle..pentagon back again

by having it gives triangle..pentagon back again. This at least makes the behavior of casts more consistent, though I think the question of whether it's reasonable for a 3-element range to become a 201-element range is still a very important one.

vasslitvinov commented 1 year ago

One way to resolve this is to disable : casts between ints and enums. Instead, introduce methods or functions for the conversions ordinal ↔ enum constant and numerical value ↔ enum constant. This way it will be always clear what conversion is being made.

As a corollary, we will disable : casts involving ranges over enums and introduce a parallel set of methods/functions that convert ranges.

bradcray commented 1 year ago

One way to resolve this is to disable : casts between ints and enums.

That doesn't seem attractive to me. Nobody has expressed any concern or reservations about that feature as it exists (and conversely, users have argued for the current behavior), and the concerns I have here are specific to ranges and the fact that there are two endpoints / a given size.

lydia-duncan commented 1 year ago

I do find it concerning that casting a range of enums to a range of ints changes the number of values covered by the range. It seems like if there's some pattern to the values covered by the enum constants, we could make the range strided using that pattern and that it would be reasonable to error otherwise. E.g.

enum regular { two = 2, four = 4, six = 6, eight = 8};
var r1 = regular.two..regular.eight;
var r2 = r1: range(int);  // This would be okay and result in `2..8 by 2`
enum primes { one = 1, two = 2, three = 3, five = 5};
var r3 = primes.one..primes.five;
var r4 = r3: range(int);  // This would fail because the numbers covered is not regular

A check along those lines may be burdensome, but since I would expect enums to not have a large number of values, I suspect it probably won't be too burdensome to validate?

bradcray commented 1 year ago

It seems like if there's some pattern to the values covered by the enum constants, we could make the range strided using that pattern and that it would be reasonable to error otherwise.

I don't agree with this because, generally speaking, a range of enums is only governed by their ordinal values, not their numerical values. For example, if the enum was:

enum color { red=5, green=3, blue=1};

then the range red..blue would represent red, green, blue rather than a degenerate/empty range (as 5..1 would). Essentially, the enum's numerical values don't have any bearing on how the range is interpreted, and are only there if someone wants to do explicit conversions from the enums to integers. So, for me, the only reasonable options are:

have enum ranges cast to integer ordinal ranges (e.g., 0..2 for my example, 0..3 for your two examples above) and vice-versa
preserve the current behavior under the rationale that once you do the cast, you're re-interpreting the values as integers anyway, so it's on you if you're surprised
don't support the cast at all, to avoid confusion

lydia-duncan commented 1 year ago

I think you could get around that by recognizing when the indices would be in descending order (again, only if there's a reasonable pattern that can be followed) and adjusting those to swap their bounds but use a negative stride to get them to print in the appropriate order. So for your color example, the integer range equivalent would be 1..5 by -2 to preserve both the ordering and the values for the enum constants.

But I'd also be okay with not allowing ranges of enums with descending values to be cast to integer ranges.

It seems like if an enum constant has been given an explicit value, that value should be preserved when possible.

vasslitvinov commented 1 year ago

I suggest that conversions between ranges of ints and ranges of enums do not have to be done using a cast. So I prefer the option "don't support the cast at all, to avoid confusion".

bradcray commented 1 year ago

I think you could get around that by recognizing when the indices would be in descending order (again, only if there's a reasonable pattern that can be followed) and adjusting those to swap their bounds but use a negative stride to get them to print in the appropriate order. So for your color example, the integer range equivalent would be 1..5 by -2 to preserve both the ordering and the values for the enum constants.

I'm not convinced this is a problem the compiler/language should be expending effort to try and solve for the user (i.e., "let me see if I can find a numerical pattern in the enums that are described by this range"). It also feels unattractively fragile to me (e.g., primes.one..primes.three being OK but primes.one.. not). Enums can currently contain duplicate values, which feels like a complicating factor. I'm also not aware of compelling use cases for such casts (vs. iterating over the range and then casting the index variable to an int, say).

lydia-duncan commented 1 year ago

I'm not convinced this is a problem the compiler/language should be expending effort to try and solve for the user

Sure, and it's certainly not an immediate priority. But I think it would be significantly more confusing for a user to have the range conversion go to the ordinal value over the one they explicitly specified in their code. Basically, the things I care about are (in no particular order):

Ranges should maintain their sizes when cast between different types
- If it can't, I think that probably should be an error rather than intentionally generating something of a different size
- My belief here is founded in the principle that a cast should remain as fundamentally true to the original as it can in the new circumstance. If a cast mutates the information provided to it, it shouldn't be a cast. I view the number of values in a range as part of the information provided.
Ranges should maintain their values where it would be reasonable to do so
- Where it is not, I think an error should be generated. The ordinal numbers seem most useful when no other numbers are specified, but in the case where both would be valid I think we should defer to what the user specified. If we fall back to the ordinal values, maybe that's something that should generate a warning message so it's obvious to them, with a flag to turn the warning off?
- My belief here is founded in the principle that what the user specifies is more important than anything we generate for them. If it was specified, it has some meaning and that should be respected.
Iterating over a range cast to a different type should result in the same order of values as iterating over the original range
- If the values for the enum constants are not in order, then I think we clearly can't generate an equivalent integer range and should error.
- My belief here is also founded in the principle that a cast should remain as fundamentally true to the original as it can in the new circumstance. Changing the order the values are encountered in seems like a mutation to me. There's definitely scenarios where it doesn't matter as much, such as when the range is used with a coforall.

If those concerns can't be met, I'd rather get an error than have the compiler do something different. At the end of the day, the user can always create something that will do what they want no matter what we decide here but if we can make the behavior match what they're most likely to do without too much trouble, that seems like a win.

Do these seem like valid concerns to you?

It also feels unattractively fragile to me (e.g., primes.one..primes.three being OK but primes.one.. not).

Based on 1 above, I would be okay with primes.one.. not being able to be cast to a range of ints.

Enums can currently contain duplicate values, which feels like a complicating factor.

Based on the combination of the above, I would also be okay with ranges made of such enums not being able to be cast to ranges of ints.

I'm also not aware of compelling use cases for such casts (vs. iterating over the range and then casting the index variable to an int, say).

What are the scenarios where you'd want to cast an enum range to an int range and have it not follow the above concerns? If you have a range of enums, why would you want to cast it to a different type if its behavior was going to significantly differ from the original? I'd be more likely to lean towards making a new range in that case than casting an old one, I don't know that it would even occur to me to cast the old one if I wanted any of those concerns to not be met.

lydia-duncan commented 1 year ago

Here's a case where you would want the result of the range cast to behave exactly like the original:

You're trying to communicate information to another program about this enum, but the other program doesn't know the enum's details, just that you have one. It does know about ranges of ints, so you cast a range covering all the enum constants to a range of ints and send that first. Then the other program uses that range to ask for information related each of the individual constants.

Obviously, in this scenario you want the range you pass to have the same number of values as your original. Otherwise the other program will think there's still information it should wait for after you've finished sending everything.

If the other program is using this information to learn about the enum, you could pass each constants' value as part of the information, but it'll be more efficient if that value is already part of the range it's using to ask you for information.

You want the other program to be asking you about the constants in the same order you're providing it. Since it can't know the constants' names yet, you have to have an agreed upon ordering. Having the cast range numbers be in the same order as the original range means you can use the original range or the cast version without worrying about if you'll accidentally mix up the order of constants, potentially causing problems when the other program tries to use the enum you just told it about.

I can write these two pairs of programs in code if it would help

bradcray commented 1 year ago

Do these seem like valid concerns to you?

Yes, and they definitely relate to why I opened this issue. To be clear, I'm not strongly advocating for either the status quo nor the ordinal interpretation, and absent any other brilliant ideas would probably make it an error.

vasslitvinov commented 11 months ago

@bradcray please consider removing the 2.0 label, or clarify what stabilization work remains here.

bradcray commented 11 months ago

My stance remains essentially the same: I'd mark casts between range(integral) and range(enum) (in either direction) as unstable for the time being, potentially to be deprecated later.

lydia-duncan commented 11 months ago

It looks like you already marked them as unstable, which is why Vass is asking

bradcray commented 11 months ago

Oh, did I? I thought I just tested that and didn't find it to be the case, but I probably forgot to throw the flag (and obviously forgot that I'd marked them as unstable). OK, I agree the tag can be removed then, thanks for the reminder/catching my mistake.

chapel-lang / chapel