Open bradcray opened 1 year ago
The current situation here is so squirrely and difficult to defend that I'm inclined to mark casts between ranges of enums and integers as being deprecated for now. That would also give us time to wrestle with this more after 2.0 if desired.
Also, note that casts between ranges of bool indices are somewhat related to this issue, though also potentially somewhat simpler given that bools are more interchangeable with integers and always/only have one associated value.
PR #22417 improves upon this behavior in the OP:

> `(300..500): range(shape)` generates an execution-time bounds-style error rather than giving us `triangle..pentagon` back again

by having it give `triangle..pentagon` back again. This at least makes the behavior of casts more consistent, though I think the question of whether it's reasonable for a 3-element range to become a 201-element range is still a very important one.
One way to resolve this is to disable `:` casts between ints and enums. Instead, introduce methods or functions for the conversions ordinal ↔ enum constant and numerical value ↔ enum constant. This way it will always be clear what conversion is being made.

As a corollary, we will disable `:` casts involving ranges over enums and introduce a parallel set of methods/functions that convert ranges.
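To sketch what such conversion helpers might look like (the names `ordinalOf` and `fromOrdinal` are my invention for illustration, not an existing or proposed Chapel API; today, `e: int` performs the numerical-value conversion):

```chapel
enum shape { triangle = 300, square = 400, pentagon = 500 };

// enum constant -> 0-based ordinal, by walking the declaration order
proc ordinalOf(e: ?t): int where isEnumType(t) {
  var i = 0;
  for x in t {
    if x == e then return i;
    i += 1;
  }
  halt("unreachable");
}

// 0-based ordinal -> enum constant
proc fromOrdinal(type t, ord: int): t where isEnumType(t) {
  var i = 0;
  for x in t {
    if i == ord then return x;
    i += 1;
  }
  halt("ordinal out of bounds");
}

writeln(ordinalOf(shape.square));   // 1
writeln(fromOrdinal(shape, 2));     // pentagon
```

With helpers like these, the parallel range conversions could be defined in terms of them, making explicit which of the two interpretations a given conversion uses.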
> One way to resolve this is to disable `:` casts between ints and enums.

That doesn't seem attractive to me. Nobody has expressed any concerns or reservations about that feature as it exists (and conversely, users have argued for the current behavior), and the concerns I have here are specific to ranges and the fact that there are two endpoints / a given size.
I do find it concerning that casting a range of enums to a range of ints changes the number of values covered by the range. It seems like if there's some pattern to the values covered by the enum constants, we could make the range strided using that pattern and that it would be reasonable to error otherwise. E.g.
```chapel
enum regular { two = 2, four = 4, six = 6, eight = 8 };
var r1 = regular.two..regular.eight;
var r2 = r1: range(int);  // This would be okay and result in `2..8 by 2`

enum primes { one = 1, two = 2, three = 3, five = 5 };
var r3 = primes.one..primes.five;
var r4 = r3: range(int);  // This would fail because the numbers covered are not regular
```
A check along those lines may seem burdensome, but since I would expect enums not to have a large number of values, I suspect it probably won't be too costly to validate?
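A rough sketch of such a check (my own illustration, not existing compiler code): collect the constants' numerical values in declaration order and verify that consecutive values step by a uniform amount.

```chapel
// Returns (true, stride) if consecutive values step uniformly, else (false, 0).
// Assumes at least two values.
proc uniformStride(vals: [] int): (bool, int) {
  const lo = vals.domain.low;
  const step = vals[lo+1] - vals[lo];
  for i in lo+1..<vals.domain.high do
    if vals[i+1] - vals[i] != step then return (false, 0);
  return (true, step);
}

// numerical values of the constants, in declaration order:
writeln(uniformStride([2, 4, 6, 8]));   // (true, 2)  -- the `regular` case
writeln(uniformStride([1, 2, 3, 5]));   // (false, 0) -- the `primes` case
```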
> It seems like if there's some pattern to the values covered by the enum constants, we could make the range strided using that pattern and that it would be reasonable to error otherwise.
I don't agree with this because, generally speaking, a range of enums is governed only by the constants' ordinal values, not their numerical values. For example, if the enum was:

```chapel
enum color { red=5, green=3, blue=1 };
```

then the range `red..blue` would represent red, green, blue rather than a degenerate/empty range (as `5..1` would). Essentially, the enum's numerical values don't have any bearing on how the range is interpreted, and are only there if someone wants to do explicit conversions from the enums to integers. So, for me, the only reasonable options are:

- interpret the cast in terms of the ordinal values (i.e., `0..2` for my example, `0..3` for your two examples above) and vice-versa
- don't support the cast at all, to avoid confusion

I think you could get around that by recognizing when the indices would be in descending order (again, only if there's a reasonable pattern that can be followed) and adjusting those to swap their bounds but use a negative stride to get them to print in the appropriate order. So for your color example, the integer range equivalent would be `1..5 by -2` to preserve both the ordering and the values for the enum constants.

But I'd also be okay with not allowing ranges of enums with descending values to be cast to integer ranges.
It seems like if an enum constant has been given an explicit value, that value should be preserved when possible.
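To make the `1..5 by -2` suggestion concrete, here's a small snippet (my own, using the `color` enum from above) showing that the strided integer range visits the constants' numerical values in the same order as iterating the enum range:

```chapel
enum color { red = 5, green = 3, blue = 1 };

// Iterating the enum range follows ordinal (declaration) order:
for c in color.red..color.blue do write(c, " ");   // red green blue
writeln();

// The proposed integer equivalent preserves both values and order:
for i in 1..5 by -2 do write(i, " ");              // 5 3 1
writeln();
```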
I suggest that conversions between ranges of ints and ranges of enums do not have to be done using a cast. So I prefer the option "don't support the cast at all, to avoid confusion".
> I think you could get around that by recognizing when the indices would be in descending order (again, only if there's a reasonable pattern that can be followed) and adjusting those to swap their bounds but use a negative stride to get them to print in the appropriate order. So for your color example, the integer range equivalent would be `1..5 by -2` to preserve both the ordering and the values for the enum constants.
I'm not convinced this is a problem the compiler/language should be expending effort to try and solve for the user (i.e., "let me see if I can find a numerical pattern in the enums that are described by this range"). It also feels unattractively fragile to me (e.g., `primes.one..primes.three` being OK but `primes.one..` not). Enums can currently contain duplicate values, which feels like a complicating factor. I'm also not aware of compelling use cases for such casts (vs. iterating over the range and then casting the index variable to an int, say).
> I'm not convinced this is a problem the compiler/language should be expending effort to try and solve for the user
Sure, and it's certainly not an immediate priority. But I think it would be significantly more confusing for a user to have the range conversion go to the ordinal value over the one they explicitly specified in their code. Basically, the things I care about are (in no particular order):
1. Ranges should maintain their sizes when cast between different types
2. Ranges should maintain their values where it would be reasonable to do so
3. Iterating over a range cast to a different type should result in the same order of values as iterating over the original range
If those concerns can't be met, I'd rather get an error than have the compiler do something different. At the end of the day, the user can always create something that will do what they want no matter what we decide here, but if we can make the behavior match what they're most likely to do without too much trouble, that seems like a win.
Do these seem like valid concerns to you?
> It also feels unattractively fragile to me (e.g., `primes.one..primes.three` being OK but `primes.one..` not).
Based on 1 above, I would be okay with `primes.one..` not being able to be cast to a range of ints.
> Enums can currently contain duplicate values, which feels like a complicating factor.
Based on the combination of the above, I would also be okay with ranges made of such enums not being able to be cast to ranges of ints.
> I'm also not aware of compelling use cases for such casts (vs. iterating over the range and then casting the index variable to an int, say).
What are the scenarios where you'd want to cast an enum range to an int range and have it not follow the above concerns? If you have a range of enums, why would you want to cast it to a different type if its behavior was going to differ significantly from the original? I'd be more likely to lean towards making a new range in that case than casting an old one; I don't know that it would even occur to me to cast the old one if I wanted any of those concerns to go unmet.
Here's a case where you would want the result of the range cast to behave exactly like the original:
You're trying to communicate information to another program about this enum, but the other program doesn't know the enum's details, just that you have one. It does know about ranges of ints, so you cast a range covering all the enum constants to a range of ints and send that first. Then the other program uses that range to ask for information related to each of the individual constants.
Obviously, in this scenario you want the range you pass to have the same number of values as your original. Otherwise the other program will think there's still information it should wait for after you've finished sending everything.
If the other program is using this information to learn about the enum, you could pass each constant's value as part of the information, but it'll be more efficient if that value is already part of the range it's using to ask you for information.
You want the other program to be asking you about the constants in the same order you're providing them. Since it can't know the constants' names yet, you have to have an agreed-upon ordering. Having the cast range's numbers be in the same order as the original range means you can use either the original range or the cast version without worrying about whether you'll accidentally mix up the order of constants, potentially causing problems when the other program tries to use the enum you just told it about.
I can write these two pairs of programs in code if it would help.
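In the meantime, here's a toy sketch of the scenario (my own illustration, not the promised programs, with the "other program" reduced to a function that only ever sees an int range). If the cast preserved sizes, values, and order, casting `shape`'s full range would yield `300..500 by 100`, and both sides would agree on how many constants there are and in what order to ask about them:

```chapel
enum shape { triangle = 300, square = 400, pentagon = 500 };

// Stands in for the other program; it never sees the enum type.
proc receiver(codes) {
  for code in codes do
    writeln("requesting info for constant with value ", code);
}

// Hypothetical result of a size-, value-, and order-preserving cast
// of shape.triangle..shape.pentagon to an int range:
receiver(300..500 by 100);   // 3 requests, in order: 300, 400, 500
```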
> Do these seem like valid concerns to you?
Yes, and they definitely relate to why I opened this issue. To be clear, I'm not strongly advocating for either the status quo or the ordinal interpretation, and absent any other brilliant ideas would probably make it an error.
@bradcray please consider removing the 2.0 label, or clarify what stabilization work remains here.
My stance remains essentially the same: I'd mark casts between range(integral) and range(enum) (in either direction) as unstable for the time being, potentially to be deprecated later.
It looks like you already marked them as unstable, which is why Vass is asking
Oh, did I? I thought I just tested that and didn't find it to be the case, but I probably forgot to throw the flag (and obviously forgot that I'd marked them as unstable). OK, I agree the tag can be removed then, thanks for the reminder/catching my mistake.
This issue asks how casts between enum and int ranges, or two distinct enum ranges, should work, or if they should be supported at all.
Background
- enums can be thought of as having two (potential) associated integer values: an ordinal value (the constant's position in the declaration order) and, when the constant is declared with an initializer, a numerical value. For example, given the `color`, `size`, and `shape` enums used in this issue:
  - `green` only has its ordinal value of 1
  - `size`'s middle constant has both the ordinal value of 1 and the numerical value of 1
  - `square` has both the ordinal value of 1 and the numerical value of 400
- ranges of enums are defined in terms of their ordinal values.
For example, each of the ranges `red..blue`, `small..large`, and `triangle..pentagon` describes three indices.
- casts between ranges of ints are essentially computed by casting the bounds of the range to the new integer type.
For example, `(1..10): range(uint)` is the same as `1:uint..10:uint`.
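The enum declarations this section refers to didn't survive in this copy of the issue; based on the ordinal and numerical values cited above and the cast results below, they were presumably along these lines (the name `medium` for `size`'s middle constant is my guess):

```chapel
enum color { red, green, blue };                              // no numerical values
enum size  { small = 0, medium = 1, large = 2 };              // `medium` name assumed
enum shape { triangle = 300, square = 400, pentagon = 500 };
```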
Current Behavior
Currently, casting between int and enum ranges is sometimes done in terms of the numerical values and sometimes the ordinal values, depending on which way we're going. Specifically, when casting from enum ranges to int ranges, we get behavior similar to the int range casts:
- `(red..blue): range(int)` is similar to `red:int..blue:int`, resulting in a compile-time error because colors can't be cast to integers
- `(small..large): range(int)` is similar to `small:int..large:int`, resulting in `0..2`
- `(triangle..pentagon): range(int)` is similar to `triangle:int..pentagon:int`, resulting in `300..500`
Meanwhile, when casting from int ranges to enum ranges, we seem to take the ordinal interpretation:
- `(300..500): range(shape)` generates an execution-time bounds-style error rather than giving us `triangle..pentagon` back again
- `(0..2): range(shape)` gives us `triangle..pentagon`
- `(0..2): range(size)` gives us `small..large`
- `(0..2): range(color)` gives us `red..blue`
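The asymmetry can be seen in one snippet (my own, reflecting the behavior as described in this issue; the `shape` declaration matches the values cited above):

```chapel
enum shape { triangle = 300, square = 400, pentagon = 500 };

var a = (shape.triangle..shape.pentagon): range(int);
writeln(a);   // 300..500 -- numerical interpretation: 201 indices

var b = (0..2): range(shape);
writeln(b);   // triangle..pentagon -- ordinal interpretation: 3 indices
```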
Observations
- the `triangle..pentagon` and `300..500` ranges not casting back and forth between one another as you'd expect
- the question of whether it's reasonable for `triangle..pentagon` to turn into the 201-index range `300..500`. Maybe having it turn into the 3-index ordinal range `0..2` would make more sense?

Questions