dart-lang / language

Design of the Dart language

Ranges in switch cases #2105

Closed Hixie closed 1 year ago

Hixie commented 2 years ago

One thing that the goals-and-constraints.md doc doesn't mention (or I missed it) in the "kinds of patterns" section is the ability to match against a range.

I run into this a lot when writing tokenizers.

switch (ch) {
  case 0x30: 
  case 0x31: 
  case 0x32: 
  case 0x33: 
  case 0x34: 
  case 0x35: 
  case 0x36: 
  case 0x37: 
  case 0x38: 
  case 0x39: 
    // ... handle digit ...
    break;
}

...would be much cleaner if I could say:

switch (ch) {
  case 0x30 .. 0x39:
    // ... handle digit ...
    break;
}

...like in, say, Ada.

Hixie commented 2 years ago

(this becomes an even bigger issue when you deal with a-z A-Z... and an even bigger issue when it's entire ranges of Unicode blocks)

munificent commented 2 years ago

Yeah, this is definitely a thing some languages support. My general experience is that it comes up rarely enough that it likely doesn't warrant dedicated language syntax. You can usually just use a series of if (foo >= A && foo <= Z) statements instead.

It's definitely a thing we could do, though.
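For illustration, a minimal sketch of the if-statement approach applied to a tokenizer. The helper names (isDigit, isAsciiLetter, classify) are hypothetical and do not come from this thread:

```dart
// Hypothetical helpers for illustration only.
bool isDigit(int ch) => ch >= 0x30 && ch <= 0x39; // '0'..'9'

bool isAsciiLetter(int ch) =>
    (ch >= 0x41 && ch <= 0x5A) || // 'A'..'Z'
    (ch >= 0x61 && ch <= 0x7A); // 'a'..'z'

// Classifies a code unit with a chain of range checks instead of a switch.
String classify(int ch) {
  if (isDigit(ch)) return 'digit';
  if (isAsciiLetter(ch)) return 'letter';
  return 'other';
}

void main() {
  for (final ch in '7x+'.codeUnits) {
    print('${String.fromCharCode(ch)}: ${classify(ch)}');
  }
}
```

Each check is two integer comparisons, but single-character cases and range cases can no longer share one switch.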

mnordine commented 2 years ago

My general experience is that it comes up rarely enough that it likely doesn't warrant dedicated language syntax

Are you limiting that to use in switch statements? I assume if it was added, you can use it anywhere to produce an Iterable, including for statements:

for (var i in 1..5) animateThingy(i);

I'd love to see dedicated language syntax for this. I understand it was brought up in the context of patterns, though.

leafpetersen commented 2 years ago

Much as I hate to say this, it feels like the "Darty" way to do this would be via a protocol (possibly Comparable) and define the pattern a..b to match based on the results of that. Or possibly you make this a new protocol (Range?) which takes two arguments and says whether the scrutinee is in range or not. Then this doesn't just work for the builtin types. You could possibly then do the same for for loops: Range could also include an .upTo getter which for (var i in a..b) ... calls (e.g. this desugars into for (var i in a.upTo(b)) ...)?
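A rough sketch of what such a protocol could look like in user code today, without any new syntax. IntRange, contains, values, and the upTo extension are hypothetical names, and the inclusive upper bound is an assumption, not something this thread settles:

```dart
// Hypothetical range type; not an existing or proposed Dart API.
class IntRange {
  const IntRange(this.start, this.end);

  final int start;
  final int end; // Assumed inclusive for this sketch.

  // The "is the scrutinee in range?" half of the protocol.
  bool contains(int value) => value >= start && value <= end;

  // The iteration half, usable in a for-in loop.
  Iterable<int> get values sync* {
    for (var i = start; i <= end; i++) {
      yield i;
    }
  }
}

// Hypothetical stand-in for what `a..b` might desugar to.
extension RangeSyntax on int {
  IntRange upTo(int end) => IntRange(this, end);
}

void main() {
  final digits = 0x30.upTo(0x39);
  print(digits.contains(0x35));
  for (final i in 1.upTo(5).values) {
    print(i);
  }
}
```

Note that this version goes through a method call for every membership test.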

Hixie commented 2 years ago

This comes up all the time, in my experience. But then again maybe I write more tokenizers than the average person.

Hixie commented 2 years ago

(FWIW, I would definitely want this to be something that can compile to a few instructions, not something that involves a method call or anything like that, because performance in tokenizers is critical.)

rrousselGit commented 2 years ago

Rather than something specific to ranges, what about allowing general expressions?

switch (42)
  when it >1 && it < 10:
    print("hey");
    break;
 ...

Then a switch case on a class instance would reduce to:

State state;
switch (state) 
  when it is Loading:
    break;
  when it is Data:
    print(it.data); // "it" promoted to Data
...

And the need for a "default" or not is based on whether after all the cases, it comes up as "Never"

Then adding built in Range support could have the added value of supporting exhaustive checks:

switch (0)
  when it < 0:
    print("negative");
    break;
  when it >= 0:
    print("positive");
    break;
// no default: necessary, Dart knows that all "it" cases are dealt with

Levi-Lesches commented 2 years ago

Rather than something specific to ranges, what about allowing general expressions?

That sounds like it would be fit for just regular if-statements:

switch (42)
  when it >1 && it < 10:
    print("hey");
    break;
 ...

var it = 42;
if (it > 1 && it < 10) print("hey");

State state;
switch (state) 
  when it is Loading:
    break;
  when it is Data:
    print(it.data); // "it" promoted to Data
...

State state;
if (state is Data) print(state.data);  // promoted to Data

switch (0)
  when it < 0:
    print("negative");
    break;
  when it >= 0:
    print("positive");
    break;
// no default: necessary, Dart knows that all "it" cases are dealt with

var number = 0;
if (number < 0) print("negative");
else if (number >= 0) print("positive");
// "default" would be 'else'

rrousselGit commented 2 years ago

That sounds like it would be fit for just regular if-statements:

The same case could be made about regular switch-case and the various enhanced switch-case proposals related to pattern-matching. All switch-cases can be expressed as if-statements too.

munificent commented 2 years ago

maybe I write more tokenizers than the average person.

Pretty sure you and I both write more tokenizers than 99.9% of the population of Earth. :D

For what it's worth, you could accomplish much of this in the current proposal using guards:

switch (ch) {
  case 0x27: // Code for just this character...
  case 0x28: // Code for just this character...
  case 0x29: // Code for just this character...
  case c if (c >= 0x30 && c <= 0x39):
    // ... handle digit ...
    break;
  case 0x40: // Code for just this character...
}

Hixie commented 2 years ago

kinda weird that we'd have to introduce a new identifier for this, but i guess it's better than what we can do today. :-)

munificent commented 2 years ago

You could also do:

switch (ch) {
  case 0x27: // Code for just this character...
  case 0x28: // Code for just this character...
  case 0x29: // Code for just this character...
  case _ if (ch >= 0x30 && ch <= 0x39):
    // ... handle digit ...
    break;
  case 0x40: // Code for just this character...
}

There's no real need to bind a new variable for the case since the value is already stored in a variable. It just felt a little more idiomatic to me to do so.

leafpetersen commented 2 years ago

You could also do:

switch (ch) {
  case 0x27: // Code for just this character...
  case 0x28: // Code for just this character...
  case 0x29: // Code for just this character...
  case _ if (ch >= 0x30 && ch <= 0x39):
    // ... handle digit ...
    break;
  case 0x40: // Code for just this character...
}

This raises an interesting question: should default cases be able to have guards?

rrousselGit commented 2 years ago

This raises an interesting question: should default cases be able to have guards?

That wouldn't quite be a "default" case anymore I think

And we can already do:

default:
  if (guard) {

  }
  break;

What about making the variable declaration optional instead?

switch (ch) {
  case if (ch >= 0x30 && ch <= 0x39):
    break;
}

munificent commented 2 years ago

That wouldn't quite be a "default" case anymore I think

And we can already do:

default:
  if (guard) {

  }
  break;

Right. There's relatively little value in having guard clauses on the last case (default or not). The main value proposition of guards is that if the guard fails, execution proceeds to the next case. If there is no next case, the behavior is the same as simply using a nested if statement like @rrousselGit has here.

What about making the variable declaration optional instead?

switch (ch) {
  case if (ch >= 0x30 && ch <= 0x39):
    break;
}

We certainly could, but I'm hesitant to prematurely sprinkle more syntactic sugar on the feature. Especially in this case when you could just as easily use case _ if which is only two characters longer.

leafpetersen commented 2 years ago

Right. There's relatively little value in having guard clauses on the last case (default or not).

Sorry, I should have been more clear. Obviously, this would entail allowing multiple default clauses. Or syntax like what @rrousselGit suggests.

munificent commented 2 years ago

Obviously, this would entail allowing multiple default clauses.

Ah, I got you now. Honestly, once we have patterns and wildcards in switch, I think I would push for eliminating use of default in switches entirely. case _ is shorter and lines up more nicely with the other cases.
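As a sketch of how that reads, here is the same idea in the syntax that later shipped in Dart 3.0, where guards are written with when rather than if. The describe function and its labels are hypothetical:

```dart
// Hypothetical example; requires Dart 3.0 or later.
void describe(int ch) {
  switch (ch) {
    case 0x28:
      print('open paren');
    // Guarded wildcards: if the guard fails, matching moves to the next case.
    case _ when ch >= 0x30 && ch <= 0x39:
      print('digit');
    case _ when ch >= 0x61 && ch <= 0x7A:
      print('lowercase letter');
    // An unguarded wildcard plays the role of `default`.
    case _:
      print('other');
  }
}

void main() {
  describe(0x28);
  describe(0x35);
  describe(0x71);
  describe(0x21);
}
```

Several guarded case _ clauses can appear in a row, which is effectively the "multiple default clauses" behavior discussed above.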

rrousselGit commented 2 years ago

Sorry, I should have been more clear. Obviously, this would entail allowing multiple default clauses.

I don't like the usage of the default keyword here.

I read default as "Your switch is a non-exhaustive pattern match, so you need to handle unknown values". Having a guard on a default feels like a paradox.

Writing case _ sounds more appropriate

leafpetersen commented 2 years ago

Especially in this case when you could just as easily use case _ if which is only two characters longer.

Maybe. I wonder if there's an entire use case that we're ignoring here, which is expression level switches where the guards are the only thing of interest. I filed an issue to discuss this here to avoid further derailing this one.

Hixie commented 2 years ago

I was rereading https://github.com/dart-lang/language/blob/master/working/0546-patterns/patterns-feature-specification.md today and it really seems to me that we could easily implement this by just having a matcher whose syntax is numericLiteral ".." numericLiteral or numericLiteral "..." numericLiteral or some such, maybe limited specifically to integer literals since that's the common use case though it seems like it could work with doubles too.

I guess ideally you'd want it to work with numeric constants too, at which point it does become a bit ambiguous because of the existing ".." operator, but...

lrhn commented 2 years ago

If we look to C# for inspiration, we could have <4 or >=5 as patterns. (Probably something that only works for numbers, not arbitrary user-defined operator< methods, because we want the pattern to be "constant"-y). If we do that, then 4..5 isn't that far a leap either.

It's always a little weird to describe the difference between 2..5 and 2...5 (I'd have guessed the former as exclusive and the latter inclusive, but a quick check of Perl syntax suggests it's the other way around).

If we do have 2..5 as a pattern, can I please do for (var i in 1..5) too? :grin:.

mnordine commented 2 years ago

It's always a little weird to describe the difference between 2..5 and 2...5

Most would already know, but Rust decided to do .. and ..= instead.

1..2;   // std::ops::Range
3..;    // std::ops::RangeFrom
..4;    // std::ops::RangeTo
..;     // std::ops::RangeFull
5..=6;  // std::ops::RangeInclusive
..=7;   // std::ops::RangeToInclusive

From: https://doc.rust-lang.org/reference/expressions/range-expr.html

munificent commented 2 years ago

Range syntax is hard. I encountered this when adding ranges to my old hobby language.

Note that Swift and Rust are exactly opposite from Ruby in how they interpret "...". :(

We could maybe use .. for ranges in Dart, if we limit it to range patterns but that feels like it would be really confusing since it means something totally unrelated everywhere else. And it would completely paint us into a corner if we ever wanted to support more complex constant expressions in patterns.

If we do some kind of range pattern syntax, like Lasse suggests, I expect we'd like to support the same syntax in for-in loops. At that point, we're really considering something like a range operator and range objects (which is how I handled them in Wren). Given how full Dart's expression grammar already is, that may be hard to achieve.

From a simplicity and consistency standpoint, I really like C#'s approach of using <4, >=5, etc. I think it's very natural to treat patterns as infix expressions whose left operand is implicitly the matched value, and that is easily extensible to other operators or operations if they turn out to be useful. It also lets you express half-open ranges, which some other approaches don't handle. It is kind of verbose for the common case of an integer range, though.
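Relational patterns of this kind did ship in Dart 3.0. A minimal sketch of one-sided and half-open ranges, using hypothetical function names (sign, bucket) and made-up thresholds:

```dart
// Hypothetical examples; requires Dart 3.0 or later.

// One-sided ranges: the matched value is the implicit left operand.
String sign(int n) => switch (n) {
      < 0 => 'negative',
      0 => 'zero',
      _ => 'positive',
    };

// Cases are tried in order, so `< 50` here effectively means 0 <= percent < 50.
String bucket(int percent) => switch (percent) {
      < 0 => 'invalid',
      < 50 => 'low',
      >= 50 && <= 100 => 'high',
      _ => 'invalid',
    };

void main() {
  print(sign(-3));
  print(sign(0));
  print(bucket(49));
  print(bucket(50));
  print(bucket(101));
}
```

The trailing wildcard is needed because exhaustiveness checking does not reason about integer ranges.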

I think my current inclination is to punt on this for now. The patterns proposal is already quite large and introduces a lot of new syntax. We don't have to support every possible kind of pattern on the first release. As we have with expressions and statements, we can ship additional kinds of patterns over time in future releases of Dart. (The > patterns were added to C# after patterns shipped, I believe.)

mnordine commented 2 years ago

I do hope that range syntax does make it in eventually, because I find it makes the language more expressive. Fully agree with your last paragraph 👍

munificent commented 2 years ago

The latest version of the patterns proposal now supports & and comparison operator patterns. That lets you do ranges in switch cases like:

switch (ch) {
  case >= 0x30 & <= 0x39:
    // ... handle digit ...
    break;
}

Maybe not as nice as built-in range syntax, but it works (and generalizes in ways that dedicated range syntax might not). We're not certain that these patterns are going to make the cut in the first release, but I hope they do.

If not, guards on cases can be used to run arbitrary predicates like:

switch (ch) {
  case x when x >= 0x30 && x <= 0x39:
    // ... handle digit ...
    break;
}

munificent commented 1 year ago

I'm going to close this because between guards and the comparison operator patterns in the proposal, I think we have this fairly well covered. I like the idea of dedicated range syntax, but Dart's grammar is getting pretty full of punctuation and I think there's an argument that explicit is better.
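For reference, in the patterns feature as shipped in Dart 3.0, the combining operators are spelled && and || rather than &. A minimal tokenizer-style sketch of the original request in that syntax, with a hypothetical classify function:

```dart
// Hypothetical example; requires Dart 3.0 or later.
String classify(int ch) {
  switch (ch) {
    case >= 0x30 && <= 0x39: // '0'..'9'
      return 'digit';
    case >= 0x41 && <= 0x5A || >= 0x61 && <= 0x7A: // 'A'..'Z' or 'a'..'z'
      return 'letter';
    default:
      return 'other';
  }
}

void main() {
  for (final ch in 'a9Z?'.codeUnits) {
    print('${String.fromCharCode(ch)}: ${classify(ch)}');
  }
}
```

Both bounds are constants compared directly against the matched value, so no extra variable has to be bound.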