dotnet / csharplang

The official repo for the design of the C# programming language

Idea for `switch` expression syntax #663

Closed lachbaer closed 6 years ago

lachbaer commented 7 years ago

I recently had an idea for a less verbose version of a switch ('match') expression. It resembles the classical ternary operator with ? :

var area = case(switch) ?
         : Line l => 0
         : Rectangle r => r.Width * r.Height
         : Circle c => Math.PI * c.Radius * c.Radius
         : throw new ApplicationException();

There are three major differences.

  1. The case operator before the ? makes this first expression a case expression instead of a boolean.
  2. Between ? and the first : is nothing but whitespace
  3. Multiple case-sections are separated by : without a preceding comma or so

Because the expressions return a value (or throw) I chose the => token to separate the pattern from the following expression. But this, like all the rest of course, is up for discussion.

jnm2 commented 7 years ago

I'd prefer just:

var area = switch (obj) ?
         : Line l => 0
         : Rectangle r => r.Width * r.Height
         : Circle c => Math.PI * c.Radius * c.Radius
         : throw new ApplicationException();

lachbaer commented 7 years ago

@jnm2 I used case(...) instead of the switch keyword, because it is visually distinct from the latter and cannot be mistaken for a switch-statement at first glance. Also it should serve as a factored-out case, substituting for all the left-out ones. And it has 2 letters less ;-)

jnm2 commented 7 years ago

@lachbaer I think case should indicate a case and switch should indicate a switch. ;-) I also like the optics better. I don't want switch to only have a statement form.

CyrusNajmabadi commented 7 years ago

👍 on ? and : being used. I actually like that.

I would lean toward reusing 'switch' . We'd then have switch-statements "with { }" and switch-expressions with "? :". It feels copacetic.

lachbaer commented 7 years ago

@jnm2 switch already is a statement keyword, case is more contextual and thus independent IMO.

Another question is, how the default case should be handled. 1. Either as the last expression after the last : colon, without => 2. or with the verbose default keyword 3. or with the _ discard pattern?

var area = switch(obj) ?
         : Line l => 0
         : Rectangle r => r.Width * r.Height
         : Circle c => Math.PI * c.Radius * c.Radius
         : throw new ApplicationException();
// or
         : Circle c => Math.PI * c.Radius * c.Radius
         : default => throw new ApplicationException();
// or
         : Circle c => Math.PI * c.Radius * c.Radius
         : _ => throw new ApplicationException();

Update: changed case op to switch op keyword; show all 3 variants.

CyrusNajmabadi commented 7 years ago

I think both "default" or "_" would be fine. "default" would make a lot of sense if we went with "switch(...)" as the form.

jnm2 commented 7 years ago

@CyrusNajmabadi I kind of like not being forced to write default => or _ => though, as in the original example. What do you think of that?

lachbaer commented 7 years ago

But as : default => will most probably be the last item in the case-list, omitting it completely will make the whole expression much more slender (see initial example). It can be seen as a final else, like the one (and only) in the ternary op.

Or would that be too hard to implement on the compiler or arbitrary in any other way?

CyrusNajmabadi commented 7 years ago

@jnm2 I'm a bit torn. It seems easy to accidentally forget the case and have it compile. But maybe it's not a biggie. It's probably that it's just my first time seeing it.

Given that this is an expression context, i think brevity is a good idea. So i'd support not needing the default or _

lachbaer commented 7 years ago

In my initial post I wrote "Between ? and the first : is nothing but whitespace". When making this optional empty case blocks can be allowed and ignored. So = switch(obj)? true => 1 :::: false => 0 :: null would be syntactically ok. And it also allows a formatting like in the examples above.

CyrusNajmabadi commented 7 years ago

so, i'm thinking the ? isn't even necessary. just use : or, if we want to be ocaml'ish |

jnm2 commented 7 years ago

I like ? ... : for its similarity to the transformation of the statement form if ... else to the expression form.

lachbaer commented 7 years ago

Or is even switch omittable when the first tokens are ? :? = obj ?: true => 1 : false => 0 :: null

lachbaer commented 7 years ago

See the example directly above. The default case is last and initiated with a double colon to protect from accidents.

yaakov-h commented 7 years ago

I assume you wouldn't be permitted to nest these?

  1. Between ? and the first : is nothing but whitespace

I'd suggest to also allow other trivia - #if, #pragma, code comments etc.

jnm2 commented 7 years ago

@yaakov-h Good point, you should be forced to nest with parentheses.

HaloFour commented 7 years ago

I believe that such a syntactic construct would be pretty brutal to read as a one-liner, especially if used in/around other conditional operators. Sure, one can format it over multiple lines, but in my experience that's relatively rare with conditional operators, even nested ones, and I don't expect it will be the normal case here either.

yaakov-h commented 7 years ago

Hmm, good point. I wonder what developers would do by default if all the documentation, examples etc. kept it multi-line.

Definitely could be solved with an analyzer, but it's a very common case... maybe a default VS formatting instruction and Info analyzer, similar to how you can control this. prefixing etc.?

orthoxerox commented 7 years ago

I think the default clause might be confused with the target typed default here. If the syntax is so expression-like, I'd rather have _ or var _ as the match-all clause and use default as a target-typed default.

I like the syntax overall, but I think we should be able to nest match expressions without adding parentheses, since you might accidentally produce a legal, but semantically invalid nested expression without parens.

lachbaer commented 7 years ago

I believe that such a syntactic construct would be pretty brutal to read as a one-liner, especially if used in/around other conditional operators

I have seen hard-to-read 'ordinary' expressions already. This one wouldn't make much of a difference. It should be short in case it completely fits in one line (without being 1012 characters long ;-) and also have a pleasant, expression-like appearance when spread over multiple lines.

I think we should be able to nest match expressions without adding parentheses

This problem already exists with every kind of operator precedence, where the expression is syntactically correct but produces semantically unwanted results. I think this is something every programmer must be (and is) aware of from the very beginning. But to address this concern, something like an 'end token' must be introduced to the construct, one that cannot introduce further ambiguities.

orthoxerox commented 7 years ago

Looking at other languages, ML dialects have the tersest expression forms, but they use significant whitespace to control nesting.

Scala's nice, if a little verbose with all the cases:

x match {
  case 0 => "zero"
  case 1 => "one"
  case 2 => "two"
  case _ => "many"
}

Nemerle is also rather similar:

match (color) {
  | RgbColor.Red => "red"
  | RgbColor.Yellow => "yellow"
  | RgbColor.Green => "green"
  | RgbColor.Different (r, g, b) => $"rgb($r, $g, $b)"
}

Kotlin has when (NB: it has newline-terminated syntax):

when (x) {
    1 -> "x == 1"
    2 -> "x == 2"
    else -> { // Note the block
        "x is neither 1 nor 2"
    }
}

Swift has no match expression, Haxe is similar to Kotlin in that it has a unified switch.

Rust is very similar to Scala:

match x {
    1 => "one",
    2 => "two",
    3 => "three",
    4 => "four",
    5 => "five",
    _ => "something else",
}

NB: All these languages (except Swift) have no statements, everything is an expression, so this makes curly braces look natural in expressions. However, C# has curlies in initializers, so I don't think it's a huge problem if match will use curlies.

lachbaer commented 7 years ago

Nevertheless, C# is C derived and finding a C-based syntax would just be nice.

Also it might be important to discuss the role of the , comma for C#. Will there be (in a future far far away ;-) something like C's comma operator (useful in loops)? Or can it be safely used to separate sections, as in the last example above?

alrz commented 7 years ago

Seriously, if that's what you want, it's already possible:

var area = e is Line l ? 0 
         : e is Rectangle r ? r.Width * r.Height
         : e is Circle c ? Math.PI * c.Radius * c.Radius
         : throw new ApplicationException();

switch/match expression doesn't suffer from readability/formatting hell,

var area = e match {
  case Rectangle r: r.Width * r.Height
  case Circle c: Math.PI * c.Radius * c.Radius
  default: throw new ApplicationException()
};
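
For readers who want to try the chained `is` form shown first in this comment, here is a minimal, self-contained sketch. It assumes C# 7 (pattern matching and throw expressions); the shape classes and the `Shapes.Area` helper are hypothetical, since the thread never defines them.

```csharp
using System;

// Hypothetical shape types; the thread never defines them.
class Line { }
class Rectangle { public double Width, Height; }
class Circle { public double Radius; }

static class Shapes
{
    // Chained conditional operators: each `is` pattern introduces a
    // variable that is only usable in the "true" branch of its own `?`.
    // The final branch is a throw expression acting as the default case.
    public static double Area(object e) =>
        e is Line l ? 0
        : e is Rectangle r ? r.Width * r.Height
        : e is Circle c ? Math.PI * c.Radius * c.Radius
        : throw new ApplicationException();
}
```

With these definitions, `Shapes.Area(new Rectangle { Width = 2, Height = 3 })` evaluates to 6, and passing any other object (for example a string) throws `ApplicationException`.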

lachbaer commented 7 years ago

@alrz That's interesting! Now the 'e is' must only be factored out.

var area = switch(e) 
         : Line l ? 0
         : Rectangle r ? r.Width * r.Height
         : Circle c ? Math.PI * c.Radius * c.Radius
         : throw new ApplicationException();

The first Token after switch(...) is a colon. Yet, I don't know if that is better to read 😕

orthoxerox commented 7 years ago

I don't really feel the need to make match super-terse. Braces around the clauses improve the readability and I can live with most of the proposed pattern-expression separators and clause initiators or terminators.

alrz commented 7 years ago

Also think of the possibility of promoting "match" as a statement, like switch without break.

gafter commented 7 years ago

This proposal is syntactically ambiguous.

var x = case (e1) ?
    : (int x, int y) => e5 // is this a lambda expression as the default value, or a tuple-pattern-case?
    ;

It gets worse with nested case expressions because of the dangling else issue.

quinmars commented 7 years ago

I think the colon is at the wrong place in the proposal. Finally the match or switch expression performs a map operation. So why not use a dictionary-like syntax?

var result = match (txt)
{
    "one": 1,
    "two": 2,
    "three": 3,
    "four": 4,
    default: 0
};

Or the initial example:

var area = match (obj)
{
    Line l : 0,
    Rectangle r : r.Width * r.Height,
    Circle c : Math.PI * c.Radius * c.Radius,
    default: throw new ApplicationException()
};

qrli commented 7 years ago

In c#, we usually have long descriptive names, so the match expression will usually be long and multiple lines. So I don't think a very terse syntax fits better, except for really simple expressions.

In addition, either : or | instead of case reads OK when it is the leading character of line, but not when you write them in a single line, e.g.:

var area = obj match : Line l => 0 : Rectangle r => r.Width * r.Height : throw new Exception();

So, I still prefer the original match expression proposal. But I agree with @orthoxerox that {...} reads better than (...) in that proposal.

lachbaer commented 7 years ago

@gafter

This proposal is syntactically ambiguous.

You have the better overview. Will there be tuple ambiguities when the default case must be written like _ => ...?

@quinmars

So why not use a dictionary-like syntax?

In PHP, also a C derived language, dictionaries are written like this 😉

$array = array(
   "foo" => "bar",
   "bar" => "foo",
);

I believe that a terse expression form will be often used where there are only 2 cases plus maybe a default case and the following expression is short as well. Otherwise every programmer will likely spread it over multiple lines. Not to have multiple ways of doing the same thing is a requirement, so the resulting syntax should support both, terseness and a good writing-reading-experience on multi-lines.

The idea to go with ? and : arose from the similarity between switch and if ... else if ... as it is sold by nearly every book I read. Whereas c ? a : b is quite easy to learn and understand, I agree that o ? c1 => a : c2 => b : c3 => d is already much harder in both regards.

As I have written above already, the (future) role of the comma token might be important for this discussion. It seems irrelevant, but finally defining its meaning as a section separator, and thus completely dumping its potential meaning as an operator like in C or for other purposes, can have an impact on future syntax extensions or constructs.

Comparing some (of many) variants, with colon :, comma , and ?

shape ?: Rectangle r => r.Width * r.Height : Circle c => Math.PI * c.Radius * c.Radius : _ => 0;
switch(shape) ? Rectangle r => r.Width * r.Height : Circle c => Math.PI * c.Radius * c.Radius : _ => 0;

shape ?: Rectangle r => r.Width * r.Height, Circle c => Math.PI * c.Radius * c.Radius, _ => 0;
switch(shape) ? Rectangle r => r.Width * r.Height, Circle c => Math.PI * c.Radius * c.Radius, _ => 0;

shape ?: Rectangle r : r.Width * r.Height, Circle c : Math.PI * c.Radius * c.Radius, _ : 0;
switch(shape) ? Rectangle r : r.Width * r.Height, Circle c : Math.PI * c.Radius * c.Radius, _ : 0;

shape ?: Rectangle r ? r.Width * r.Height : Circle c ? Math.PI * c.Radius * c.Radius : 0;
switch(shape) ?: Rectangle r ? r.Width * r.Height : Circle c ? Math.PI * c.Radius * c.Radius : 0;

My favorite's the very first one, because it is terse, verbose enough in a multi-liner and resembles the already-known-to-me ternary operator. In a one-liner I will instinctively go looking for the colon as the separator, and not a comma.

Which one would you go with?

quinmars commented 7 years ago

Whereas c ? a : b is quite easy to learn

I find the ternary operator hard to read and try to avoid it where possible.

As I have written above already, the (future) role of the comma token might be important for this discussion. It seems irrelevant, but finally defining its meaning as a section separator, and thus completely dumping its potential meaning as an operator like in C or for other purposes, can have an impact on future syntax extensions or constructs.

The comma operator in C is probably the least known operator of all. I'm pretty sure there is a large portion of C developers who have never used it (besides in for loops). Probably many of them don't even know it. Since we now have tuples in C#, there is only a very small corridor where the comma operator would be applicable:

int i = (1, 2);    // Valid in C; in C# you cannot convert a tuple
                   // implicitly to an integer - for good reasons

int i = 1, 2;      // Could also work in C#, but I'm not sure
                   // if i will be 1 or 2, probably 1
                   // because of  operator precedence

The comma operator can be useful in C macros, but I pray that it will never be part of C#.
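
To make that "small corridor" concrete, here is a minimal sketch of two places where the comma already has a fixed meaning in C# 7: as a separator inside a for-loop header, and as part of a tuple literal. The `CommaDemo` class and its method names are made up for illustration.

```csharp
// Hypothetical demo type; none of these names come from the thread.
static class CommaDemo
{
    // In a for header the commas separate declarators and statement
    // expressions; this is list syntax, not a comma operator.
    public static int CountSteps()
    {
        int steps = 0;
        for (int i = 0, j = 4; i < j; i++, j--)
        {
            steps++;
        }
        return steps;
    }

    // With C# 7 tuples, a parenthesized comma list is a tuple literal.
    public static (int, int) Pair() => (1, 2);
}
```

Because `(1, 2)` is already a tuple literal, `int i = (1, 2);` does not compile in C#, which matches the remark in the snippet above.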

orthoxerox commented 7 years ago

@lachbaer All four one-liners look completely indecipherable to me.

lachbaer commented 7 years ago

Concerning the braces, I think there are already too many occurrences of braces in the language, thankfully mitigated by the introduction of expression bodies. I don't deem it right to further introduce new braces, especially to expressions. Also I think that }. doesn't really look nice if you want to chain the result of the switch expression.

switch(shape) {
  Rectangle r => r.Width * r.Height,
  Circle c => Math.PI * c.Radius * c.Radius,
  _ => 0
}.ToArea().Transform(...);

An idea for a combination of current ideas is

switch( shape ?
  Rectangle r => r.Width * r.Height,
  Circle c => Math.PI * c.Radius * c.Radius
  : throw new IllegalArgumentException()
).ToArea().Transform(...);

The cases are separated by comma and the (optional) default case is after the colon. And the construct is also one-liner suitable, because of terseness.

HaloFour commented 7 years ago

The current proposal is the following:

var area = shape switch (
    case Line l: 0,
    case Rectangle r: r.Width * r.Height,
    case Circle c: Math.PI * c.Radius * c.Radius,
    case _: throw new IllegalArgumentException()
);

Or, in one-liner form:

var area = shape switch (case Line l: 0, case Rectangle r: r.Width * r.Height, case Circle c: Math.PI * c.Radius * c.Radius, case _: throw new IllegalArgumentException());

All the alternatives seem to do is dispense with the keyword case while shuffling around the punctuation. Personally, I kind of like case as it visually breaks up the expression. I also like that the switch expression keyword is used in a fairly different manner which helps to distinguish it from a switch statement, more like a comparison operator.

alrz commented 7 years ago

@lachbaer Your examples look like some brutal level "spot the difference" puzzle.

QuickC commented 7 years ago

The idea is sound, but the shape of the code must look like an expression, case and switch serve no value either. also note there is no type information, or a good place to add it.

case(switch) ? : Line l => 0 : Rectangle r => r.Width * r.Height : Circle c => Math.PI * c.Radius * c.Radius : throw new ApplicationException();

Simple example: First, it "looks" like an expression: 'int ascCode = ...'. Second, we have what we want to switch upon, 'code = ?'; the equal sign is needed for value extraction. Third, I used a bar '|' because of its known use as an 'or', followed by the use of code in a function that returns the correct type, '|40: toASC(code)', with the | and : allowing spaces to be removed if desired. Fourth, using the '_' for a catch-all.

char code = '1'
int ascCode = code = ? |40: toASC(code) |41: toASC(code) |_: -1;
int ascCode = code as c = ? |40 : toASC(c), |41 : toASC(c), | _ : throw InputError;

Example: First, it looks like an expression: 'double area = '. Second, here we have the type evaluation 'obj is ?'; this style only works for one type of pattern per expression, which makes sense in an expression pattern match. Third, the type match and decomposing expression '| Rectangle r : ' is clear and concise. Fourth, again a catch-all '_' with a pattern match error.

double area = obj is ? |Rectangle r: r.Width * r.Height |Line l: 0.0 |Circle c: Math.PI * c.Radius |_: throw MatchError;
double area = obj is ? | Line l : 0.0 | Rectangle r : r.Width * r.Height | Circle c : Math.PI * c.Radius | _ : throw MatchError;

A syntax that works for single line and multi-line is desired. This looks like F# and allows for a place for guards. Thoughts?

lachbaer commented 7 years ago

I don't like the | pipe symbol for this, just for optical reasons. In a ... ? ... : ... construct my eyes are scanning for the : colon as a separator, maybe that's why.

Thaina commented 7 years ago

I prefer these

var r = when(obj) ? Circle c => c.Radius : Rectangle r => r.Diagon : throw new Exception();

var r = obj ? Circle c => c.Radius : Rectangle r => r.Diagon : throw new Exception();
/// normally `?` work only on bool if I remember it right, why need any keyword?

/// I think when,match,switch is equally preferable
var r = obj [when|match|switch] {
    case Circle c : c.Radius;
    case Rectangle r : r.Diagon;
    default: throw new Exception();
}

I really like your concept, but I think it is more confusing to promote case to a keyword in that place.

And as for me, it is so counterintuitive to have parentheses () contain a case or case-like expression. It would be more natural to use braces {} or brackets [].

lachbaer commented 7 years ago

A counter-argument against using ? and :. This and the ternary operation will very likely be used within each other. If written on long lines it will be very hard to separate one from the other.

a = b ? Class c => c.IsValid() ? c.Value : 0 : Class d => d.EnumValue ? EnumA => new D()
  : EnumB => (f = d.GetE()) != null ? f : throw new Ex() ......

Now I got myself confused while typing it 😆 Well, you get the idea. So from that practical pov it may not be a good idea to reuse ? and : for this in this way.

obj switch {...}, i.e. the switch in the middle, bothers me a bit, because an operator or operator keyword that is in the middle should be a binary operator where 'left' and 'right' are combined.

switch(obj) {...} therefore seems a little better to me. When using the braces to group something together, like e.g. done in object or array initializers - i.e. they do not introduce a new scope block - the expression before them should be somehow 'final'. The syntax obj switch 'hangs' visually.

A case syntax like in the ordinary switch statement should be avoided also. First, it makes the expression longer, especially when one line would be sufficient. Second, ; endings should not be used within an expression, not even in a braced block! They are not used as separators in other braced (non-statement-block) scenarios. Also it makes the whole bunch of characters look like a statement block, which it isn't (misleading the reader). Third, case ... : ... ; makes me think that a break or so is missing. It won't be long until quite a number of people fall into that trap and add a break; to the expressions while typing.

Actually I prefer using => over :, because it makes for a better separation and emphasizes that a new expression is following. And because of the mixing of : with the ternary op shown at the beginning of this post. In case a following expression is valid for multiple values, the , can still be used.

Summarized example:

var a = switch(obj) {
            B b => subexpr1,
            C c => subexpr2,
            null, int _, float _ => throw new Ex(),
            _ => throw new OtherEx(),   // notice the last dispensable comma
        };

More abstract: every , collects cases until => occurs and evaluates that expression. Then with the next , the next collection of cases starts. Empty cases (multiple following commas) are allowed, that's why a final comma after the last expression is allowed.

All this is of course open for discussion - or, wait... better not 😛 - just kidding 😄

orthoxerox commented 7 years ago

I am completely not against this version, but I worry it will be annoying to parse: the first four tokens are the same as in a switch statement.

lachbaer commented 7 years ago

@orthoxerox But in an expression context. As is default(T) or nameof(obj).

Addendum: btw, that's why I used case(obj) in the initial post, to verbosely separate the statement from the expression. That brings us back to the use of a possible match keyword 😉

Thaina commented 7 years ago

@lachbaer If you try to chain multiple ?: operators, even in currently valid syntax, that will give you confusion for sure. It is a rule of thumb in most languages that, even if allowed, you should not chain ? operators (jslint gives an error for doing that).

But I think switch(obj) { => } or when(obj) { => } is interesting and not so weird

orthoxerox commented 7 years ago

@lachbaer what if it's an expression statement?

lachbaer commented 7 years ago

@orthoxerox Which places do you mean where the distinction between the two is ambiguous? I can currently only think of assignments and method calls as expression statements. Expression statements as in C/C++ are not allowed in C#.

My mind just told me an argument for the syntax of obj switch { => }. In case 'obj' is the result of another expression that syntax is really clearer to read. Compare:

var a = switch(this.is().a().long(expr => that(expr)).is().evaluated().before()) 
        { _ => return; }

var a = (this.is().a().long(expr => that(expr)).is().evaluated().before()) 
        switch { _ => return; }

When you're done reading the expression, you might already have forgotten that a switch shall be executed on it.

I am a friend of consistency, and when a syntax like object op-keyword { ... , ... , ... } is introduced, other (future or planned) constructs shall follow it. op-keyword(object) already is established, as is a non-statement-block-block after an expression (new Object() { in = iti, alyz = er })

orthoxerox commented 7 years ago

@lachbaer You're right, but if you write the switch expression in a statement context, it will either have to be parsed as an expression to give the correct error, or the compiler will complain about the lack of case.

lachbaer commented 7 years ago

I want to roll back a bit. Given that block tokens should be used ({ }) to avoid the dangling else problem, as well as that the expression must have a default (or discard) case in order to always return a value, maybe the ternary operator actually could be reused for this:

var a = obj ? { A a => a.value1, B b => b.value2 } : throw new Ex();

var area = shape1 ? {
               Line _ => 0,
               Rectangle r => r.Width * r.Height,
               Circle c => Math.PI * c.Radius * c.Radius,
           } : throw new ApplicationException();

The only difference now is that the 'true' part of the classical ternary op now has a case-evaluation-block (the parser sees the { after ?). And the 'false' part is the default expression, which is kind of logical, because if neither case is true, false is what remains.

Summarize:

QuickC commented 7 years ago

I like your last format for c# for 'shape1'

int I = EnumType ? 
{
  ONE => 1,
  TWO => 2,
  // no longer used  FOUR => 4 
  _ => 0,
}

Is this correct for an enum '?'

lachbaer commented 7 years ago

@QuickC The default case (_ => 0) is not necessary, because it is represented by the 'else' part:

int I = EnumType ? { 
    ONE => 1, 
    TWO => 2, 
    // no longer used  FOUR => 4
} : 0;

However you are free to use the discard pattern for optical reasons. Nevertheless the 'else' part must be present, even if it only acts as a dummy result.

QuickC commented 7 years ago

@lachbear That's a better answer. I was looking at your previous post and thought that the } : part returned an alternate return type, similar to "Rust".

It comes from my embedded history, we think of exceptions as types, if at all.

lachbaer commented 7 years ago

REVOKED POST

[SELFQUOTE] Nevertheless the 'else' part must be present, even if it only acts as a dummy result.

I thought to myself that that's ugly. So I throw in another suggestion, that still resembles the ternary operator, but makes the ': default' part optional, in case flow analysis ensures that it will never be evaluated, e.g. with a present _ => catch-all pattern.

int I = EnumType ? { 
    ONE => 1, 
    TWO => 2, 
    // no longer used  FOUR => 4
    : 0 };

The whole part after ?, including the default (else) case, is wrapped in braces. That makes nesting of ternary operators (if- and switch-style) more readable. However, although it is somewhat logical, I don't like this form now that I have written it down. So, I revoke it.