dotnet / csharplang

The official repo for the design of the C# programming language

Proposal: Block-bodied switch expression arms #3037

Open 333fred opened 4 years ago

333fred commented 4 years ago

Block-bodied switch expression arms

Summary

This proposal is an enhancement to the new switch expressions added in C# 8.0: allowing multiple statements in a switch expression arm. We permit braces after the arrow, and use break value; to return a value from the switch expression arm.

Motivation

This addresses a common complaint we've heard since the release of switch expressions: users would like to execute multiple things in a switch-expression arm before returning a value. We knew that this would be a top request after initial release, and this is a proposal to address that. This is not a fully-featured proposal to replace sequence expressions. Rather, it is constrained to just address the complaints around switch expressions specifically. It could serve as a prototype for adding sequence expressions to the language at a later date in a similar manner, but isn't intended to support or replace them.

Detailed design

We allow users to put braces after the arrow in a switch expression arm, in place of a single expression. The braces contain a standard statement list, and the user must use a break statement to "return" a value from the block. The end of the block must not be reachable, as in a non-void returning method body. In other words, control is not permitted to flow off the end of this block. Any switch arm can have either a block body or a single expression body, as it does currently. As an example:

void M(List<object> myObjects)
{
    var stringified = myObjects switch {
        List<string> strings => string.Join(",", strings),
        List<MyType> others => {
            string result = string.Empty;
            foreach (var other in others)
            {
                if (other.IsFaulted) return;
                else if (other.IsLastItem) break; // This breaks the foreach, not the switch

                result += other.ToString();
            }

            break result;
        },
        _ => {
            var message = $"Unexpected type {myObjects.GetType()}";
            Logger.Error(message);
            throw new InvalidOperationException(message);
        }
    };

    Console.WriteLine(stringified);
}

We make the following changes to the grammar:

switch_expression_arm
    : pattern case_guard? '=>' expression
    | pattern case_guard? '=>' block
    ;

break_statement
    : 'break' expression? ';'
    ;

It is an error for the endpoint of a switch expression arm's block to be reachable. break with an expression is only allowed when the nearest enclosing switch, while, do, for, or foreach statement is a block-bodied switch expression arm. Additionally, when the nearest enclosing switch, while, do, for, or foreach statement is a block-bodied switch expression arm, an expressionless break is a compile-time error. When a pattern and case guard evaluate to true, the block is executed with control entering at the first statement of the block. The type of the switch expression is determined with the same algorithm as today, except that, for every block, all expressions used in a break expression; statement are used in determining the best common type of the switch. As an example:

bool b = ...;
var o = ...;
_ = o switch {
    1 => (byte)1,
    2 => {
        if (b) break (short)2;
        else break 3;
    },
    _ => 4L
};

The arms contribute byte, short, int, and long as possible types, and the best common type algorithm will choose long as the resulting type of the switch expression.
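For comparison, the same inference can already be observed with expression-bodied arms alone. The following is only a sketch and not part of the proposal: the class and method names are made up for illustration, and it assumes C# 8.0 or later.

```csharp
using System;

public static class BestCommonTypeDemo
{
    // Hypothetical helper: reports the type the compiler picked for the switch expression.
    public static Type ResultType(object o)
    {
        // Arms contribute byte, short, int and long; only long accepts
        // an implicit conversion from all of them, so `result` is a long.
        var result = o switch
        {
            1 => (byte)1,
            2 => (short)2,
            3 => 3,
            _ => 4L
        };
        return result.GetType();
    }
}
```

Whichever arm is taken, ResultType reports System.Int64, because the static type of the whole expression is long.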

Drawbacks

As with any proposal, we will be complicating the language further by adding this one. With this proposal, we will effectively lock ourselves into a design for sequence expressions (should we ever decide to do them), or be left with an ugly wart on the language where we have two different syntaxes for similar end results.

Alternatives

An alternative is the more general-purpose sequence expressions proposal, https://github.com/dotnet/csharplang/issues/377. This (as currently proposed) would enable a more restrictive, but also more widely usable, feature that could be applied to solve the problems this proposal is addressing. Even if we don't do general-purpose sequence expressions at the same time as this proposal, doing this form of block-bodied switch expressions would essentially serve as a prototype for how we'd do sequence expressions in the future (if we decide to do them at all). We therefore likely need to design ahead and ensure that we'd either be ok with this syntax in a general-purpose scenario, or that we're ok with rejecting general-purpose sequence expressions as a whole.

Unresolved questions

Design Meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-08-28.md#block-bodied-switch-expression-arms

CyrusNajmabadi commented 4 years ago

It is an error for the endpoint of a switch expression arm's block to be reachable. break with an expression is only allowed when the nearest enclosing switch, while, do, for, or foreach statement is a block-bodied switch expression arm.

This limitation seems ok, but still odd. We don't do the same elsewhere. For example, i could have a continue; inside a switch inside a foreach.

Seems like we could just allow the break/continue to bind to the nearest applicable construct.

333fred commented 4 years ago

For example, i could have a continue; inside a switch inside a foreach.

These are all statements. This is inside an expression, which has previously only been break-out-able by throwing.

CyrusNajmabadi commented 4 years ago

These are all statements. This is inside an expression, which has previously only been break-out-able by throwing.

Sure... i get that it's new. my only point was: we're allowing statements inside the switch now. And it doesn't seem strange to support the concept of these statements in the switch jumping to other statements.

HaloFour commented 4 years ago

Again, relevant: ~https://openjdk.java.net/jeps/325~ ~https://openjdk.java.net/jeps/354~ https://openjdk.java.net/jeps/361

// as statement
switch (p) {
    case 1, 2, 3 -> System.out.println("Foo");
    case 4, 5, 6 -> {
        System.out.println("Bar");
    }
};

// as expression
String result = switch (p) {
    case 1, 2, 3 -> "Foo";
    case 4, 5, 6 -> {
        yield "Bar";
    }
};

Java does not allow control statements within the arms of a switch expression:

LABEL1: while (true) {
    String result = switch (p) {
        case 1 -> "Foo";
        case 2 -> {
            break LABEL1; // error: Break outside of enclosing switch expression
        }
        case 3 -> {
            continue LABEL1; // error: Continue outside of enclosing switch expression
        }
        case 4 -> {
            return; // error: Return outside of enclosing switch expression
        }
        case 5 -> throw new IllegalStateException(); // fine
    };
}

But it's perfectly fine with switch statements:

LABEL1: while (true) {
    switch (p) {
        case 1 -> System.out.println("Foo");
        case 2 -> {
            System.out.println("Bar");
            break LABEL1;
        }
        case 3 -> {
            continue LABEL1;
        }
        case 4 -> {
            return;
        }
        case 5 -> throw new IllegalStateException();
    };
}

orthoxerox commented 4 years ago

@333fred your example is missing a semicolon after the switch expression.

@HaloFour There's #1597 for labeled loops and an even older https://github.com/dotnet/roslyn/issues/5883 with a WONTFIX resolution.

HaloFour commented 4 years ago

@orthoxerox

Nod, just demonstrating examples of switch expressions/statements in Java as they are developing very similar features, including using break as a way to return a value from an expression switch arm. I probably didn't need to use labeled loops, I was just throwing a bunch of spaghetti at IntelliJ to see what would compile and what wouldn't and happened to copy&paste that sample here.

IMO it might be worth considering the design choices already made by the Java team as they intend to use switch statements/expressions at the center of their pattern matching proposals just as C# has and they have been making tweaks to the preview syntax over the past two compiler releases.

svick commented 4 years ago

@333fred Consider the following code

foreach (var item in items)
{
    _ = item switch {
        1 => {
            continue; // allowed, continues the foreach
            break; // not allowed
            break 1; // allowed, "returns" from the switch
        }
    };
}

It feels inconsistent that break; is not allowed in this situation, when continue; is. And since there is no ambiguity (break; is never associated with the switch expression, while break expr; always is), I think it makes sense to allow this code.

HaloFour commented 4 years ago

Oops, I messed up. Java 13 switched to yield instead of break to return a value from a switch. I bet that was because of the confusion between breaking out of the switch arm vs. returning a value.

HaloFour commented 4 years ago

@svick

Java 13, for reference/comparison:

for (int item : items) {
    int result = switch (item) {
        case 1 -> {
            continue; // compiler error
            break; // compiler error
            yield 1;
        }
        default -> 0;
    };

    switch (item) {
        case 1 -> {
            continue; // just fine
            break; // just fine
            System.out.println(1);
        }
        default -> System.out.println(0);
    }
}

Shame the team rejected break, continue and return expressions. Feels like they would work well here.

333fred commented 4 years ago

@orthoxerox thanks, fixed.

@svick whether continue will be allowed is still an open question, we need to decide whether we'll allow any control flow out of the expression other than a break expression statement. As @HaloFour points out, Java does not allow these, and I'd be lying through my teeth here if I said we weren't inspired by their solutions to enhancing their switch statement here. But there is existing precedent for break referring to a different statement than continue, and while the compiler could figure it out, I'm not convinced that it wouldn't be confusing for the reader yet.

BreyerW commented 4 years ago

I would like to say that break as a sort of return statement is very confusing. yield is much better imo.

CyrusNajmabadi commented 4 years ago

Since we're bike-shedding, break makes perfect sense to me. It's always been associated with leaving control of the switch. Having it leave with a value is totally sensible given the expression-nature of switch-expressions.

BreyerW commented 4 years ago

The difference is that break never returned a value; it just broke out of the current control structure. Now it will. yield, on the other hand, USUALLY returns a value, and if you want to leave control without returning one you have to say so explicitly, like yield break; or yield return null; (they aren't equivalent of course, but the intent is similar). That's why yield makes more sense.

No wonder Java changed their syntax in the middle of the process.

qrli commented 4 years ago

For break expression, I think the larger picture needs to be considered. If we compare to functional languages like F#, match/switch is not the only use case. It is also applicable to if-else, etc.

In future, we may also want to write in C#:

var foo = if (condition) { bar(); break 1; } else { break 2; }

Then the break would look weird and confusing.

mpawelski commented 4 years ago

I really would like to have something like block expressions from Rust.

I haven't played with Rust yet, but it looks like good syntax for a more "expression-oriented" language with C-style curly braces. And I definitely would like C# to go in this direction.

With it we could later introduce "if expression" like @qrli suggested:

var foo = if (condition) { bar(); 1 } else { 2 }

or could write multi line lambda expression

collection.Where(a => { DoSomething(); a > 1 })

It might look like simply omitting ; is too "terse" a syntax and that it would be better to have something more explicit, but I have read that Rust programmers don't have any problem with it. Maybe someone can share their experience with it.

But this block expression is an orthogonal feature. The proposed break syntax might still be valuable if we would like to "return" a value from some nested blocks (like we do today with return in methods and lambdas). But is it worth introducing a new feature just for a small convenience that could be used only in switch expressions, if we had something like a Rust-like block expression which would handle 90% of use cases ("execute multiple things in a switch-expression arm before returning a value")?

gafter commented 4 years ago

@mpawelski Block expressions à la Rust are currently under consideration under #377, though with parentheses instead of curly braces.

dersia commented 4 years ago

If this is still being discussed, I'd like to suggest using out instead of break or any other meaningful keyword that can be mistaken.

I explained why I think it is the better solution in the expression-block issue https://github.com/dotnet/csharplang/issues/3086#issuecomment-632601537.

Having

var x = y switch {
    < 0 => 100,
    < 10 => {
        var z = GetMeassures();
        out z;
    },
    _ => 0
};

Feels much better than

var x = y switch {
    < 0 => 100,
    < 10 => {
        var z = GetMeassures();
        break z;
    },
    _ => 0
};

Any thoughts?

ziaulhasanhamim commented 2 years ago

Any updates on this?

333fred commented 2 years ago

No, there are no updates on this.

ziaulhasanhamim commented 2 years ago

Is this ever going to make it in? I just don't like the switch statement syntax, but I often have to use it because a switch expression arm can't contain multiple statements. So I either use a switch statement or create a separate method for every case of the switch expression. This is a much-needed feature. Why is it taking so long?
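For readers unfamiliar with the separate-method workaround mentioned here, a minimal sketch follows. It is not taken from the proposal: SwitchArmWorkaround, Describe and JoinAll are hypothetical names, and the code assumes C# 8.0 or later.

```csharp
using System;
using System.Collections.Generic;

public static class SwitchArmWorkaround
{
    // Hypothetical example: one arm needs several statements before producing a value.
    public static string Describe(object value)
    {
        // The multi-statement arm is moved into a local function.
        string JoinAll(List<int> numbers)
        {
            var parts = new List<string>();
            foreach (var n in numbers)
            {
                if (n < 0) break; // exits the foreach only
                parts.Add(n.ToString());
            }
            // Plays the role that "break value;" would play under the proposal.
            return string.Join(",", parts);
        }

        return value switch
        {
            string s => s,
            List<int> numbers => JoinAll(numbers),
            _ => throw new InvalidOperationException($"Unexpected type {value.GetType()}")
        };
    }
}
```

The cost is that the arm's logic lives away from the pattern that selects it, which is the inconvenience block-bodied arms are meant to remove.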

mrwensveen commented 1 year ago

Instead of yield, why not just return? This makes more sense to me because similarly to a lambda function the right side of the arrow always returns a value. I.e., this is equivalent:

var x1 = () => { return 10; };
var x2 = () => 10;

I think this would make sense:

var s = "tenable";
var i = s switch {
  "tenable" => 10,
  _ => {
    if (Sun.IsShining) return 100;
    return 0;
  }
};

Analyzers will pick up the unnecessary verbosity and simplify to _ => Sun.IsShining ? 100 : 0, but that's beside the point.

HaloFour commented 1 year ago

Instead of yield, why not just return?

That would interfere with allowing statement expressions to contain return statements, or if return expressions (#176) are to be considered, as it changes how the flow control would work. It could also easily lead to a subtle bug when refactoring between switch statements and switch expressions.

mrwensveen commented 1 year ago

"It could also easily lead to a subtle bug when refactoring between switch statements and switch expressions."

I think it's unfortunate that they both use switch, I would have preferred match, but in any case you should always be careful when refactoring.

I get that return usually exits from the current method or function, but it is also allowed in lambdas, which are expressions.

If return is off the table, yield return x feels better than break x, IMHO.
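To make the lambda comparison concrete, here is a small sketch of how return behaves today inside a lambda that is invoked from a switch expression arm. The names are hypothetical and the code assumes C# 8.0 or later.

```csharp
using System;

public static class LambdaReturnDemo
{
    public static string Classify(int n)
    {
        var label = n switch
        {
            0 => "zero",
            // Immediately invoked lambda: `return` leaves the lambda, not Classify.
            _ => new Func<string>(() =>
            {
                if (n < 0) return "negative";
                return "positive";
            })()
        };

        // Still reached for every input, because no `return` above exits Classify.
        return "number is " + label;
    }
}
```

Here each return only supplies the arm's value. Whether a bare return inside a block-bodied arm should behave the same way, or exit the enclosing method, is the question under discussion.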

BreyerW commented 1 year ago

If return is off the table, yield return x feels better than break x, IMHO.

I think this is off the table too, because yield return is already valid in iterators, which means that if you used an expression block inside an iterator there would be ambiguity. It's why plain yield and a few others were suggested, as they don't have this potential issue.

mrwensveen commented 1 year ago

I don't see the issue, because scope already resolves this. This is perfectly valid and unambiguous:

string SillyString()
{
    IEnumerable<string> Iter() { yield return "Hello World"; }
    string Inner() { return string.Join(", ", Iter()); }
    var fn = () => { return Inner(); };

    return "Silly" switch { _ => new Func<string>(() => { return fn(); })() };
}

That's a lot of returns in one method, and they're all hit in this example. But no problem for C# because scopes.

For the purpose of this proposal it means that a code block after the arrow of a pattern creates its own scope, like a lambda function but slightly different.

#377 and #3086 do something similar, possibly making this a moot discussion.

BreyerW commented 1 year ago

@mrwensveen

That's not the situation I had in mind; let me rework your example a bit:

IEnumerable<string> Iter()
{
    yield return "Hello World";
    var stringified = myObjects switch {
        List<string> strings => string.Join(",", strings),
        List<MyType> others => {
            string result = string.Empty;
            foreach (var other in others)
            {
                if (other.IsFaulted) return;
                else if (other.IsLastItem) break; // This breaks the foreach, not the switch

                result += other.ToString();
            }

            yield return result;
        },
        _ => {
            var message = $"Unexpected type {myObjects.GetType()}";
            Logger.Error(message);
            throw new InvalidOperationException(message);
        }
    };
}

Now, should the second yield return break out of only the switch, or out of Iter? Going by current rules it should be out of Iter, but by expression-block rules it should be out of the switch. While this ambiguity is resolvable by scope, it makes yield return specifically a non-ideal candidate, since you can't tell at a glance without working out the scopes first.

mrwensveen commented 1 year ago

I get what you're trying to say. I wouldn't have a problem with this example, except for the return on the line with other.IsFaulted. The switch expression evaluates to whatever you accumulated in result and is assigned to stringified. You could even write yield return myObjects switch { ... and use yield return inside of the switch itself.

I would actually prefer a normal return without yield, because it seems unnecessary. This makes the switch expression like a pattern matching lambda (where it's already legal to use return). In the example above, the return on the line with other.IsFaulted would not compile because you're trying to assign void to a variable.

Okay, last attempt, I promise! What about break return? You'd be close to the nomenclature users are expecting when they see switch, but you're also explicitly stating that you are returning a value.

var foo = myObject switch {
  string s => s,
  MyType mt => {
    var bob = mt.Bob;
    break return ConvertToString(bob);
  },
  _ => throw new Exception("Invalid object!")
};

This way, when you see yield, you know you're dealing with iterators, when you see break, you know you're dealing with switches, and when you see a naked return, you know you're dealing with functions (possibly local or lambda).

dersia commented 1 year ago

Just jumping back in here. I think the problem with break, yield return and return is that they might get mistaken for one another (you forget the break before return, etc.), and that is really hard to spot in code reviews.

I'd like to bring up my suggestion from way earlier in this discussion: out. Why not use the keyword out, which is known but never used inside a body? It is clear about what is happening and it is also easy to spot in code. So the example from above would look like this:

var foo = myObject switch {
  string s => s,
  MyType mt => {
    var bob = mt.Bob;
    out ConvertToString(bob);
  },
  _ => throw new Exception("Invalid object!")
};

I still think this is the best keyword to use, and it does not interfere with return and break, which might be used to exit a loop or a method from within.

This way we could also use yield return for a method that is returning an IEnumerable and don't have to do double yield returns.

marchy commented 1 year ago

Here's a current workaround using lambda helpers to replace this:

switch( authIdentity ){
case AuthIdentity.PhoneNumberIdentity phoneNumberIdentity:
    installation.SetPreAuthContextPhoneNumber( phoneNumberIdentity.PhoneNumber );
    break;

case AuthIdentity.MessengerIdentity messengerIdentity:
    installation.SetPreAuthContextMessengerPageScopedID( messengerIdentity.PageScopedID );                  
    break;

default:
    throw new NotSupportedException( $"Unknown auth identity {typeof(AuthIdentity)}" );
}

with this:

object _ = authIdentity switch {
    AuthIdentity.PhoneNumberIdentity phoneNumberIdentity => Do( () => {
        installation.SetPreAuthContextPhoneNumber( phoneNumberIdentity.PhoneNumber );
    }),
    AuthIdentity.MessengerIdentity messengerIdentity => Do( () => {
        installation.SetPreAuthContextMessengerPageScopedID( messengerIdentity.PageScopedID );
    }),
    _ => throw new NotSupportedException( $"Unknown auth identity {typeof(AuthIdentity)}" ),
};

This uses the following lambda helper (which is useful in all sorts of other contexts):

/// <summary>
/// Performs the given action, returning the 'no-op' result (fundamental C# limitation).
/// </summary>
/// <param name="action"></param>
/// <returns>NOTE: This object represents 'Void' – containing a 'no result' result</returns>
[DebuggerStepThrough]
public static /*void*/object Do( Action action ){
    action();
    return new object();
}

There is some ugliness needed with the extra object _ = because C# doesn't let a switch expression be used as a statement without assigning its result to something (i.e. purely for the side effects), which is another annoying/unnecessary limitation.

It would be fantastic having this as out-of-box language support. It's not only used for multi-line statements, but even single-line statements that invoke different logic/methods, as shown in the example.