Proposal: Expression blocks

cston commented 4 years ago

Proposal

Allow a block of statements with a trailing expression as an expression.

Syntax

expression
    : non_assignment_expression
    | assignment
    ;

non_assignment_expression
    : conditional_expression
    | lambda_expression
    | query_expression
    | block_expression
    ;

block_expression
    : '{' statement+ expression '}'
    ;

Examples:

x = { ; 1 };  // expression block
x = { {} 2 }; // expression block

y = new MyCollection[]
  {
      { F(), 3 }, // collection initializer
      { F(); 4 }, // expression block
  };

f = () => { F(); G(); }; // block body
f = () => { F(); G() };  // expression body

Execution

An expression block is executed by transferring control to the first statement. When and if control reaches the end of a statement, control is transferred to the next statement. When and if control reaches the end of the last statement, the trailing expression is evaluated and the result left on the evaluation stack.

The evaluation stack may not be empty at the beginning of the expression block so control cannot enter the block other than at the first statement. Control cannot leave the block other than after the trailing expression unless an exception is thrown executing the statements or the expression.

Restrictions

return, yield break, yield return are not allowed in the expression block statements.

break and continue may be used only in nested loops or switch statements.

goto may be used to jump to other statements within the expression block but not to statements outside the block.

out variable declarations in the statements or expression are scoped to the expression block.

using expr; may be used in the statements. The implicit try / finally surrounds the remaining statements and the trailing expression so Dispose() is invoked after evaluating the trailing expression.
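The ordering described for using can be sketched in current C#, with a local function standing in for the expression block (expression blocks do not exist in the language today). `Resource`, `Evaluate`, and the log strings are hypothetical names for illustration only, and a C# 8 using declaration is assumed in place of the proposed `using expr;` form.

```csharp
using System;
using System.Collections.Generic;

static class UsingOrderSketch
{
    // Hypothetical disposable that records what happens to it.
    sealed class Resource : IDisposable
    {
        private readonly List<string> _log;
        public Resource(List<string> log) => _log = log;
        public int Value { get { _log.Add("read Value"); return 42; } }
        public void Dispose() => _log.Add("Dispose");
    }

    // Stand-in for the proposed: x = { using var r = new Resource(log); r.Value };
    public static (int Result, List<string> Log) Evaluate()
    {
        var log = new List<string>();

        int Block()
        {
            using var r = new Resource(log); // implicit try/finally covers the rest of the body
            return r.Value;                   // plays the role of the trailing expression
        }

        int result = Block();
        return (result, log);
    }
}
```

Calling `UsingOrderSketch.Evaluate()` records "read Value" before "Dispose", which is the order the proposal asks for.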

Expression trees cannot contain block expressions.

In terms of impl, this will be a shockingly easy mistake to make (i do it all the time myself). We should def invest in catching this and giving a good message to let people know what the problem is and how to fix it. i.e. if we detect not enough expr args, going in and seeing if replacing a semicolon with a comma would fix things and pointing people to that as the problem.

CyrusNajmabadi commented 4 years ago

Control cannot leave the block other than after the trailing expression unless an exception is thrown executing the statements or the expression.

Is this for ease of impl, or is there a really important reason this doesn't work at the language level? for example, i don't really see any issues with continuing (to a containing loop) midway through one of these block-exprs.

YairHalberstadt commented 4 years ago

I also don't see the reasons for any of the restrictions TBH, other than expression trees.

cston commented 4 years ago

Control cannot leave the block other than after the trailing expression unless an exception is thrown executing the statements or the expression.

Is this for ease of impl, or is there a really important reason this doesn't work at the language level.

The evaluation stack may not be empty at the continue.

int sum = 0;
foreach (int item in items)
{
    sum = sum + { if (item < 3) continue; item };
}

CyrusNajmabadi commented 4 years ago

The evaluation stack may not be empty at the continue.

Right... but why would i care (as a user)? From a semantics perspective, it just means: throw away everything done so far and go back to the for-loop.

I can get that this could be complex in terms of impl. If so, that's fine as a reason. But in terms of the language/semantics for the user, i don't really see an issue.

orthoxerox commented 4 years ago

@CyrusNajmabadi as a user I find the example by @cston hard to grok. Yanking the whole conditional statement out of the expression block makes everything MUCH clearer. Do you have a counterexample where return, break or continue work better inside an expression block?
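For comparison, here is a sketch of the hoisted form that this comment describes, written in current C#. The class and method names are made up for illustration; the loop body is the earlier summing example with the conditional moved out of the expression.

```csharp
using System.Collections.Generic;

static class HoistedContinueSketch
{
    // Same intent as: sum = sum + { if (item < 3) continue; item };
    public static int SumFromThree(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int item in items)
        {
            if (item < 3) continue; // decided before any operand of '+' is evaluated
            sum = sum + item;
        }
        return sum;
    }
}
```

Because the continue runs before `sum + item` starts evaluating, nothing is pending on the evaluation stack when control leaves the iteration.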

CyrusNajmabadi commented 4 years ago

In terms of impl, we should look at the work done in TS here. in TS { can start a block, or it can start an object-expr. Because of this, it's really easy to end up with bad parsing as users are in the middle of typing. It's important from an impl perspective to do the appropriate lookahead to understand if something should really be thought of as an expression versus a block.

CyrusNajmabadi commented 4 years ago

Consider the following:

{ a; b; } ;

A block which executes two statements inside, with an empty statement following.

{ a; b };

An expression-statement, whose expression is a block expression, with a statement, then the evaluation of 'b'.

Would we allow a block to be the expression of an expr-statement? Seems a bit wonky and unhelpful to me (since the value of the block expression would be thrown away).

Should we only allow block expressions in the case where the value will be used?

jcouv commented 4 years ago

@cston To avoid the look-ahead issue, I would suggest an alternative change:

block
  : '{' statement* expr '}'
  ;

This means that we always parse { ... as a block, even if it has a trailing expression. Then we can disallow it in the semantic layer.

I think this would solve the look-ahead issue for the compiler, but not so much for humans. I'd still favor @{, ${ or ({ to indicate this is an expression-block.

CyrusNajmabadi commented 4 years ago

${

yes. I'm very on board with a different (lightweight) sigil to indicate clearly that we have an expr block

HaloFour commented 4 years ago

How about ={ 😁

Joe4evr commented 4 years ago

I wonder if the ASP.NET team would lean their preference to @{ since that's already established for a statement block in Razor syntax. 🍝

spydacarnage commented 4 years ago

Isn't that a good reason not to use it, then, as it may cause parsing issues in a Razor/Blazor page?

On Tue, 7 Jan 2020 at 21:39, Joe4evr notifications@github.com wrote:

I wonder if the ASP.NET team would lean their preference to @{ since that's already established for a statement block in Razor syntax. 🍝

mikernet commented 4 years ago

This is kinda neat but the syntax definitely bothers me as being too subtle of a difference for block vs expression. I think having $ as a prefix is more sensible and easier to recognize when reading.

Trayani commented 4 years ago

I'm not bothered by the semicolon, but understand the potential confusion.

Also, if I understand correctly, it will not be possible to simply relax the syntax and let the compiler decide whether the block is statement / expression due to lambda type inference. Correct?

MadsTorgersen commented 4 years ago

I think this is really promising, and a good starting point.

We've been circling around the possibility of being able to add statements inside expressions for many years. I like the direction of this proposal, because:

- The {...} is recognizable from statement blocks. I know that curly braces are already somewhat overloaded, and there will be ambiguous contexts, but from a cognitive perspective I think it doesn't make the situation significantly worse, and is preferable to adding some new syntax for statement grouping.
- It provides natural and easy-to-understand scoping for any variables declared inside, including those declared in the trailing expression (e.g. through out variables).

Within that, I think there are several design discussions for us to have:

- Should the result be produced by a single expression at the end (as proposed here), or via a result-producing statement (e.g. break expr; has been proposed in #3037 and #3038)? In the latter case it would be syntactically equivalent to a block statement, and just have different semantic rules (just as the difference between a block used for a void-returning vs a result-returning method). The former may work best for shorter blocks, the latter for bigger ones. Which should we favor?
- Is the proposed precedence right? This disallows any operators from being applied directly to the statement expression. That's probably good, but needs deliberation. It limits the granularity at which an expression can easily be replaced with a block (though of course you can always parenthesize it, like every other low-precedence expression).
- Should a block expression be allowed as a statement expression? Probably not!
- The proposal requires there to be at least one statement. That's kind of ok if the statement block is used for prepending statements to your expression! But once it's in the language I can imagine wanting to use it just to scope variables declared in a single contained expression.
- I don't like the proposals for prepending a character so that you can "tell the difference", but that's another discussion to have. I don't think anyone other than the compiler team wants to "tell the difference". 😁
- There's a potential "slippery slope" argument to allow other statement forms as expressions somehow. I don't think that's very convincing, since such statements should just be put inside of a block expression! But I can see that coming up.
- We should make sure we gather the important scenarios. I've heard two really convincing ones:
  - as the branches of switch expressions (and switch statements if we do #3038). Switch expressions are themselves so complex that reorganizing the code to get a statement in becomes intrusive.
  - as "let-expressions" where a temporary local variable (or function) is created just for the benefit of one expression.

#3037 has examples of the former. An example of the latter might be:
```
var length = { var x = expr1; var y = expr2; Math.Sqrt(x*x + y*y); }
```

At the end of the day, this is the kind of feature that, even when we've done the best we can on designing it, it just doesn't feel right and we end up not doing it. Putting statements inside expressions may just fundamentally be too clunky to be useful.
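As a point of reference for the "let-expression" scenario, here is roughly how the same scoping can be approximated in current C# with an immediately invoked lambda. This is only a sketch: `LetExpressionSketch`, `Length`, and the parameters `a` and `b` are hypothetical stand-ins for expr1 and expr2, and the lambda introduces a delegate allocation that a real expression block presumably would not need.

```csharp
using System;

static class LetExpressionSketch
{
    public static double Length(double a, double b)
    {
        // Stand-in for: var length = { var x = a; var y = b; Math.Sqrt(x*x + y*y) };
        var length = ((Func<double>)(() =>
        {
            var x = a;
            var y = b;
            return Math.Sqrt(x * x + y * y);
        }))();

        // x and y are not in scope here; only 'length' survives.
        return length;
    }
}
```

The cast, the extra parentheses, and the explicit return are the ceremony that the proposed block syntax would remove.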

HaloFour commented 4 years ago

Allow a block of statements with a trailing expression as an expression.

I'd love it if this were possible without requiring a modified syntax. Sure, I understand that this would change the meaning of existing code, but most of the time that change would be that a value is harmlessly discarded. I am aware of at least one situation where this could affect overload resolution for lambda expressions, are there others?
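The overload resolution concern can be seen with lambdas in current C#. In this sketch (all names are hypothetical), a block-bodied lambda that returns nothing only converts to Action, while an expression-bodied lambda converts to both delegate types and the value-returning one is preferred. Under the proposal, dropping the final semicolon inside the braces would move a lambda from the first case to the second.

```csharp
using System;

static class LambdaOverloadSketch
{
    static int G() => 1;

    static string Run(Action a) { a(); return "Action"; }
    static string Run(Func<int> f) { f(); return "Func<int>"; }

    // Block body with no returned value: only the Action overload applies.
    public static string BlockBody() => Run(() => { G(); });

    // Expression body: both overloads apply, and the value-returning delegate wins.
    public static string ExpressionBody() => Run(() => G());
}
```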

MgSam commented 4 years ago

If I'm understanding the proposal correctly this would feel very weird when used with expression-bodied members.

class A 
{
    int Foo() => 5; //Expression

    int Foo2() => { ; 5 }; //Expression block?

    int Foo3() => { return 5; }; //Not allowed
}

333fred commented 4 years ago

@MgSam that's what Mads is pointing out with "Should a block expression be allowed as a statement expression? probably not!"

YairHalberstadt commented 4 years ago

If statement expressions are added, and "Control cannot leave the block other than after the trailing expression", there's increased incentive to make conditionals more user friendly, so that it's easier for the result of a block expression to depend on a test.

I find deeply nested conditional expressions highly unreadable. This suggests that we should allow if-else expressions.

This also cuts the other way. With sequence expressions it's much easier to turn an if-else with multiple statements into an expression. All you have to do is remove the final semicolon.

In scala and rust it's common for the entirety of a method to consist of a single expression consisting of multiple nested if-else expressions. I find this to be a really nice style.

0x000000EF commented 4 years ago

If I understand correctly the main motivation of this proposal is only switch statements #3038.

Really, I don't see any other valuable benefit from this; much more desirable for me would be something like a with operator.

Consider a slightly changed version of @MadsTorgersen's example

var length =
{
    var (x, y) = (GetX(), GetY());
    Math.Sqrt(x*x + y*y);
}

this is much clearer and more obvious to me

var (x, y) = (GetX(), GetY());
var length = Math.Sqrt(x*x + y*y);

or hide the variables in a function scope

double CalculateDistance(double x, double y) => Math.Sqrt(x*x + y*y);
var length = CalculateDistance(GetX(), GetY());

So, from this point of view, expression blocks look to me like a local function body without a signature or parameters, called immediately

double CalculateDistance()
{
    var (x, y) = (GetX(), GetY());
    return Math.Sqrt(x*x + y*y);
}
var length = CalculateDistance();

var length =
{
    var (x, y) = (GetX(), GetY());
    return Math.Sqrt(x*x + y*y); // it should contain an explicit 'return'
}

But I am not sure that this is really an important and valuable feature...

YairHalberstadt commented 4 years ago

@0x000000EF

The expression block can take place in a deeply nested expression, where converting it to a set of statements would require significant refactoring.

ronnygunawan commented 4 years ago

I think this would solve the look-ahead issue for the compiler, but not so much for humans. I'd still favor @{, ${ or ({ to indicate this is an expression-block.

I think { is good enough. We can always parenthesize it as ({ when needed.

If I understand correctly the main motivation of this proposal is only switch statements #3038.

Really, I don't see any other valuable benefit from this; much more desirable for me would be something like a with operator.

Ternary operator and object initialization will benefit from this too.

var grid = new Grid {
    Children = {
        ({
            var b = new Button { Text = "Click me" };
            Grid.SetRow(b, 1);
            b
        })
    }
};

0x000000EF commented 4 years ago

@YairHalberstadt, can you provide an example?

@ronnygunawan, this seems clearer...

Button CreateClickMeButton()
{
    var b = new Button { Text = "Click me" };
    Grid.SetRow(b, 1);
    return b;
}

var grid = new Grid {
    Children = {
        CreateClickMeButton()
    }
};

mikernet commented 4 years ago

@0x000000EF When building deeply nested UIs using code it is often desirable to have the elements declared right where they are in the tree, not split off somewhere else. It mirrors the equivalent XAML/HTML/etc more closely and it's easier to reason about the structure of the UI.

@MadsTorgersen

I don't think anyone other than the compiler team wants to "tell the difference"

I'm not sure what you mean by that. I think it's useful to be able to reason about the difference in behavior between...

f = () => { F(); G(); }; // block body
f = () => { F(); G() };  // expression body

...with something less subtle than just the absence of the semicolon, particularly if the proposal to implicitly type lambdas to Action/Func in the absence of other indicators gains traction. I guess the stylistic nature of the second example just feels a bit odd to me in the context of C# but maybe with time I'd get over that. A keyword before the trailing expression would solve that minor gripe as well but I'm not overly invested either way, just a suggestion to consider.

0x000000EF commented 4 years ago

@mikernet, it is not a big problem if we have something like a with operator

static T With<T>(this T b, Action<T> with)
{
    with(b);
    return b;
}

var grid = new Grid {
    Children = {
        new Button { Text = "Click me" }.With(b => Grid.SetRow(b, 1))
    }
};

TonyValenti commented 4 years ago

I'm definitely not a fan of seeing something like ({ Foo(); 3;}). I really dislike the 3; part. I would much rather see something like:

@{
  Foo();
  return 3;
}

where the block expression basically looks and acts a lot more like a delegate/local function.

munael commented 4 years ago

For what it's worth, I'd prefer a variant of this over #377 (Sequence Expressions). Assuming its scoping behaves exactly as a normal block's.

Also one more vote for the (1) { *; break expr; } syntax instead of the (2) { *; expr } syntax. (2) looks somewhat more elegant, but it clashes with the style of the rest of the language. Does that make sense?

DavidArno commented 4 years ago

@TonyValenti,

I really dislike the 3; part

Which goes to prove the "you can't please all of the people..." adage. That form, { Foo(); 3 } is exactly the selling point for me. Require a break or return or whatever and it's now a strange half statement block/ half inline function that can weirdly appear in an expression.

DavidArno commented 4 years ago

Not sure how the scoping of variables declared within an expression block would work, but supporting the following scenario seems a key requirement of this feature as it's one of the primary drivers for extending expressions in this way:

C Foo()
{
    return new C { 
        Prop1 = { var x = ExpensiveMethod(); x.P1 }, 
        Prop2 = x.P2 
    };
}

so x needs to "leak" out of its expression block into the initialiser.

DavidArno commented 4 years ago

@MgSam that's what Mads is pointing out with "Should a block expression be allowed as a statement expression? probably not!"
int Foo() => { ; 5} is var x ? x : 0;
😈

0x000000EF commented 4 years ago

@DavidArno , do you really prefer

C Foo()
{
    return new C { 
        Prop1 = {
            var x = ExpensiveMethod();
            x.P1
         }, 
        Prop2 = x.P2 
    };
}

instead of

C Foo()
{
    var x = ExpensiveMethod();
    return new C { 
        Prop1 = x.P1,
        Prop2 = x.P2 
    };
}

? :)

DavidArno commented 4 years ago

@0x000000EF,

I actually prefer:

C Foo()
    => new C { 
        Prop1 = { var x = ExpensiveMethod(); x.P1 }, 
        Prop2 = x.P2 
    };

0x000000EF commented 4 years ago

@DavidArno , so, from an architecture point of view, a more clean solution looks like

Foo(CreateExpensiveObject());

C Foo(AnyExpensiveObject x)
    => new C { 
        Prop1 = x.P1, 
        Prop2 = x.P2 
    };

but if you want, some dirty tricks already work fine

C Foo()
    => new C { 
        Prop1 = CreateExpensiveObject().To(out var x).P1, 
        Prop2 = x.P2 
    };

Can you provide more valuable cases for this feature? ;-)
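The `To` helper used in the "dirty trick" is not defined anywhere in the thread. A minimal guess at what it might look like is an extension method that copies its receiver into an out variable and returns it. Everything below (the shape of `To`, the property types, the `Calls` counter) is an assumption made for illustration.

```csharp
using System;

public sealed class AnyExpensiveObject { public int P1 { get; set; } public int P2 { get; set; } }
public sealed class C { public int Prop1 { get; set; } public int Prop2 { get; set; } }

public static class ToSketch
{
    public static int Calls; // counts how often the expensive factory runs

    // Hypothetical helper: captures the receiver in an out variable and passes it through.
    public static T To<T>(this T value, out T variable)
    {
        variable = value;
        return value;
    }

    static AnyExpensiveObject CreateExpensiveObject()
    {
        Calls++;
        return new AnyExpensiveObject { P1 = 1, P2 = 2 };
    }

    public static C Foo()
        => new C
        {
            // Member initializers run in order, so x is assigned before Prop2 reads it.
            Prop1 = CreateExpensiveObject().To(out var x).P1,
            Prop2 = x.P2
        };
}
```

Note that x is visible to the second initializer here precisely because out variables are scoped wider than the expression that declares them, which is the "leak" being debated in the thread.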

DavidArno commented 4 years ago

Can you provide more valuable cases for this feature?

Not me. Initialisers is the single most compelling use case for expression blocks in my view. Without support for initialisers, then this becomes a non-event feature for me. You'll have to ask others for other use cases.

DavidArno commented 4 years ago

so, from an architecture point of view, a more clean solution looks like...

We clearly have very different ideas of a "more clean solution". 😕

0x000000EF commented 4 years ago

So, can anybody provide some examples and describe why this feature is really important to implement?

At the moment it looks to me like a not very clear attempt to compensate for the lack of something like with blocks.

Richiban commented 4 years ago

@DavidArno

so x needs to "leak" out of its expression block into the initialiser.

Nope. Nope nope nope.

If we want leaky variables, then I think it'd be much better if we didn't use the braces {, but rather brackets ( for expression blocks.

I think it's quite a jarring addition to the language if this becomes the one and only meaning of braces in which variables can leak out.

333fred commented 4 years ago

@DavidArno see, we chose braces in this proposal explicitly because parens would imply scope leakage, and not wanting that is pretty much the only thing we're fully agreed on at this point. The entire point of these, in my mind, is locally scoped variables that do not leak. F#, for example, does not let bindings escape the scope they are defined in, and this would be the same.

Richiban commented 4 years ago

IMO, this example:

    return new C { 
            Prop1 = { var x = ExpensiveMethod(); x.P1 }, 
            Prop2 = x.P2 
        };

is much better written as:

    return { 
        var x = ExpensiveMethod();

        new C { 
            Prop1 = x.P1, 
            Prop2 = x.P2 
        }
    };

0x000000EF commented 4 years ago

@Richiban, hm, what is the sense of introducing

    return { 
        var x = ExpensiveMethod();

        new C { 
            Prop1 = x.P1, 
            Prop2 = x.P2 
        }
    };

if we can simply use

    var x = ExpensiveMethod();
    return new C { 
            Prop1 = x.P1, 
            Prop2 = x.P2 
        };

?

Richiban commented 4 years ago

@0x000000EF You've avoided the purpose of this proposal, which is to have the whole thing become a single expression.

masonwheeler commented 4 years ago

@Richiban

If we want leaky variables, then I think it'd be much better if we didn't use the braces {, but rather brackets ( for expression blocks.

Brackets are [ ] those things. The round ones are parentheses.

0x000000EF commented 4 years ago

@Richiban, I don't understand the practical sense... What benefits does this bring:

var length = { var x = expr1; var y = expr2; Math.Sqrt(x*x + y*y); }

    return { 
        var x = ExpensiveMethod();
        new C { 
            Prop1 = x.P1, 
            Prop2 = x.P2 
        }
    };

vs

var x = expr1; var y = expr2; var length = Math.Sqrt(x*x + y*y);

    var x = ExpensiveMethod();
    return new C { 
            Prop1 = x.P1, 
            Prop2 = x.P2 
        };

Hiding the variables x and y? Is that a real problem that makes this worth it?

DavidArno commented 4 years ago

@masonwheeler,

In the US, maybe. But here in the UK, () are brackets and at a push might be described as round brackets. [] are always square brackets :)

Richiban commented 4 years ago

@DavidArno Thanks for having my back!

To me (us?) they're all different varieties of "bracket":

| Symbol | Name |
| --- | --- |
| `( )` | (Round) brackets |
| `{ }` | Curly brackets |
| `[ ]` | Square brackets |
| `< >` | Angle brackets |

where, as DavidArno says, the "round" in "round bracket" would normally be omitted.

The word parentheses is almost never used outside of English language studies, where it basically means "any punctuation that can be opened or closed".

DavidArno commented 4 years ago

@333fred,

if ({ var x = Foo(); x.Valid })
{
    // x is out of scope here!
}

This seems odd given that vars from patterns and out are in scope in the same situation.

333fred commented 4 years ago

This seems odd given that vars from patterns and out are in scope in the same situation.

@DavidArno but vars from patterns and out were not designed to introduce locally-scoped variables for some quick computation but not be visible to the general scope, unlike block expressions. This is literally the only thing we have consensus on at this point: bindings introduced inside a block scope are only visible inside that scope. Personally, I'm not a huge fan of general-purpose expression blocks at all, and think we'd be much better served only allowing them in one or two places (namely switch expression arms) and relying on local functions for other locally-scoped computation, but we'll be discussing this in more depth sometime next week.
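The scoping contrast in this exchange can be checked against current C#. In the sketch below (names are illustrative only), variables declared by `out var` or a pattern in an if condition stay in scope after the if statement, while a local declared inside a local function body does not escape it. The latter is the behavior the proposal intends for expression blocks.

```csharp
using System;

static class ScopeSketch
{
    public static int OutVarLeaks(string s)
    {
        if (!int.TryParse(s, out var n))
        {
            return -1;
        }
        // 'n' was declared in the if condition but is still in scope here.
        return n;
    }

    public static int PatternVarLeaks(object o)
    {
        if (!(o is int i))
        {
            return -1;
        }
        // 'i' is in scope and definitely assigned on this path.
        return i;
    }

    public static bool LocalFunctionDoesNotLeak(string s)
    {
        bool IsValid() { var x = s.Trim(); return x.Length > 0; }

        if (IsValid())
        {
            // 'x' is not in scope here; it lives only inside IsValid.
            return true;
        }
        return false;
    }
}
```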

CyrusNajmabadi commented 4 years ago

Personally, I'm not a huge fan of general-purpose expression blocks at all, and think we'd be much better served only allowing them in one or two places (namely switch expression arms)

I like this, and would be 'pro' on having this be how the feature is initially released. It could always be relaxed later.
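To make the switch-arm scenario concrete, here is a sketch of the workaround available in current C#: an arm that needs to run a statement has to call out to a helper such as a local function. `SwitchArmSketch`, `Describe`, and `LogAndFormat` are hypothetical names, and the comment on the first arm shows one possible spelling of the proposed form, not settled syntax.

```csharp
using System;
using System.Collections.Generic;

static class SwitchArmSketch
{
    public static string Describe(object o, List<string> log)
    {
        // Today an arm that needs a statement must delegate to a helper.
        string LogAndFormat(int n)
        {
            log.Add($"saw {n}");
            return $"int {n}";
        }

        return o switch
        {
            int n => LogAndFormat(n), // proposed form might be: int n => { log.Add($"saw {n}"); $"int {n}" }
            string s => $"string {s}",
            _ => "other"
        };
    }
}
```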

HaloFour commented 4 years ago

My only concern there is that whatever design lets expression blocks work well with switch arms might look different if it were being designed for another feature or for general-purpose use. For example, the suggested break keyword to yield the result from the expression block is likely inspired by the existing use of break in a switch statement. Would that keyword have been chosen if the feature was designed for use somewhere else? IMO this is likely why Java went with yield instead of break, apart from the confusion that break was causing in the preview.