dotnet / csharplang

The official repo for the design of the C# programming language
11.38k stars 1.02k forks source link

Discussion: Allow omission of parens and requiring block #1090

Open alrz opened 6 years ago

alrz commented 6 years ago

Among ideas mentioned in https://github.com/dotnet/csharplang/issues/882, I think "allowing omission of parens and requiring block" has merits that I want to discuss here.

This would be solely a parser feature for all applicable statements and would not require any new syntaxes. So it would require the least amount of changes covering most of the syntaxes in question. Rust and Swift embraced this style and it has never been brought up as a pain point; even in C# we always use blocks as a standard convention.

if expr { } if !(e is T) { }
foreach var item in list { }
for int i = 0; i < array.Length; i++ { }
while expr { }
do { } while expr;
switch expr { }
var x = match expr { };

This is a tuple-friendly syntax where we are switching over a tuple literal directly,


switch (a, b) { // instead of switch ((a, b))
  case (1, 2):
  case (2, 1):
    break;
}

Alternatives

Other options from https://github.com/dotnet/csharplang/issues/882#issuecomment-328747639 have their own trade-offs:

A negated if statement: if!, if not or unless.

Would be limited to if statements and require a new syntax.

A negated pattern, e.g. if (obj is not T t).

Would be limited to patterns, but is useful as a recursive pattern on constants (https://github.com/dotnet/csharplang/issues/246). I think it will be useful regardless of what we're going to do in https://github.com/dotnet/csharplang/issues/882.

A negated is operator, e.g. if (obj isnot T t).

Would be limited to is operators.

A new statement like guard with a block that has an unreachable end.

Would be a new syntax outright and does not cover cases where we need else.

A new operator like not with lower precedence, allowing if (not obj is T).

The preferable choice if we just want to reduce the noise in negating expressions but it adds another way to do something that already is possible.

stepanbenes commented 6 years ago

It would be ambiguous. For example

if true == new bool() { } { Console.WriteLine(""); }

{ } could be object initializer or a block corresponding to if.

alrz commented 6 years ago

@stepanbenes Not if we parse the bracket eagerly as part of the object creation and then expect a block.

Another example is nullable types, e.g. if (obj is bool? .. ) is not ambiguous because we parse ? eagerly as part of the type rather than expecting a ternary (though there is a bit of look ahead in that case).

stepanbenes commented 6 years ago

Ok. In that case it should work. I can't think of another example other than the empty block following the object initialization.

But, when I see the for loop without parens, it looks so weird, that it would be better to deprecate it entirely. :)

aluanhaddad commented 6 years ago

I was going to say that I liked this idea, and I do, as it would cut down on clutter, but if we simply had the match (or switch or whatever it is called) expression that was previewed), I would use that for probably 80% of control flow anyway.

alrz commented 6 years ago

@stepanbenes

when I see the for loop without parens, it looks so weird, that it would be better to deprecate it entirely. :)

Swift used the exact same syntax but it was removed in favor of ranges.

@aluanhaddad

Note that this only applies to match expressions only if we use switch-like syntax (https://github.com/dotnet/csharplang/issues/1054). Otherwise it just affects existing statements.

alrz commented 6 years ago

@stakx @MKazemAkhgary please voice your concerns, this is a discussion thread. :)

aluanhaddad commented 6 years ago

@alrz sorry, I was just complaining about the lack of an expression form for pattern matching. You are correct. Also, thanks for linking that issue.

Anyway 👍 for this idea.

MkazemAkhgary commented 6 years ago

This doesn't offer too much value to c# syntax. it also reduces readability.

Parenthesis improves readability because it shows exact boundary of statement. It may not be a problem to the compiler, but for person who reads code, it kind of matters. when you are reading a lot of code, and you are reading it fast, it matters. you can track it better.

its like saying in English writing we could omit punctuations, yes you could, but you reduce readability, in my opinion they play important rule on how fast you can read and understand statements.

the real horror shows up when you mix this with query expression.

My reasoning may not be convincing to you, (that's what I could express in English 😸).

Having that said, in some scenarios when expression is little, it may look pretty to remove parenthesis, but not in general.

alrz commented 6 years ago

@MkazemAkhgary

Note that this is derived from #568 (that later championed in #882) where the motivating example was negating a pattern matching is operator to remove a pair of parentheses e.g.

if !(foo is Bar bar) // instead of `if(!(foo is Bar bar))`
{
  return;
}

This proposal, on the other hand, would bring the same syntax but in a general manner and not a new one which would be limited to a single operator: logical not.

stakx commented 6 years ago

@alrz - While I agree that negating the is operator is a bit of a pain and the resulting (!( doesn't look too nice, I don't think this alone warrants such a radical change to the C# language's fundamental syntax, look and feel that people have become familiar with over nearly 2 decades. You could just as well call it a different language.

aluanhaddad commented 6 years ago

Nothing would prevent you from using (expression) and enforcing it. Many people are adamant that all subexpressions be surrounded by parents even when precedence makes this redundant.

Personally, if a conditional is complex, I would either extract its constituents into locals or break it across multiple lines but the actual ( ) around the if are frankly noise in the case of a simple conditional.

Comparing

if (source is IReadOnlyList<T> list)
{
    return list.Count;
}
else if (source is IEnumerable<T> s)
{
    return s.Count();
}
else
{
    throw new ArgumentException(nameof(source), "cannot determine Count");
}

to

if source is IReadOnlyList<T> list
{
    return list.Count;
}
else if source is IEnumerable<T> enumerable
{
    return enumerable.Count();
}
else
{
    throw new ArgumentException(nameof(source), "cannot determine Count");
}

I think the second is cleaner.

DavidArno commented 6 years ago

@aluanhaddad,

I agree that the latter is more readable.

However,

return source match (
    IReadOnlyList<T> list => list.Count,
    IEnumerable<T> enumerable => enumerable.Count(),
    _ => throw new ArgumentException(nameof(source), "cannot determine Count")
);

is what I really want. Roll on more pattern matching features 😁

Vlad-Stryapko commented 6 years ago

@aluanhaddad

For me the second version is much harder to read and I'm automatically looking for parentheses to kind of 'support' me in parsing this. That might be something to get used to, but I find existing syntax pretty expressive and very easy read, whilst your example makes me feel lost and confused even though it's very simple. I'm pretty much on @MkazemAkhgary's side here.

On the other hand, pattern matching version looks pretty good, but once again it might be a matter of habit. I've never worked with any language which allows one to omit parentheses and I've worked with patterns before.

jnm2 commented 6 years ago

@aluanhaddad Your second example forces me to read all the words left to right instead of letting me recognize that it's an if...else if structure and scan vertically to take in the cases.

Probably because I'm not used to it.

alrz commented 6 years ago

Just to mention, both Swift and Rust formatters puts the open bracket on the same line by default.

if source is IReadOnlyList<T> list {
    return list.Count;
}

helps to "frame" each statement.

aluanhaddad commented 6 years ago

@DavidArno yeah! That is why I initially had reservations about this proposal, I don't want anything to take time away from bringing match back, introducing user defined patterns, conditional patterns, and what have you.

HaloFour commented 6 years ago

I don't think it's worth it. It looks drastically different from the C# that most people know and requires a different mental model to parse without the help of the rails that the extra punctuation provides. It also does create a few ambiguous scenarios which are difficult to reason regardless of whether or not the compiler can easily resolve.

alrz commented 6 years ago

I think concerns like "in some scenarios when the expression is little or long" or "just where we have a (!(" could be easily addressed with a code style option since it would be impractical to encode "sometimes" in a language. Instead, I think it would be better to relax the requirement over all statements (instead of just a negated if) and provide an option for programmers to choose where they want to be old-fashioned.

jnm2 commented 6 years ago

provide an option for programmers to choose where they want to be old-fashioned.

Or as others might put it, "where they want to be C# programmers." 😈

alrz commented 6 years ago

@jnm2 You might say that about var as well, because god forbid, this is not JavaScript.

I think gafter's post is at play here but in a reversed manner. This is actually very different from what we have, but it could become the default after some time. Swift and Rust proved my point here, at least. 😌

HaloFour commented 6 years ago

I take issue with this notion that requiring parenthesis makes the syntax somehow "old-fashioned". You might mean that in the sense that some newer languages are designing their syntax specifically to make those parenthesis optional or omitted. But so did many languages much older than C# or C. That's not particularly relevant.

What's important is whether that syntax makes sense here, in an existing language, where such considerations were not made and the rest of the spec assumes that said punctuation belongs there. The block vs. initializer ambiguity highlights this. There's currently no ambiguity and the parser knows exactly what it hit at that first open brace. But that wouldn't be the case anymore and it would have to parse down two separate paths until it hit syntax that happened to not be legal. A single accidentally omitted semicolon could completely change the semantics of the program. And that's not even considering the mental burden put on the developer to reason that flow out.

In my mind this is about the same as arguing that C# should get rid of semicolons.

aluanhaddad commented 6 years ago

Well, I think the curly-brackets should be required. But I don't think it is about syntax being old fashioned in the context of trends in language design but rather that whenever a new feature is introduced within a language, it creates a divide. Old-fashioned would just be a style that avoids the new feature.

var is a good example. I still argue with people about it.

jnm2 commented 6 years ago

In my mind this is about the same as arguing that C# should get rid of semicolons.

Exactly. This is not like var as @alrz suggested. Also, imagine the impact this would have on whether it would be possible and how dangerous it would be to introduce new contextual keywords.

stepanbenes commented 6 years ago
var X = 3;
if p == new Point() { X = 4 } { Console.WriteLine(X); }

It is not ambiguous, but adding a single semicolon after 4 completely changes the semantics.

alrz commented 6 years ago

@HaloFour

In my mind this is about the same as arguing that C# should get rid of semicolons.

No, C# should get rid of empty-statement which is a semicolon.

That's already responsible for your other point:

A single accidentally omitted semicolon could completely change the semantics

e.g. while (e) ; { } which I think R# warns about.

it would have to parse down two separate paths until it hit syntax that happened to not be legal.

Actually this one is more straightforward than o is bool? example. To parse that, we must look ahead to see if the following token is a possible start of an expression, if so, we have a ternary. but for o == new bool() {} {} we just continue to parse the object-creation-expression, and then we expect a block.

What's important is whether that syntax makes sense here, in an existing language, where such considerations were not made and the rest of the spec assumes that said punctuation belongs there.

Just to mentioned, In Swift, before they get rid of C-style for loops, it had explicitly supported both kinds, with and without parens (which wouldn't have been a sideeffect of a parenthesized expression). I think the reason was exactly what you're claiming to be the issue here: to be backward-compatible with C-style mental model. So adding this in an existing language that is already C-style-friendly doesn't change that.

I take issue with this notion that requiring parenthesis makes the syntax somehow "old-fashioned"

I meant just like var you might choose to use the explicit type where it's not evident on the right-hand-side. Here you could use parentheses if it helps to clarify things e.g. the more explicit way.

alrz commented 6 years ago

@stepanbenes

I'm with you there, and I think it would worth to make it a warning to use object initializers in that context (probably as an IDE diagnostic). However, there will be some edge cases whenever you add features to an existing language, so I don't think we can take it as a blocker.

aluanhaddad commented 6 years ago

@stepanbenes

if p == new Point() { X = 4 } { Console.WriteLine(X); }

I find this problem compelling.

I'm starting to swing back around to the viewpoint that this isn't worth it. I don't use ReSharper and I don't like the idea that object initializers are ambiguous especially since you might be creating an object that implicitly converts to a pattern or maybe the language someday provide the oft requested factory method initializers.

stepanbenes commented 6 years ago

@alzr Yes, object initializers used in if condition are quite rare, but assuming property patterns will be added, this could be quite common:

if p is Point {  } { Console.WriteLine(p); }
HaloFour commented 6 years ago

@alrz

No, C# should get rid of empty-statement which is a semicolon.

You're right, the empty statement is pretty nasty language baggage which rarely has a proper use case and can easily accidentally result in bugs. I'd recommend not adding more language features with the same problem.

To parse that, we must look ahead to see if the following token is a possible start of an expression, if so, we have a ternary. but for o == new bool() {} {} we just continue to parse the object-creation-expression, and then we expect a block.

Which makes little sense to me. A block is significantly more common than an initializer in that context, so for the compiler to assume that it'll be an initializer and force a double-set of curly brackets just seems like syntactic noise. Ultimately I think most developers would just end up wrapping the whole thing in parenthesis just to avoid that mess.

I think the reason was exactly what you're claiming to be the issue here: to be backward-compatible with C-style mental model. So adding this in an existing language that is already C-style-friendly doesn't change that.

It's not about a C-style mental model. It's about the C#-style mental model, which is already 15 years and 7 versions in. If C# were in initial design I think it'd be perfectly reasonable to have these kinds of discussions because all future language evolution would then happen in the context of that grammar.

CyrusNajmabadi commented 6 years ago

To parse that, we must look ahead to see if the following token is a possible start of an expression

I wanted to chime in on this topic because i have lot of experience in this area. Both in prototyping proposals like this for C#, and for having to deal with issues related to this in JS/TS (where i also wrote the parser for MS' impl). One thing I don't see mentioned here is how many problems this may cause during the editing experience itself. Specifically, i see a lot of talk about how this can be done in an unambiguous manner due to speculative parsing. However, such approaches are only really approaching what needs to be done when running the compiler from the command line with completely written code.

To make this more concrete, consider the following:

if x == new Y () {
    Z

Importantly, note that the source code is not complete, and there isn't enough information to know what the user's true intent was. Also note that this is routinely possible in TS (and was an absolute PITA) where { ... } could be:

  1. A object-type annotation, like { X: type } Here 'X' specifies a property that has type 'type'.
  2. An object literal, like { X: type }. Here this is an actual shorthand for creating an object (not a type), that has a property called 'X' whose value is whatever 'type' resolves to.
  3. A block, like { X: type }. Here this is a block that has a labeled statement where the label is 'X' and the statement to execute is 'type' (an expression statement with an implicit semicolon, yaay js/ts

Note that lexically these are the exact same sequence of tokens. And, of course, during parse time, you don't have semantics to guide you. Now think about how the above can become far more complex, and can appear in all sorts of contexts as the user writes code or edits existing code.

Trying to make tooling work in these situations is not only difficult, it also results in very poor and confusing experiences for the end user when they run into this.

And, of course, in practice things are much more complex because these are not just small 5-token sequences you need to examine, but are instead enormously complex constructs that are very difficult to distinguish precisely because the language is so ambiguous, and because there may be other mistakes the user has made or the code may just be too incomplete to understand properly.

--

So, while i'm not opposed to the idea of simplification and less clutter in the language, know that these sorts of approaches can and do lead to all sorts of additional costs down the line. It's very possible (and, given our history, probable) that addressing those costs won't be possible*. This means that while you may get a language feature that makes the final* code feel less cluttered to you, you may only get there in a manner that makes the experience of actually working with the code feel worse.

** C# has had numerous ambiguity bugs that have been reported over the years that we've been unable to find low-cost ways to handle. And typescript had even more. I've often gone in to see if there cheap ways to address some of them. And there has been success in many cases. But those things i've done have also led to regressions, and there are still many cases out there (even in our language which tries to avoid ambiguity) that are problematic.

Embracing more ambiguities, and hoping it will be ok because the parser can just look at the entire file, misses the core scenarios around how the experience feels in the case of errors, which is the 99% case when code is being edited.

DavidArno commented 6 years ago

@CyrusNajmabadi,

Your comment is very well argued. 👍 But reading it is oh so disappointing, as I really like this proposal and so want you to be wrong! 😀 Oh well, on to other ideas. Thanks for taking the time to explain to us why this couldn't work.

CyrusNajmabadi commented 6 years ago

Thanks for taking the time to explain to us why this couldn't work.

Just to be clear. That's not my argument :) My argument is that people should be aware of the potential impact of these sorts of decisions on the overall tooling experience. i.e. it's not enough to go through the exercise of saying "can the compiler distinguish these cases in complete code". Instead, we have to consider "and what does this mean for the user trying to actually edit and maintain these constructs in their application".

Overall, i'm a fan of reducing excessive ceremony in code. Heck, i was one of the original people to propose "why not just remove parens from 'if'" :) . But, after examining what would be involved, and seeing the types of costs this would incur on the team, i've moved into the "it's just not worth it" camp. i.e. while nice to have at a language level, it's just not paying for itself vs what we'd need to do to support this.

--

Note: this does not mean we can't support if !(...). That code always has a clear end delimiter, and so you can have this nice construct, without the awfulness of if (!(...)). That's the most important scenario for me, so i'm ok with not having a general if no_paren_expr construct.

alrz commented 6 years ago

this does not mean we can't support if !(...). That code always has a clear end delimiter, and so you can have this nice construct, without the awfulness of if (!(...)). That's the most important scenario for me, so i'm ok with not having a general if no_paren_expr construct.

Ok if that's the case, I would suggest the last alternative instead: a not operator with a lower precedence,

if (not obj is null) return;
if (not obj is Foo foo) return;
// etc

This wouldn't be coupled with if statements only, at least, which I very dislike about the proposed if!.

(or even not patterns, pick any alternative here but I think if! is not the ideal solution in any ways).

CyrusNajmabadi commented 6 years ago

I very dislike about the proposed if!.

Syntax is polarizing. We never make any syntax changes that appeal to everyone. Heck, i still hear complaints about generics and how some people don't like <>'s. At some point you just live with it. We have a simple and cheap syntactic change we can make that, in a sense, feels like removing something from the language, instead of adding to it. We can just make a common 'if' construct a bit terser. That's a lot easier and simpler to do than adding a new construct (like "not"). Especially because if you do that, then you have to now add support for this everywhere. And you have to deal with all the rules around contextual keywords. and it's a PITA. For example, do you literally only allow this for 'if' statements? If so, that's very weird. So you add it for non-ifs. but now you have to make weird grammar rules to understand what's going on. i.e. "not(foo)", is that a negation? or a call to "not". It's like 'nameof', except that 'nameof' solves a problem, whereas 'not' is just a synonym for something we already have (!).

Joe4evr commented 6 years ago

This wouldn't be coupled with if statements only, at least

Is this even explicitly stated by the team? I was under the impression that while !(expr) would work just as well.

CyrusNajmabadi commented 6 years ago

I imagine it will be supported for the conditional-statements. Basically "if, while, do-while". But i don't think it would be beyond that.

alrz commented 6 years ago

we never make any syntax changes that appeal to everyone.

It's not just about the syntax. This very proposal would have the same lexical looks, too. But rather, about the functionality e.g. what we can do with it. if! just makes it easier to "negate an expression inside if statements" that's a pretty narrow use case to warrant a new feature IMO. I agree that it does not suffer from ambiguities caused by a narrowed context, but at the same time, it's a whole feature for a single use case.

Especially because if you do that, then you have to now add support for this everywhere.

I didn't directly favor not operator here, just exploring alternatives (which would be equivalent to if!). If we could take time to ponder where we want to invest, I think it would worth to just consider alternatives. as a matter of fact, not patterns (https://github.com/dotnet/csharplang/issues/246) would be useful in some other places as well and are narrowed to patterns so we don't have to deal with all kinds of ambiguities.

It's like 'nameof', except that 'nameof' solves a problem, whereas 'not' is just a synonym for something we already have (!).

I mentioned this in the opening post. I see that as a trade-off too :)

alrz commented 6 years ago
var X = 3;
if p == new Point() { X = 4 } { Console.WriteLine(X); }
if true == new bool() { } { Console.WriteLine(""); }

I think the compiler can produce an error/warning when a trailing {} appears directly inside the expression and suggest to use parens to disambiguate. As a consequence, we still can use if !() syntax, but it would not be limited to just that case. While this hasn't been a popular proposal I think it's a better choice over alternatives. With recursive patterns, I expect we will see more code like this:

switch ((x, y))
{
  case (1, 2): break;
}

which could be avoided if we make parens optional and give a meaningful message in suspicious cases.