dotnet / csharplang

The official repo for the design of the C# programming language
11.54k stars 1.03k forks source link

Switch expressions currently disallow trailing comma #2098

Closed gafter closed 5 years ago

gafter commented 5 years ago

Update: LDM (1/9) decided to allow such trailing comma.


@jcouv commented on Thu Dec 20 2018

Trailing commas are currently disallowed, which is inconvenient when copy/pasting a line.

public class C 
{
    public void M() 
    {
        _ = 3 switch 
        {
                2 => 1,
                _ => 2, // error
        };
    }
}

Example

In contrast, we allow trailing commas in array initializers.


@CyrusNajmabadi commented on Thu Dec 20 2018

A big :+1: on this. For anything that is commonly multi-line, not allowing the trailing comma is often a PITA. Thanks!


@gafter commented on Thu Dec 20 2018

I advocated for this but the LDM shot it down. I'll be happy to bring it back up based on "experience".


@CyrusNajmabadi commented on Thu Dec 20 2018

That would be great. Thanks @gafter .


@alrz commented on Thu Dec 20 2018

Hopefully also for property patterns which is an unordered list. šŸ‘


@sharwell commented on Thu Dec 20 2018

@gafter This will be super obvious to StyleCop Analyzers users if it turns out to be a "special" list (i.e. a list that does not allow a trailing comma). There's a rule for that.

DavidArno commented 5 years ago

Trailing commas are currently disallowed

Good. Long may that continue to be the case too.

Editing this to explain my thoughts:

I see no similarity between a multi-line switch expression and eg a collection initialiser or an enum. In the cases where trailing commas are currently allowed, it's when a list of values is being defined.

A trailing comma cannot be used with eg type definitions (class Foo : IFoo, { is not valid), it cannot be used with parameter lists and so forth. I see a switch expression sitting firmly in this "control" category, rather than in the "data" category of things like initialisers. So disallowing trailing commas in switch expressions makes complete sense to me as it maintains language consistency.

YairHalberstadt commented 5 years ago

A switch expression cannot ever get past the last case can it? Otherwise we would end up with an expression which doesn't always have a value.

In which case I think having a trailing comma is slightly misleading. It implies it's normal to add something after the last case, but actually you'd want to add it before the last case...

alrz commented 5 years ago

it cannot be used with parameter lists

I think we should notice that's true almost always when there's parentheses rather than braces.

Adding a parameter changes the overload, adding a tuple element changes the type.

But adding a switch arm, property pattern, property initializer or enum member doesn't change anything about the type on and by itself.

I think we should consider this even with parameter lists because of optional parameters, but that's another proposal.

alrz commented 5 years ago

A switch expression cannot ever get past the last case can it?

Switch expression doesn't necessarily end with _ case or alike. In an exhaustive switch expression (maybe on an enum without FlagsAttribute) you don't necessarily have a "default" arm, and the order wouldn't be important in these cases.

DavidArno commented 5 years ago

I think we should notice that's true almost always when there's parentheses rather than braces.

It's a great shame then that the team changed the syntax from using parentheses to braces then... šŸ¤·ā€ā™‚ļø

alrz commented 5 years ago

I think that was in response to what a switch expression arm is (number of arms doesn't change the type). which led us to the question of trailing commas which is another property of such lists.

I think the term for each of these is heterogeneous vs. homogeneous and I believe switch expression arms are of the latter kind.

qrli commented 5 years ago

this is why i'd prefer leading symbol (like case or F#'s |) only than trailing delimiter only, so that no need to debate on such thing which is minor but important in feeling to some.

CyrusNajmabadi commented 5 years ago

This has nothing to do with parens/braces/etc. This has to do with common ways of writing a list of elements and the continued annoyance around editing said lists. When you have multiple lines containing these members, it is an extra annoyance to have to have the last line always be different from the rest. It adds extra steps for the users for no extra benefit.

It's worth noting that other languages (like go) sidestep this entirely by actually making the trailing comma mandatory for similar situations. And common linters for other languages (like TypeScript) require you to put these in for consistency.

--

Basically, no one is helped by disallowing this comma. Whereas there are several patterns and editing scenarios that are helped by allowing it. These patterns match the existing cases in the language where this is already allowed, and so we're also more consistent if this is allowed here.

CyrusNajmabadi commented 5 years ago

It's a great shame then that the team changed the syntax from using parentheses to braces then...

Even with parens, this should be allowed. It will be common to write this as:

expr switch
{
    p1 => v1,
    p2 => v2,
}

Adding a new case should be as simple as duplicating the last line. The same as how simple it is to duplicate a line in any of our constructs today that allow trailing commas (like initializers). i.e. i can write this:

        new A
        {
            B = 1,
        };

And easily just add more items by copy/pasting, without having to go edit previous initializers as well. This is a common editing improvement, recognized among tons of languages, and which should be allowed in any circumstance where there is not risk of ambiguity. i.e. if i do this

new X<
   int,
   //...
   string,
>

// or similar cases with tuples

there is a visual ambiguity here because it can lead people to think there may be an unstated item there that may be inferred or may otherwise have meaning. In visual cases like that, or cases where the language might actually want a feature there in a future (like 'inferred type arguments'), then this shoudl not be allowed. However, for completely safe areas, it should practically always be ok.

DavidArno commented 5 years ago

This has nothing to do with parens/braces/etc. This has to do with common ways of writing a list of elements and the continued annoyance around editing said lists

Then fix the editor, rather than bodging the language...

CyrusNajmabadi commented 5 years ago

Then fix the editor,

There isn't one editor to fix. You've simply pushed the problem to everyone, instead of fixing at the natural location.

rather than bodging the language...

It doesn't do anything negative to the language. The language already supports this construct in many places precisely because it makes typing more pleasant. There was no need to have ever supported this in hte past for our other constructs. But we did, for this exact case. The language is 'bodged' (i'm going to assume i know what that means) when new constructs don't behave consistently like these other ones do just because you want to be zealotly religious about things that don't actually make life better for users.

sharwell commented 5 years ago

Failure to include a comma at the end of a multi-line construct for which additional lines are likely to be added in the future is an unnecessary maintenance cost. When adding a new item at the end of a list, the following problems occur if the previous item did not have a trailing comma:

  1. The pull request making the change will require reviewers look at a line which could have been excluded from the review.
  2. After the previous step, git blame will produce incorrect results, requiring users to step back in the blame to find the information which should have been available from the start.

Some users may not agree with this, but there is a large body of C# developers which count on it continuing.

alrz commented 5 years ago

Then fix the editor, rather than bodging the language

Same argument applies to $@ and @$, 8.0 supports both. bodged.

YairHalberstadt commented 5 years ago

Can you give a scenario where you are likely to add something to the end of a switch expression between one commit and the other?

alrz commented 5 years ago
enum E
{
   Case1,
   Case2,
+  Case3,
}

var x = e switch
{
   E.Case1 => 1,
   E.Case2 => 2,
+  E.Case3 => 3,
}

Assuming match on enum without Flags is considered exhaustive.

alrz commented 5 years ago

Relates to "sealed enums" (https://github.com/dotnet/csharplang/issues/782)

jcouv commented 5 years ago

Allowing trailing commas seems desirable. I've found them useful in enums and arrays already, and expect switch expressions will also benefit. The examples that jump to my mind in the Roslyn codebase are WellKnownTypes, WellKnownMembers and ErrorCode (which are built from enums and arrays that we regularly expand).

enum Enum
{
    Alice,
    Bob,
}
new[]
{
   1,
   2,
}
_ = e switch
{
     Alice => "Alice",
     Bob => "Bob",
     // I expect to add entries here too
}
theunrepentantgeek commented 5 years ago

Basically, no one is helped by disallowing this comma.

@CyrusNajmabadi, in this case I disagree with you.

From a cognitive perspective ...

When reading normal English prose (*), proper punctuation helps the brain process the text and parse its meaning. It's been shown that misspelled words slow readers by more than 10%; there's a similar slowdown caused by improper punctuation (but, unfortunately, I can't remember the figure).

(*) I suspect that this applies to other languages as well, but the studies I've read have focused on English.

I have to believe that the same would apply when reading code. Seeing a , is a signal to the reader that there's another clause following. When it's missing, that's a cognitive speed-bump that will slow down the reader.

Trailing commas make code measurably harder to read every time, for the benefit of eliminating a minor edit once.

From the principle of least surprise ...

Other languages permit trailing commas in many many places and they result in some very odd situations.

A friend shared this one from JavaScript:

> [1, 2, 3].length
3
> [1, 2, 3, ].length
3
> [1, 2, 3, , ].length
4

In any sane world, those arrays would have lengths 3, 4 and 5.

alrz commented 5 years ago

You can totally do new[]{ 1, 2, } and C# doesn't allow more than one trailing comma which is insignificant.

sharwell commented 5 years ago

Also note that it's easy to add an analyzer to forbid the , even if the language allows it, but impossible to add an analyzer that allows the , if the language doesn't.

CyrusNajmabadi commented 5 years ago

@CyrusNajmabadi, in this case I disagree with you.

You are taking my quote out of context. I specifically called out cases where there was semantic ambiguity. And arrays, type-arg lists, and tuples would definitely be cases like that. There is no such issue in the case being discussed here, and my statement was in the context of htat discussion.

CyrusNajmabadi commented 5 years ago

I have to believe that the same would apply when reading code. Seeing a , is a signal to the reader that there's another clause following. When it's missing, that's a cognitive speed-bump that will slow down the reader.

This does not appear to be at all a problem given that hte language already allows for this in many effective cases. Moreso, there's so little of a problem that no one has ever even asked for a way to prevent these pesky commas from being allowed in their codebase. And, working in languages that require this, and codebases that lint-require this, i've never heard a single complaint from anyone. But i have heard from many the benefit that the consistency and simplicity that things brings.

theunrepentantgeek commented 5 years ago

You are taking my quote out of context.

My apologies, @CyrusNajmabadi, I wasn't trying to misrepresent your words.

I was trying to make a more general point - that there is solid evidence from cognitive science to conclude that "optional trailing commas" impair comprehension. My apologies for doing this in a way that caused offense.

I know that some people love the pesky blighters - including your good self.

Clearly, I'm of the opposite opinion. šŸ˜

FWIW, there was - at least in the communities I frequented - quite an adverse reaction to the support for trailing commas in the C# object initialization syntax when that was released back in 2007. But that ship sailed long ago and things won't change.

CyrusNajmabadi commented 5 years ago

I was trying to make a more general point - that there is solid evidence from cognitive science to conclude that "optional trailing commas" impair comprehension.

There are many things wrong with this argument.

  1. "When reading normal English prose (*)"

This isn't english.

  1. "Proper punctuation helps the brain process the text and parse its meaning"

There is nothing improper about this punctuation. This would literally be in the definition of the language's grammar and would be 'proper'

--

English is an incredibly complex and ambiguous natural language. I don't see any reason why any sort of rules that held for it (or other complex human languages) would be though to apply to a programming language with a very strict grammar.

Furthermore, your argument is on how mispellings slow things down. But that's not what's being talked about here. For example, there's no arugment being made to just allow randomly spelled words to be allowed. Or to allow random punctuation to be allowed in random places. Such a proposal might be something that would relate to the studies you've pointed to. However, this issue is nothin akin to that. Instead, it's about allowing a very specific piece of punctuation in a very unambiguous place that would not cause any semantic confusion.

stuartstein777 commented 5 years ago

Can't people who don't want unecessary trailing commas just set up style cop rules to squiggly them as errors. (I'm one of those people). THen people that want them can have them?

WHat's the issue?

jnm2 commented 5 years ago

I'm having some cognitive dissonance. The look still (irrationally?) bothers me, but the benefit for line-based source control is so practical that I'm starting to hope StyleCop Analyzers allows me to enforce trailing commas everywhere they are allowed in the language so that I don't forget a trailing comma before it's too late.

CyrusNajmabadi commented 5 years ago

StyleCop Analyzers allows me to enforce trailing commas everywhere they are allowed in the language so that I don't forget a trailing comma before it's too late.

This is how our linters work on the projects i work on. I very much like it.

yaakov-h commented 5 years ago

I'm starting to hope StyleCop Analyzers allows me to enforce trailing commas everywhere they are allowed in the language so that I don't forget a trailing comma before it's too late.

I believe it already does. šŸ˜„

DavidArno commented 5 years ago

Can't people who don't want unecessary trailing commas just set up style cop rules to squiggly them as errors. (I'm one of those people). THen people that want them can have them?

WHat's the issue?

The issue is that people seem to fall into one of two camps on this topic:

  1. Readability is king. Trailing commas harm readability so they shouldn't be supported,
  2. Aspects of various tools have shortcomings and trailing commas let us work around those shortcomings.

Sure, I could (and in fact do) set R# to report trailing commas as an error to ensure I never accidently leave one laying around. And I'll gladly incur trivial amounts of extra work around copy/paste and git diffs to keep my code clean and tidy. But this simply reinforces my expectations around code. It means that when I read other code where the author didn't just leave them laying around accidently, they actively add them, I stumble as the code doesn't read correctly. That's the issue.

Trailing commas make code less readable. They look messy and make the author look sloppy in my view. So as a fan of readable code, I oppose their use, even by others, as I may have to read their code one day.

jnm2 commented 5 years ago

Looking sloppy is not the same thing as being less readable. You called the amount of work to edit the commas trivial, but I can't understand how the impact on reading is not even more trivial.

Being bothered by something doesn't have to slow you down at all. If it does slow you down, I'm sure you can overcome that even more quickly than it'll take to absorb the new switch syntax (for example).

qrli commented 5 years ago

I think we all are used to ; at end of each line, while it looks sloppy as English. Pascal gets that right by putting a . at end of program, but nothing gained. Why is , so different?

CyrusNajmabadi commented 5 years ago

The issue is that people seem to fall into one of two camps on this topic: Readability is king. Trailing commas harm readability so they shouldn't be supported,

That doesn't even make sense to me. Trailing commas simply makes each item consistent. The list is simple item <comma> some number of times. It's more readable to me to have each entry look similar to every other one, versus having the very last one be different.

CyrusNajmabadi commented 5 years ago

Trailing commas make code less readable. They look messy and make the author look sloppy in my view. So as a fan of readable code, I oppose their use, even by others, as I may have to read their code one day.

"Non trailing commas make code less readable. They look messy and make the author look sloppy in my view. So, as a fan of readable code, I mandate their use, even by others, as I may have to read their code one day."

--

As mentioned above, the above isn't even untrue (though i wrote it to point out the inherent personally-opinionated nature of your post). On my team our linting rules mandate this precisely so that code doesn't look 'sloppy' by virtue of the inconsistency introduced by their absence. The major difference here is that i recognize that this is stylistic preference and so each form can be desired sensible by different groups. On the other hand, you seem to be insisting that only your stylistic preference can be accepted as the 'one true way', even if that's inconsistent with the rest of the language.

Korporal commented 5 years ago

If one cannot assume that a comma is always a separator but might also be a terminator depending on the context then IMHO that clutters the language and makes it harder to internalize the syntax. There is no right or wrong, the rules of the grammar are (so long as they're self consistent) up to the designers but supporting all of this mixed up and inconsistent rules for such basic tokens can only make the language harder to use in the end.

I also cannot think of many things that are more trifling than this, and I've been told here by experts that a feature must offer significant utility, this clearly does no such thing.

The only possible defence for trailing commas is making it easier to write code that generates code, but any developer worth their salt should be able to generate a comma separated token streams that don't generate a trailing comma, if one cannot do that then modifying the language to cater for it strikes me as a backward step (but alas we already have trailing comma support in some areas, I personally wish we did not).

YairHalberstadt commented 5 years ago

@Korporal The comma is not a terminator. It is a separator. In some cases we allow a trailing seperator, and in some we don't, but they must always be followed by a terminator.

This isn't adding a new feature to the language. It's slightly relaxing the grammar of the language, and is far smaller a change to the compiler than plenty of the changes that happen on a weekly basis. As such the bar for entry is a little lower than new features.

CyrusNajmabadi commented 5 years ago

I also cannot think of many things that are more trifling than this, and I've been told here by experts that a feature must offer significant utility, this clearly does no such thing.

That's not what experts here have said. The position is that a feature must be worth it's cost. The cost here is a pittance given that the feature is already championed. The benefits are well worth it nd have been discussed at length in the thread.

The only possible defence for trailing commas is making it easier to write code that generates code

That's not true at all.

but supporting all of this mixed up and inconsistent rules

Supporting this makes things more consistent across the language.

can only make the language harder to use in the end.

Woudln't the language be harder to use the less consistent it is?

Neme12 commented 5 years ago

If I were to design a language, there would be no separators at all. Having separators instead of terminators only creates problems. It makes the last line inconsistent and it makes editing harder. Just try using Pascal where the semicolon is a separator as opposed to a terminator.

Not only that, but separators also often introduce ambiguities. Why? With terminators, if you have N items, you have N terminators. With separators you have N - 1 separators unless you have 0 items, in which case you have 0 separators too (it's not a 1 to 1 mapping). If you leave out the items themselves, it's impossible to tell whether there would have been 0 or 1 item in that place. Also, separators are the reason that we cannot have 1-element tuples.

If you're arguing that , is always a separator in English and therefore it makes code less readable (even though code isn't english), what about semicolons? They're separators in English too, yet we've all gotten used to using them as terminators in programming languages because that's just so much more convenient.

YairHalberstadt commented 5 years ago

@Neme12 I'm not sure I agree.

By using seperators you increase the information available to the compiler. Without seperators error detection is impossible as all inputs are valid. With seperators, leaving out something can be detected, which is useful to the programmer.

It's an old problem in information theory. The more dense your data, the harder it is to detect and recover from errors.

DavidArno commented 5 years ago

If you're arguing that , is always a separator in English and therefore it makes code less readable (even though code isn't english), what about semicolons?

Semicolons on the last statement are mandatory. Commas on the last item of a list are sometimes optional and sometimes a syntax error. In the first case, one just gets used to them. They are annoying, but they are consistent.

If commas were mandatory after the last item, then the same would apply to them as to semicolons. They aren't though, so the two are orthogonal.

Neme12 commented 5 years ago

@YairHalberstadt Can you please clarify that with an example? What do you mean by "all inputs are valid"?

gafter commented 5 years ago

@Neme12

If I were to design a language, there would be no separators at all. Having separators instead of terminators only creates problems.

OK, so what would a method/function call look like? Do you have no punctuation at all, or a trailing comma in the argument list?

Neme12 commented 5 years ago

Either trailing commas (which I admit would take some getting used to), or there would only be semicolons instead of commas in the language.

YairHalberstadt commented 5 years ago

@neme12

I'm struggling to find a really decent example, so for now I'll retract until a come across one.

CyrusNajmabadi commented 5 years ago

If I were to design a language, there would be no separators at all. Having separators instead of terminators only creates problems. It makes the last line inconsistent and it makes editing harder. Just try using Pascal where the semicolon is a separator as opposed to a terminator.

Every language designer eventually designs there way to the one true language.

Korporal commented 5 years ago

If I were to design a language, there would be no separators at all. Having separators instead of terminators only creates problems. It makes the last line inconsistent and it makes editing harder. Just try using Pascal where the semicolon is a separator as opposed to a terminator.

Not only that, but separators also often introduce ambiguities. Why? With terminators, if you have N items, you have N terminators. With separators you have N - 1 separators unless you have 0 items, in which case you have 0 separators too (it's not a 1 to 1 mapping). If you leave out the items themselves, it's impossible to tell whether there would have been 0 or 1 item in that place. Also, separators are the reason that we cannot have 1-element tuples.

If you're arguing that , is always a separator in English and therefore it makes code less readable (even though code isn't english), what about semicolons? They're separators in English too, yet we've all gotten used to using them as terminators in programming languages because that's just so much more convenient.

I don't see how you can even define a grammar if there is no way to distinguish tokens from on another, even a space is a separator.

F# (and most other functional languages) separates arguments in method calls (mostly) by simply spaces but it does need some way to do that.

YairHalberstadt commented 5 years ago

A space is not a seperator from the parsers perspective.

The tokenizer strips out all spaces when creating tokens.

Then the parser treats some of those tokens (commas) as seperators.

Korporal commented 5 years ago

Why not alter the language to treat a trailing comma as legal here but issue a warning? Then the user can - if they so choose - either tolerate and ignore the warning or not?

YairHalberstadt commented 5 years ago

Because warnings are only issued for behaviour that can't be made illegal for whatever reason but could potentially be dangerous

Korporal commented 5 years ago

@YairHalberstadt

A space is not a seperator from the parsers perspective.

True a space is not a token as a comma is I agree but is still used to separate tokens as in here:

public class some_identifer_1234

The tokenizer strips out all spaces when creating tokens.

Yes but it's the presence of the spaces that enables it to do that, the lexical analysis "regards" the space as a separator that's all I'm saying here.

Then the parser treats some of those tokens (commas) as seperators.

Neme12 commented 5 years ago

even a space is a separator.

You can (and usually do) have trailing space after the last item. If you couldn't, I'd definitely have a problem with that. By separator I meant anything that disallows having a trailing separator. Anyway, my comment was out of topic. This issue is about C#.