dotnet / csharplang

The official repo for the design of the C# programming language
11.54k stars 1.03k forks source link

Champion "deprecate _ (underbar) as an identifier" #1064

Closed gafter closed 6 years ago

gafter commented 7 years ago

It is unfortunate that the reader cannot easily distinguish at the use site between a discard and a use of an identifier named _. Presumably the latter is rare. Programs would be easier to read if one could reliably depend on _ being a discard.

jnm2 commented 7 years ago
Edit: I've changed my position on this: https://github.com/dotnet/csharplang/issues/4460#issuecomment-787490711

I use .Select(_ => _.Foo) everywhere, and I want to continue doing so.

HaloFour commented 7 years ago

Relevant previous discussions/arguments/rants:

https://github.com/dotnet/roslyn/issues/14794 https://github.com/dotnet/roslyn/issues/14862

There might be a few more.

I think it's worth the discussion although it would probably be difficult to resolve without breaking a lot of existing code, particular like @jnm's example.

🍝 Perhaps the rules for identifiers could allow underbar for a lambda parameter in expression lambdas but not elsewhere? That would be weird.

At the very least perhaps consider adding a warning wave to declaring a field or local using the underbar identifier. You could differentiate the discard/local case by whether or not the local is ever used.

bondsbw commented 7 years ago

If there was an auto-upgrade conversion of existing usages of _ as an identifier to @_, it might not break so much existing code.

jnm2 commented 7 years ago

The point of _ => _.Foo is that _ is the least possible visual clutter. I'm not a fan of @_ => @_.Foo.

bondsbw commented 7 years ago

@jnm2 Agreed, it would just keep from breaking existing code.

iam3yal commented 7 years ago

@jnm2 Well, you can always use something like x, however, I understand why you would dislike it.

p.s. iirc @HaloFour made a proposal that addresses exactly this or something similar where we can do something like @.Foo or .Foo (can't remember) instead of _ => _.Foo.

jnm2 commented 7 years ago

I'd stop using _ => _.Foo in a heartbeat if I could use @.Foo.

iam3yal commented 7 years ago

@jnm2 Indeed, it would be awesome if they would at one hand get rid of _ as a valid identifier and on another hand would allow people to do something like @.Foo.

alrz commented 7 years ago

I'd support any kind of deprecation at this point. This seemed impossible to me but now I can see the light. Please deprecate more.

MgSam commented 7 years ago

Is this seriously being considered or is this just a wish? Has the LDC suddenly had a change of heart where they're willing to take breaking changes?

yaakov-h commented 7 years ago

What does "deprecate" mean in this context? I assume it wouldn't be a full removal from the language?

iam3yal commented 7 years ago

@yaakov-h I assume that in any place where today you can use a single _ as a valid identifier it wouldn't be possible with this change.

Joe4evr commented 7 years ago

At the very least perhaps consider adding a warning wave to declaring a field or local using the underbar identifier.

I recall that something like this was already planned to be the case. And the whole process would likely be spread out over a few major language versions, regardless. (So for example, 8.0: Info, 9.0: Warning (maybe with exclusion from /warningsAsErrors), 10.0: Error. But that's just spit-balling.)

kentcb commented 7 years ago

Stumbled on this by accident and just had to add my 2c. I'd be very happy if _ was treated as discard consistently in C#. We're in a weird place right now where it's sometimes discard, sometimes not. I think that was a mistake, but one better fixed late than never.

As someone who makes heavy use of Rx, being able to discard with _ in lambdas would be πŸ’―. Random example of one place this would be useful:

this.WhenAnyValue(
    x => x.DepartureLocation,
    x => x.ArrivalLocation,
    x => x.JourneyMode,
    x => x.JourneyTime,
    x => x.ShowAll,
    (_, __, ___, ____, _____) => Unit.Default)

Having said that, I understand concern from @jnm2 et al regarding breaking changes. Personally, I think _ (and x, and @) are all terrible names for lambda parameters when a more descriptive name adds value (not the case above because the x there is just a stand-in for this). For example, this is much better:

items.Select(item => ...)

But of course, now we're getting into style arguments...

jnm2 commented 7 years ago

There's no reason we couldn't have this, and there's even a proposal on it:

this.WhenAnyValue(
    _ => _.DepartureLocation,
    _ => _.ArrivalLocation,
    _ => _.JourneyMode,
    _ => _.JourneyTime,
    _ => _.ShowAll,
    (_, _, _, _, _) => Unit.Default)

items.Select(item => item.Foo) does not seem more descriptive than items.Select(_ => _.Foo).

kentcb commented 7 years ago

There's no reason we couldn't have this, and there's even a proposal on it

Er, yes. It's what I'm asking for. But it would require that this proposal be accepted so that _ is not an identifier. Or maybe I'm missing something...?

items.Select(item => item.Foo) does not seem more descriptive than items.Select( => .Foo)

It was a very simple example based on yours. Add a SelectMany and a Where, all the while calling all variables (of differing) types _, then see how easy it is to read :)

jnm2 commented 7 years ago

There's no reason we couldn't have this

Er, yes. It's what I'm asking for.

The code sample uses _ for both parameter names and discard– it didn't sound like you were asking for that.

Add a SelectMany and a Where, all the while calling all variables (of differing) types _, then see how easy it is to read :)

Though in fact it's most predominantly used only once in a chain, I do this on a regular basis and like it:

foos.SelectMany(_ => _.Bars).Any(_ => _.Start != null || _.End != null)

The project I have open: 16.9% of the files use _., and in those files it occurs 6.59 times per file.

kentcb commented 7 years ago

The code sample uses _ for both parameter names and discard

Ah, I overlooked your use of _ for parameter names. That is certainly not what I'm asking for. Thanks for clarifying.

DarthVella commented 7 years ago

@eyalsk I'd assumed a compiler warning for a while, up until it is outright disallowed on the next major language update (or whenever).

MkazemAkhgary commented 7 years ago

Attention please

Why not make it an option to enable/disable usage of discards all over the place?

like how unsafe is optional. that way it wont be breaking change.

DavidArno commented 7 years ago

Whilst this is a breaking change, it is only breaking if one uses "treat warnings as errors" (which hopefully we all do; oh to have that as the default in new projects!) and it need only be breaking for as long as it takes to select a "ignore this warning in this project" code fix.

iam3yal commented 7 years ago

@DarthVella Yeah, probably. :)

DavidArno commented 7 years ago

@DarthVella, @eyalsk,

Warning in v8, error in v9 seems a nice timeline for phasing out the use of _ as an identifier. I'd certainly not like to see either introduced in a minor release.

iam3yal commented 7 years ago

@DavidArno Yeah this is what I wrote in a comment to @MkazemAkhgary in his post. ;)

MkazemAkhgary commented 7 years ago

for a very rare case, __ (two _) wouldn't be a warning. unless they decide to deprecate underbar-only identifiers.

(_, __) // former becomes discard, latter remain variable.
DavidArno commented 7 years ago

@MkazemAkhgary,

I'd personally like to see all underscore-only identifiers deprecated, even though I know this would break some of my own code (and thus the code of anyone who uses my library). The benefits of clarity would outweigh the small pain of fixing that broken code in my view.

sharwell commented 7 years ago

@gafter My previous understanding was it's possible to automatically turn all _ identifiers in a scope into discards when there are two or more of them (a situation which previously resulted in a compiler error). This is a both a non-breaking change and delivers the "discards everywhere" feature. To me, the approach seems like an obvious choice; is there a reason why it didn't work in practice?

:memo: With this behavior in place, an analyzer could be used to deprecate _ as an identifier in code bases - specifically to catch cases where a single _ is not being used as a discard.

:memo: I have some code bases which this change would break. It wasn't commonly used, but I'm loathe to change the affected cases because they are in some of the more stable parts of code I've worked on in the past (approaching zero churn).

jnm2 commented 7 years ago

If we get .Select(@.Foo) first, then this is okay. It's kind of harsh to deprecate a feature that is only used because we don't have anything better.

@sharwell's suggestion makes the most sense of any in this thread. Frankly I'm a little surprised at some of the supporters here– normally you'd be the first to say to allegations that a language feature is dirty, "If you don't like it, write an analyzer." πŸ˜ƒ

Vlad-Stryapko commented 7 years ago

Totally agree with @jnm2 and @sharwell on the matter.

You either don't need a complex identifier, as in items.Where or you do need it and in that case x won't be a good name.

I do like the way Scala solves it. Lambdas can be written as items.Where(_.Property == property) and lambdas with _ => _ aren't used since _ is a wildcard. So, IMHO, having a special syntax for short lambdas would be extremely nice and will play with this proposal very well.

Otherwise, it will break the codebase in both pet and work projects I participate in.

@DavidArno

The benefits of clarity would outweigh the small pain of fixing that broken code in my view.

There is absolutely no benefit of clarity if we replace _ with x, y, z and so on. I've seen such lambdas and they're awful regardless of the symbol used. On the other hand, for simple cases _ isn't worse than x.

HaloFour commented 7 years ago

@Vlad-Stryapko @jnm2

Theoretically if C# takes the hardline approach and actually makes _ an illegal identifier they could reuse _ for lambda shorthand in a similar manner to Scala:

var foos = stuff.Select(_.Foo)
    .Where(_.Property == property);
Vlad-Stryapko commented 7 years ago

@HaloFour Yes, that's something I'd really love to have in C#.

As far as I know, current proposal for shorthand in this repo is a bit different from the way Scala works with them, but I personally find Scala's choice not ideal so we definitely might have something even better in C#.

bondsbw commented 7 years ago

@HaloFour Do you know of any reason that couldn't be done today, allowed everywhere it is unambiguous?

So even if this proposal results in just a warning wave for now, we could still benefit from lambda shorthand using _ at the same time.

HaloFour commented 7 years ago

@Vlad-Stryapko

I'm less familiar with how _ works in Scala, but after a brief read I can immediately see some behaviors that would be scary. For example, what do you think the following would do?

int[] numbers = GetSomeInt32Array();
int[] doubled = numbers.Select(_+_).ToArray();

The proposal on this repo is a bit limited in scope, somewhat intentionally. I modeled it more as a shorthand to instance method references from Java, where the type is inferred and it's only really the member that matters. I can see there possibly being some value in supporting more complex expressions but, in my opinion, the goal is primarily to shine focus on the member that is relevant to the transform, not simply provide another lambda syntax that avoids naming things. I don't care for the positional shorthand argument syntax in Swift, for example. It just seems sloppy to me. You're only going to have to write that code once but you're going to have to read and reason about it many times so it seems reasonable to require that the developer devise names to describe the relationship of those arguments.

@bondsbw

Do you know of any reason that couldn't be done today, allowed everywhere it is unambiguous?

Even if it could (and I imagine that it is probably possible), I would dislike the overloading of _ with yet another special meaning where the syntax has overlap with legal syntax today. If _ remained a legal identifier my examples above are immediately ambiguous to the developer who then has to determine if there is a variable named _ in scope.

Vlad-Stryapko commented 7 years ago

@HaloFour

For example, what do you think the following would do?

If we had it in C#, I would expect that to mean Select(number => number + number). So basically we can refer to the function's argument as many times as we want but we're limited to functions with only one parameter.

With Scala's semantics, that would choose an overload of Select which takes Func<TSource, int, TResult> and treat the first _ as an element of the sequence and the second _ as its position. I don't think it's obvious for the majority of C# developers so I'd prefer the first version with number => number + number.

svick commented 7 years ago

Presumably [using an identifier named _] is rare.

I thought it would be good to validate that assumption and since I performed a similar survey before (https://github.com/dotnet/roslyn/issues/14794#issuecomment-258961093), I dusted off my old code and ran the analysis. I looked at 875 recently updated C# GitHub repos with less than 100 stars (to exclude repos like Roslyn) to see how they use underscore as an identifier. The results:

For the last point, a "use" means a use of a distinct symbol in a file. The 28 other uses were:

jnm2 commented 7 years ago

Not to mention that when I'm in pre-C# 7 codebases, I've been using var _ = to discard values 😳

jnm2 commented 7 years ago

@svick I love your approach. If I may ask, how long would it take to scale that up to cover another order of magnitude or more repos?

DavidArno commented 7 years ago

@Vlad-Stryapko,

The benefits of clarity would outweigh the small pain of fixing that broken code in my view.

There is absolutely no benefit of clarity if we replace _ with x, y, z and so on.

Whether there is benefit to replacing with _ with x etc is orthogonal to what I said. To clarify, if _ were to cease to be a valid identifier, then I feel it sensible to do the same with __. The two are already easily confused. Having one be a valid identifier and the other not, would just make that confusion worse.

Vlad-Stryapko commented 7 years ago

@DavidArno I've understood your initial statement as if making all underscores invalid would benefit the clarity of the code since people would have to replace their broken code with something more meaningful. That's the point I certainly disagree with.

if _ were to cease to be a valid identifier, then I feel it sensible to do the same with __

If we had short syntax for lambdas and _ would act as a discard, I'd personally be fine with whatever happens to __ and other similar identifiers. I think both 'single underscore has a special meaning, multiple underscores are identifiers' and 'single underscore has a special meaning, multiple underscores are not allowed' are fine.

MkazemAkhgary commented 7 years ago

@Vlad-Stryapko yet you provide no convincing reason why multiple underscores are meaningful.

Ask yourself when you choose multiple underscores for variable names

Unless you are drunk you pretty much use it after you use _ first. It makes perfecr sense to deprecate all underscore-only identifiers. That is if c# team already decided to make big change, moving a little higher isnt big deal.

Vlad-Stryapko commented 7 years ago

@MkazemAkhgary

you provide no convincing reason why multiple underscores are meaningful

They're already legal and that's why there must be a reason to make them illegal, not vice versa. The reason to make a single one illegal is to reserve it for the special purpose. I can't say I'm 100% sure multiple underscores must be disallowed so I refer to the principle 'don't create more restrictions and change anything unless it's really needed'.

MkazemAkhgary commented 7 years ago

@Vlad-Stryapko by all means, you wouldn't use __ normally unless you have _ somewhere declared. I'm pretty sure _ fans wouldn't go for extra underscore after this limitation, they probably choose x anyway.

I just want to clarify how bad this feels for me, c# team is already ruining c# language, one that I used to love. now is getting dilapidated with these weird stuff. 😞

iam3yal commented 7 years ago

@MkazemAkhgary

c# team is already ruining c# language

Please... let's have some respect and try to keep it professional? just because you disagree with their decisions doesn't mean that they are ruining the language.

one that I used to love.

I have plenty to say about these kind of statements but I'll be gentle and say it wouldn't help your cause.

now is getting dilapidated with these weird stuff.

Sometimes we need to try and see the world through a different lens and then maybe it wouldn't be so weird.

MkazemAkhgary commented 7 years ago

ok, my apologize to who are offended, I try to keep it to my self. you are right, maybe its just matter of time. @eyalsk

aluanhaddad commented 7 years ago

@eyalsk I most definitely agree with you, and I personally never write code using _, so this won't affect me, but I think everyone saw this coming and that it is an unnecessary breakage. This could have been avoided and it's not adding a lot of value vs. using a different character.

iam3yal commented 7 years ago

@aluanhaddad Yeah but the candidates were few and iirc required more changes to the language so they were pragmatic about it and given the amount of people that are using it, the amount of efforts required to make it work as an ignore character just make sense and let's be honest it looks good. πŸ˜‰

Sorry @jnm2. πŸ˜„ ❀️

kentcb commented 7 years ago

As a related aside, I've thrown together a Roslyn analyzer to detect the use of identifiers with discard names. I may have missed some cases. Feedback welcome.

yaakov-h commented 7 years ago

FWIW, StyleCop.Analyzers already disallows _ as a variable name - I had to run through my codebases and remove them when switching from StyleCop to StyleCop Analyzers.

sharwell commented 7 years ago

@yaakov-h I believe StyleCop Analyzers makes an exception for _ as a parameter name.

yaakov-h commented 7 years ago

@sharwell Ah, you're right - it was the multiple-underscore ones it didn't like, and I changed single-underscore ones at the same time. https://github.com/DotNetAnalyzers/StyleCopAnalyzers/issues/1606