carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/

lowercase_underscore everything #1435

Closed: zhihaoy closed this issue 2 years ago

zhihaoy commented 2 years ago

Everything, including type parameters.

A 2010 study found that lowercase_underscore identifiers are read significantly faster:

> Although, no difference was found between identifier styles with respect to accuracy, results indicate a significant improvement in time and lower visual effort with the underscore style.

SICP says, "Programs must be written for people to read, and only incidentally for machines to execute." Reading speed should be something we optimize for.

In addition, the entire C++ standard library is in lowercase_underscore. Seamless, bidirectional interoperability should take coding style into consideration.

Last but not least, CamelCase has never overcome the difficulty of spelling acronyms.

zygoloid commented 2 years ago

Interesting research paper! It seems to come to somewhat different conclusions than the 2009 paper it's inspired by that finds "camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style". The two studies had quite different sets of participants -- the 2009 paper having 135 participants (half programmers, half not), and the 2010 paper having 15 programmers -- and somewhat different methodologies, so I think we'll need a deeper dive into these two papers to decide what we can learn from them.

fowles commented 2 years ago

I am a little leery of putting too much faith in a study of only 15 participants.

zhihaoy commented 2 years ago

The "15 participants" paper was a response to the "135 participants" paper; apparently, the authors have faith in their method :)

Among the 135 subjects in the 2009 paper, the half who had received 1-4 years of CS study at one particular school (called "training" in the paper) already had an increasing "preference" for camel case over underscore. In other words, the bias is built in. What Figure 2 really tells you is:

  1. People who prefer camel case style are more used to reading camel case identifiers.
  2. People who receive more CS lessons are more used to reading identifiers, regardless of style.

The 2010 paper tried to reduce that bias by measuring the body's reaction (visual effort), which, given our shared evolution, is less likely to differ from person to person.

zeroxs commented 2 years ago

As someone who uses various languages utilizing both styles, I much prefer working with snake_case in C++ and find it far easier to read/write even though my first language was PHP, a PascalCase language, which I continue to use to this day and have had a professional career in for over a decade. Even though PHP has always dominated my programming time, snake_case is the true joy.

mo-xiaoming commented 2 years ago


Same story here: CamelCase for work and snake_case for hobby projects. CamelCase is not that friendly for me; I'm from a non-English-speaking background.

zeroxs commented 2 years ago

JavaIsATestamentToPascalCaseGettingOutOfHand whereas_cpp_is_a_stark_contrast_and_shows_enhanced_readability 😊 (underscores can effectively be read as spaces, so you do not need to spend mental effort inserting spaces into a long string)

mo-xiaoming commented 2 years ago

@zeroxs spot on!

Another example is Google Test. It doesn't allow underscores in test names, so I have to deal with something like GivenUserIsAuthenticatedWhenInvalidAccountNumberIsUsedToWithdrawMoneyThenTransactionsWillFail

zhihaoy commented 2 years ago

> JavaIsATestamentToPascalCaseGettingOutOfHand

I literally stared at "Javal" for a second and thought, "What is this word?" Is it a kind of "gravel"? Sure, the "I" is indistinguishable from "l" in my browser, and maybe I should inject some CSS, but I don't think I can solve the problem on colleagues' machines.

And then I stared at "ATest" for another second and thought, "What is this test?" I know there is an AP Test; what is an "ATest"?

lupuchard commented 2 years ago

I think there's a significant readability benefit in having different styles for types vs non-types, like in Rust (PascalCase for types, snake_case for everything else).

forest1102 commented 2 years ago

I totally agree with you; I prefer snake_case by far.
Given that the C++ standard library uses snake_case, I can't understand why Carbon code uses PascalCase.
Is it because UpperCamelCase is the convention favored at Google, as in other programming languages Google makes?

chandlerc commented 2 years ago

New posts with individual preferences on this topic aren't really adding new information. Please help minimize the redundant content here; you can always thumbs-up something. =]

This is obviously something that different people will have different preferences around. But we need to make a decision for Carbon and can't make it perfect for everyone.

The studies are really interesting, and thanks for surfacing them. While I have some questions about the sample size and how significant the difference is, it still seems like an interesting factor to consider.

However, I think there is specific benefit to having visually distinct styles for highly different constructs. That is a significant factor in the existing decision. Moving from 2 styles to 1 style, even if the 1 style is demonstrably better, is a trade-off where we give up the distinction to get the human-preferred format in more places.

As I think that is the real question, it's probably best put to the leads for a decision. They can always come back with "we would like to see analysis of the studies", etc.

Personally, my stance is that even if there is a difference, I don't think it is large enough to make having only a single style the right tradeoff. CamelCase is used widely enough, in enough programming languages (and now even in non-programming contexts like #HashTag), that it seems unlikely to be important to avoid. Similarly, reflecting some distinctions in naming conventions is quite widespread.

ville-h commented 2 years ago

A naming scheme that doesn't use a token separator sigil, such as '_', has a couple of downsides in my experience. Token boundaries take some effort to distinguish, for both humans and programs. Admittedly, humans can often do it easily enough if they have some familiarity with the project, the domain, and the names that relate to the domain. For programs it's doable by special-casing a bunch of domain-specific names.

As an example, I've written a bindings generator for Vulkan based on its XML specification. In various places Vulkan doesn't use a token separator sigil: a) vkCreateDevice, b) vkCreateMacOSSurfaceMVK, c) VkExtent3D, and d) VkD3D12FenceSubmitInfoKHR. Generating bindings for an environment where snake_case is the norm is therefore somewhat more involved. a) is easy enough, but b), c), and d) are not quite as easy. You would probably want the others as: b) vk_create_macos_surface_mvk, c) vk_extent_3d, and d) vk_d3d12_fence_submit_info_khr.
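A minimal Python sketch of the splitting step such a generator needs, assuming ASCII identifiers: generic regex rules find most token boundaries, and a small table supplies domain-specific spellings the rules get wrong. The names `TOKEN_RE`, `SPECIAL_TOKENS`, and `to_snake` are illustrative assumptions, not taken from any real Vulkan tooling.

```python
import re

# Generic boundary rules, tried in order at each position:
#   1. an acronym/digit run directly followed by a capitalized word
#      ("D3D12Fence" -> "D3D12", then "Fence")
#   2. a lowercase word, optionally starting with one capital ("vk", "Create")
#   3. any remaining run of capitals/digits ("KHR", "3D")
TOKEN_RE = re.compile(r"[A-Z0-9]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z0-9]+")

# Hypothetical domain table: spellings the generic rules would split wrongly.
# A real generator would need many more entries than this.
SPECIAL_TOKENS = {"MacOS": "macos"}


def to_snake(name: str, special: dict = SPECIAL_TOKENS) -> str:
    """Convert a CamelCase identifier (ASCII letters/digits only) to snake_case."""
    tokens = []
    i = 0
    while i < len(name):
        # Prefer the longest known domain token starting at this position.
        for key in sorted(special, key=len, reverse=True):
            if name.startswith(key, i):
                tokens.append(special[key])
                i += len(key)
                break
        else:
            m = TOKEN_RE.match(name, i)
            if m is None:
                raise ValueError(f"unexpected character {name[i]!r} in {name!r}")
            tokens.append(m.group(0).lower())
            i = m.end()
    return "_".join(tokens)


for name in ["vkCreateDevice", "vkCreateMacOSSurfaceMVK",
             "VkExtent3D", "VkD3D12FenceSubmitInfoKHR"]:
    print(name, "->", to_snake(name))

# Without the table, the generic rules split "MacOS" into two tokens.
print(to_snake("vkCreateMacOSSurfaceMVK", special={}))
```

With these particular rules, c) and d) come out as vk_extent_3d and vk_d3d12_fence_submit_info_khr without any table entry, but b) only becomes vk_create_macos_surface_mvk because of the MacOS entry; without it, the result is vk_create_mac_os_surface_mvk.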

Another case where even humans will have problems is when names are formed in such a way that capitalization does not mark token boundaries, because the naming scheme keeps acronyms in all caps: common gateway interface => CGI, hypertext markup language => HTML, giving CGIHTMLPrinter() rather than CgiHtmlPrinter(). If you plan to go ahead without a token separator sigil, I suggest you also address this in your naming rules, hopefully choosing the latter form.
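A generic splitter shows the ambiguity directly. In this Python sketch (the regex and the `split_words` helper are illustrative assumptions), the all-caps form collapses two acronyms into a single token, while the capitalized-acronym form splits cleanly.

```python
import re

# Generic rules: an acronym run before a capitalized word, then lowercase
# words (optionally capitalized), then any leftover run of capitals/digits.
TOKEN_RE = re.compile(r"[A-Z0-9]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z0-9]+")


def split_words(name: str) -> list:
    """Split an ASCII CamelCase identifier into lowercase words."""
    return [t.lower() for t in TOKEN_RE.findall(name)]


# Acronyms kept all-caps: nothing marks where "CGI" ends and "HTML" begins.
print(split_words("CGIHTMLPrinter"))
# Acronyms capitalized like words: every boundary is visible.
print(split_words("CgiHtmlPrinter"))
```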

doganulus commented 2 years ago

Carbon already differentiates different constructs using var, fn, class, and package. With everything being a value, making visually distinct styles for highly different constructs would be redundant.

And CamelCase reminds me of the Java bloat. Yes, the bloat starts from styles.

OlaFosheimGrostad commented 2 years ago

When modelling, I prefer that type names ThatCarryWeight relate to the domain I am modelling, while things_that_have_a_supporting_technical_role are in lower case.

E.g: User, Budget, AnnualBudget, MonthlyBudget vs array, string, map…

I don't want to be forced to add verbosity such as Model.User, Model.Budget, Model.AnnualBudget and so on to make the code clear. I'd rather fork the language… :-)

As far as I am concerned, C++ got this right, and even if it didn't, C++ programmers are already conditioned to this preference. You need to consider the multitude of practices in the culture you are appealing to, as practitioners have adapted their practices to the look-and-feel of C++ over decades, and they are predictably reluctant to change practices for no good reason. (In this context, what languages other than C++ do is not particularly relevant, other than as «fashion» or «opinion», for which there are no objective arguments.)

If uptake is a goal, then staying close to C++ wherever diverging from it would not significantly improve usability should be a priority. Unless, of course, the goal is to appeal primarily to devs who cannot stand C++. If so, maybe make that a stated goal?

Becoming a successor to C++ is quite different from being an alternative to C++…

nigeltao commented 2 years ago

> I think there is specific benefit to having visually distinct styles for highly different constructs

If you want 2 styles but you also like underscores and you like consistency, you could possibly do:

Class_names_start_with_upper
variable_names_start_with_lower
IS_THERE_A_THIRD_CATEGORY_MAYBE_MACROS_OR_ENUMS

Just an idea. Not necessarily a good one.
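A scheme like this can still be checked mechanically, since the category is carried by letter case alone. A minimal Python sketch, assuming ASCII identifiers; the category names and patterns are hypothetical, not part of any Carbon proposal:

```python
import re
from typing import Optional

# Hypothetical patterns for the three categories: words joined by
# underscores, distinguished only by the case of their letters.
PATTERNS = [
    ("type", re.compile(r"[A-Z][a-z0-9]*(?:_[a-z0-9]+)*")),
    ("variable", re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")),
    ("macro_or_enum", re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")),
]


def classify(name: str) -> Optional[str]:
    """Return the first category whose pattern matches the whole name, else None."""
    for category, pattern in PATTERNS:
        if pattern.fullmatch(name):
            return category
    return None


for name in ["Class_names_start_with_upper", "variable_names_start_with_lower",
             "IS_THERE_A_THIRD_CATEGORY_MAYBE_MACROS_OR_ENUMS", "CamelCase"]:
    print(name, "->", classify(name))
```

One design wrinkle: a single capital letter such as T satisfies both the type pattern and the all-caps pattern, so the order of `PATTERNS` decides; here it is treated as a type name.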

chandlerc commented 2 years ago

At the moment, there doesn't appear to be much in the way of compelling technical argument about exactly which style to use.

The leads (excluding me) are deferring this to the painter (me), and I think the current paint is fine -- we'll not switch to lowercase_underscore at this time.

This doesn't preclude revisiting this if/when we can gather really good data or other new information comes up.

helmesjo commented 2 years ago

If there is no compelling technical argument for either style, wouldn't it make sense not to change what has already been established, and start from there? Even setting aside that this_objectively_is_easier_to_read compared ToThisOverHere (and the bloat of the latter when you inevitably mix Carbon with C++), it just feels like this is (unnecessarily) certain to receive quite some pushback, given that Carbon is billed as "the successor of C++".

doganulus commented 2 years ago

@chandlerc Below there are a few questions for the Carbon team.

1) Could you explain why Carbon breaks the snake_case convention favored by std, boost, abseil, and other standard libraries including Python's?

2) The snake_case convention makes the standard constructs less flashy to the eye and keeps the focus on the application-specific constructs (mentioned in a few comments above), where you can use your own style. Could you point to a general complaint from the C++ community against snake_case in the language and library?

3) Does Carbon have an implicit goal to maintain consistency with Google's codebase? Google made an exception to its style guidelines for the Abseil library to maintain consistency with the C++ community. Now, if you try to enforce the Google style through Carbon, don't you see that this would be a hostile move when there is no complaint about the style? I think this is important.

4) You state here that human interoperability and migration (presumably addressing C++ developers) are important for Carbon. Then why are you changing the snake_case convention of the C++ standard, the cornerstone of the C++ language?

So, it's the Carbon team that must explain the decision to abandon the well-established style of C++.

christianparpart commented 2 years ago

> The leads (excluding me) are deferring this to the painter (me), and I think the current paint is fine -- we'll not switch to lowercase_underscore at this time.

I "think" you missed to give a reason :) No offense, I'm kinda blind, I may have missed it, but I cannot find a technical reasoning on why CamelCase is favored against all the odds described above and below. 😢

> 3) Does Carbon have an implicit goal to maintain consistency with Google's codebase?

@doganulus that was my very first thought when seeing Carbon language examples. I am in fact glad I wasn't the only one having a not too positive feeling about it. :)

KateGregory commented 2 years ago

> I "think" you missed to give a reason :) No offense, I'm kinda blind, I may have missed it, but I cannot find a technical reasoning on why CamelCase is favored against all the odds described above and below. 😢

The whole point about "leaving it to the painter" is that there doesn't really need to be a reason. There are a pile of options, all basically as good as each other; choosing among them is personal preference or a flip-a-coin. If there was a powerful argument that would convince everyone about a particular choice, then that would be chosen. But you need to make a choice even when all the options are pretty much the same, with some people preferring each option. When we agreed to defer to the painter that means we agreed there really is no strong argument for or against the various casing choices being considered.

In this particular case, a choice has already been made, and now to change it would take work (updating documentation, redoing examples, etc) and there just isn't a compelling reason to take on that effort. As you've said, there isn't much technical reasoning one way or the other.

zeroxs commented 2 years ago

> The whole point about "leaving it to the painter" is that there doesn't really need to be a reason. There are a pile of options, all basically as good as each other; choosing among them is personal preference or a flip-a-coin.

Changing something for the sake of changing something is not good practice. Especially if you're trying to appeal to the former crowd. Therefore, a reason would be expected for changing the style from what has been established in C++ since this is supposed to be C++'s "successor".

helmesjo commented 2 years ago

> In this particular case, a choice has already been made, and now to change it would take work (updating documentation, redoing examples, etc) and there just isn't a compelling reason to take on that effort.

Well, I guess it's here that I (and evidently the majority of the people participating in this thread) feel like you're dodging the obvious: If there is no strong "technical argument" for either, snake_case still wins, because it is what's already established (which in itself is quite a huge argument)... To be honest, this whole discussion is starting to smell kind of bad, and unfortunately the Google conspiracy is the prime suspect (which feels like a bad start given what was said at the talk).

Edit: Regarding the work to change code & documentation, if that can't be done with a couple of regex, I'll be pretty stumped. :)

KateGregory commented 2 years ago

> If there is no strong "technical argument" for either, snake_case still wins, because it is what's already established (which in itself is quite a huge argument)...

There is no "always use snake case" rule in C++. I've been using C++ since the mid 80s (and have never worked for Google) and have used a wide variety of casing styles in those almost four decades. I've worked places where underscores were banned anywhere in a name, for example. During the time before Carbon was public, some decisions were made which are just not a very big deal. This is one of them. We wanted different styles for different constructs. Most naming guides I've used in C++ have had a similar rule (even if it was as minimal as doing macros in all-caps). Some people here are arguing for a single style for everything, saying that it's "better". The leads decided that there's no compelling reason to change what has already been done for Carbon. It's not a case of "change it now because the work to change it later will be much much more."

People feel very strongly about names and styles. That doesn't make this a big deal. We can change it later for the same effort as changing it now, and we have, to be frank, bigger fish to fry. As the rest of the design settles into place, we may want more (or less) contrast with built-in type names (i32), introducers (fn, let, var), and possible keywords. Many of those are the subject of other issues, discussions, and PRs. Would your opinion on naming case change if the introducers changed? If lambdas had a keyword introducer vs a punctuation introducer? Some people's would. So the lead decision is that for now, the naming case in our examples and documentation is up to the painter, who has decided that it isn't going to change. This can be revisited.

zeroxs commented 2 years ago

A conscious decision was made to go against the established style of C++'s stdlib and other widely used and impactful libraries and people are asking what the reasoning was. "Just because" leaves a bad taste in the mouth.

helmesjo commented 2 years ago

> There is no "always use snake case" rule in C++.

Thanks for the lengthy reply, but unfortunately I think you misread me: I'm not talking about personal preference at all, I'm talking about the gigantic standard library bundled with C++ and the obvious benefit of not mixing styles (hence all the style guides that exist in each company and sometimes even each team). The fact that I personally happen to prefer it for my own code is beside the point.

Either way, I'm just stating what I'm sensing from this discussion and the summary is to "put the lid on it". Given that attitude I'm just certain it will come up over and over again.

geoffromer commented 2 years ago

As a friendly reminder, it's OK to disagree with people, but it's not OK to claim they have hidden motivations. See the code of conduct, especially the section on "When we disagree, try to understand why".

zygoloid commented 2 years ago

snake_case is an established convention for the C++ standard library, but I don't find that to be especially important, given that we intend for Carbon code to eventually have little reliance on and use of the C++ standard library. A more important concern is what conventions are used in C++ codebases, where there is a lot of variability, but a great many popular mainstream coding conventions, including the Qt/KDE convention, the LLVM convention, the most common Microsoft convention, the Mozilla convention and the Google convention all use UpperCamelCase for class names, and JSF uses Upper_snake_case. As such, it does not appear to be the case that lower_snake_case is the default or predominant convention for the C++ ecosystem and for C++ codebases in general. (Note: I'm not saying that some other convention is the default convention, but rather that the C++ ecosystem is large and varied and there is no universally-best answer here.)

Given the variance observed in practice and the lack of solid evidence that one style is better than another, we don't have a clear objective rationale to pick one convention over another in order to best help Carbon achieve its goals, and I don't believe this issue has provided us with one. In the Carbon project, we want questions that are purely matters of taste to be decided consistently, and not by community consensus, in order to give the language a consistent look and feel. That's why we have the painter role, to which this question should be, and has been, delegated.

doganulus commented 2 years ago

I think you have failed to see that it is the unobtrusive style of the C++ standard that allows catering to the different tastes of the vast C++ community.

A language to replace C++ must be like water...

Don’t get set into one form, adapt it and build your own, and let it grow, be like water. Empty your mind, be formless, shapeless — like water. Now you put water in a cup, it becomes the cup; You put water into a bottle it becomes the bottle; You put it in a teapot it becomes the teapot. – Bruce Lee

The standard library has done it for years. You are just calling trouble by changing it unnecessarily. We are not talking about the style of company projects, we are talking about the style of the language. And it's snake_case.