eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr
Other
940 stars 394 forks source link

OMR coding format #350

Open charliegracie opened 7 years ago

charliegracie commented 7 years ago

Currently the OMR project has a very detailed coding standard which originally came from the IBM J9 coding standard. These formatting guidelines and best practices were evolved over many years of development to help onboard new developers and ensure a consistent look and feel for all of the code.

As new guidelines were added a lot of the old code was left unchanged to make it easier to associate lines of code with defects or designs. This means that the consistent look and feel of the code is not very consistent.

The code formatting of the current coding standard is difficult to completely automate so a lot of time is spent correcting format violations during code reviews. I believe this time would be better spent looking for actual code issues so moving towards an automated code formatter would be a benefit to the project.

When the compiler component was released it had been developed using a different coding standard so we currently have 2 very distinct coding styles being used in the project. A commit may require changes across many of the Eclipse OMR components so a consistent code format would make this much easier.

By open sourcing the code into the Eclipse OMR project there is very little history so this may be a good time to format the entire code base and come up with a way to ensure that the codebase always stays correctly formatted.

There has been conversation on the omr-dev mailing list discussing the idea of using clang-format to format the Eclipse OMR project. There was a general consensus on the mailing list that this was the right direction so I am opening the Issue so we can finalize the discussion.

On the mailing list the webkit format received the most +1 so that is the current direction I will be focusing on.

Any / all comments will be greatly appreciated.

charliegracie commented 7 years ago

An example of what the code looks like formatted via the webkit format can be viewed here: https://github.com/mgaudet/omr/tree/clang-format-webkit-all

Thanks for putting that together @mgaudet

fjeremic commented 7 years ago

Referencing this file:

https://github.com/mgaudet/omr/blob/clang-format-webkit-all/thread/osx/priority.c

Interesting that multi-line comments get indented with tabs starting with the second line. They end up looking misaligned:

https://github.com/mgaudet/omr/blob/clang-format-webkit-all/thread/osx/priority.c#L139-L141

It is also interesting that the formatter chooses to place function return types on a new line, except for when it is a void function. Here is a comparison of a static void vs void declaration:

https://github.com/mgaudet/omr/blob/clang-format-webkit-all/thread/osx/priority.c#L280-L281 https://github.com/mgaudet/omr/blob/clang-format-webkit-all/thread/osx/priority.c#L311-L312

It seems // stlye multi-line comments get formatted correctly:

https://github.com/mgaudet/omr/blob/clang-format-webkit-all/compiler/z/codegen/BinaryCommutativeAnalyser.cpp#L83-L96

rwy7 commented 7 years ago

completely spitballing here but, I'm going to try and guess why the reflow failed.

The tab width is 8, the indent offset is 4. Tabs are converted to spaces to fix indentation. Any remaining tabs are considered the comment body.

Prediction: setting the tab width to 4 will fix the reflow.

Edit: nope, completely wrong :p

fjeremic commented 7 years ago

I think you are at least partially correct. Looking at the original file:

https://github.com/eclipse/omr/blob/master/thread/osx/priority.c#L261-L265

The comment line is actually a mix of tabs and spaces. It seems clang-format does not touch these at all. Webkit uses spaces but for whatever reason clang-format does not modify the spacing inside of any comment and since this is a multi-line comment it doesn't touch it. We would manually have to do a post-process and replace all \t with the equivalent number of spaces that the style is using (default 4 I think).

rwy7 commented 7 years ago

ColumnLimit: 0 seems to be causing all of the weirdness pointed out by @fjeremic.

Edit: Maybe we're hitting a strange case of

A column limit of 0 means that there is no column limit. In this case, clang-format will respect the input’s line breaking decisions within statements unless they contradict other rules.

mgaudet commented 7 years ago

Here's a branch with ColumnLimit: 120 (totally arbitrary number I chose): https://github.com/mgaudet/omr/tree/clang-format-webkit-column-limit/

Here's the aforementioned misformatted comment: https://github.com/mgaudet/omr/blob/clang-format-webkit-column-limit/thread/osx/priority.c#L139-L141 Looks better!

charliegracie commented 7 years ago

I think clang does not modify the contents of comments by default and their may be an option to reflow them. I will look it up today.

rwy7 commented 7 years ago

The comment reflow option is already enabled in the WebKit style. Changing it to ReflowComments: false does not fix the formatting.

rwy7 commented 7 years ago

I see that include sorting is turned off. I am guessing that breaks us somewhere, however If we want include sorting, I suggest the following configuration.

The order is omrcfg.h, main header, local headers, system headers, and finally *.inc source stubs. Each category is alphabetized. The order is designed to cause compilation failures due to missing #includes in header files.

Note that clang-format will not sort includes across #if ... #endif blocks or blank lines.

IncludeCategories:
  - Regex: '^omrcfg\.h$'
    Priority: -1
  - Regex: '^(<|").*.inc("|>)'
    Priority: 4
  - Regex: '^".*"$'
    Priority: 2
  - Regex: '^<.*>$'
    Priority: 3
mgaudet commented 7 years ago

Interesting! I didn't know we could do that :smile:

I'm not 100% sure whether or not we still need to avoid sorting includes for the compiler; I am not recalling why I had that setting, but I'd much prefer replacing it with a curated priority list if we have a reason.g

rwy7 commented 7 years ago

Re ColumnLimit: I think It's reasonable to enforce a line length limit. The point of using ColumnLimit: 0 was to leave wrapping up to the discretion of the developer, not to have unlimited line length. Maybe people see it differently, but using an unlimited line length doesn't help us, it just puts a big hole in our standardization.

Plus it means the formatter is going to "just work".

charliegracie commented 7 years ago

I am in favor of having no line length limit as my screen is really wide and I dislike having a line broken up. I prefer to scroll if I need to but I feel I am probably in the minority. I could easily be swayed if it makes the code look better / formatting easier.

fjeremic commented 7 years ago

I too am in favor of unlimited column width as wide screen monitors are quite common so any column limit would effectively be arbitrary (to some degree). It seems the ColumnLimit: 0 option is only affecting the multiline comments, a problem easily solved by a search replace on tabs with spaces.

Would it be possible to format everything with a very large ColumnLimit first so as to get rid of the tabs, and then re-format the already formatted code with ColumnLimit: 0? Would this fix the problem?

mgaudet commented 7 years ago

I tend to actually support a line width limit.

IMO, the old 80 column rule is -too strict-, because monitors have gotten wider, but, unlimited is equally bad.

rwy7 commented 7 years ago

This is my suggested clang-format:

---
Language:        Cpp
BasedOnStyle:  WebKit
ColumnLimit:     120
IncludeCategories:
  - Regex: '^omrcfg\.h$'
    Priority: -1
  - Regex: '^(<|").*.inc("|>)'
    Priority: 4
  - Regex: '^".*"$'
    Priority: 2
  - Regex: '^<.*>$'
    Priority: 3
NamespaceIndentation: None
Standard: Cpp11
...

On my screen the 120 character limit works well. People should look at how WebKit wraps function parameters and if statements.

I have two new settings. The first is to change NamespaceIndentation: Inner to None. It's just simpler and saves some columns. The other is to use the C++11 standard (if we are able). This primarily affects (improves) the spacing of nested templates. EG:

Vector<SharedPtr<Vm> > vmList; // C++03
Vector<SharedPtr<Vm>> vmList; // C++11
fjeremic commented 7 years ago

Vector<SharedPtr > vmList; // C++03 Vector<SharedPtr> vmList; // C++11

Unfortunately xlC will not like the C++11 syntax and will fail to compile :disappointed:

dsouzai commented 6 years ago

This issue hasn't received any love for about a year. Now that we have a decent setup in terms of jobs running on a PR, perhaps having an automatic code formatter job on modified files in the PR would be useful? That way we don't need a massive commit resulting in tons of merge conflicts, but rather, the code will slowly but surely approach a consistent coding format.

rwy7 commented 5 years ago

Well, once again, I’d like to open the discussion around adopting a new universal code format for OMR. Now that we’re into the winter slump of activity in OMR, it might be our best chance to reformat the code base and minimize the negative impact on developers.

There is an ongoing discussion in #committers-public, where we are finalizing a coding format and clang-config. Based on the above discussions, we’ve started with a webkit style. The focus now is to iron out smaller issues where the defaults don't work. If you have any opinions or concerns, please voice them (here or on slack). I’ll update this issue when the format is semi-finalized.

I know many developers may already be on vacation, so depending on participation, it might be best to pause the discussion until the new year. I don't want this to fly under the radar. But, if we’re going to take advantage of the timing, it makes sense to move quickly. If we can reach a consensus, I’d like to start working on an adoption schedule soon.

rwy7 commented 5 years ago

I've opened PR #3390, which adds a clang format to OMR, in a webkit style.