R3BRootGroup / R3BRoot

Framework for Simulations and Data Analysis of R3B Experiment
https://github.com/R3BRootGroup/R3BRoot/wiki
GNU General Public License v3.0
18 stars 104 forks source link

Clang-tidy setting #828

Open YanzhaoW opened 1 year ago

YanzhaoW commented 1 year ago

I have added a static analysis using clang-tidy into our github CI/CD test in the PR #825. The current version of clang-tidy is 15 and the check list can be found here.

This issue is dedicated to discuss what should be enabled and disabled in the file .clang-tidy.

If anyone thinks some checks are unnecessary or hard to be fixed, please make a new PR about the .clang-tidy file in the project root directory or leave a comment below.

tips:

  1. If you go to Checks tab on the top of PR webpage and click static analysis on the left panel, you can see a list of warnings in your submitted code.
  2. ~If you place //NOLINT behind the line of code, clang-tidy will not check the line.~
YanzhaoW commented 1 year ago

@klenze

modernize-use-trailing-return-type

I suggest we should keep this feature and make the trailing return type as the default for the upcoming code with following reasons:

  1. readability In my opinion, it does increase the readability, especially for the functions with long return type:
    
    Neuland::Points Neuland::GetPoints() const;

auto Neuland::GetPoints() const -> Points;


2. Consistency
And if you use auto a lot for the variables:
```cpp
auto vec = std::vector<int>{};

const auto i = 3;

const auto points = GetPoints();

I would say for the sake of consistency, it's also better to declare or define functions as:

auto myFun() -> int;

auto myFun() -> int { return 0};

Trailing return type has been existed for 12 years (first introduced in C++11) and is definitely not latest trend in C++ community. Anyone who doesn't know this can learn this in few seconds. Just put the auto in the front and put the type behind the function name. It costs almost nothing to learn and only gives nothing but good in return.

cppcoreguidelines-owning-memory

We discussed that ROOT cannot allow us to have a clear ownership to the TH1 objects. So what we can do is to enable the LegacyResourceProducers option, which allows the usage of new operator but still forbids the usage of delete. If someone decides to delete some pointers, use std::unique_ptr.

Maybe we can define some factory functions to mitigate the new problem from ROOT, something like:

// instead of
auto* hist1 = new TH1D("hist1", "hist1", 10, 0,10);
//use
auto* hist1 = UseRoot<TH1D>("hist1", "hist1", 10, 0,10);

A very quick thought about the implementation of UseRoot: (for C++ novice, please ignore following code block)

template<typename Type, typename ...Args>
inline auto UseRoot(Args&& ... args)
{
    return new Type(std::forward<Args>(args)...); //NOLINT
}

With this, we can also get rid of "new" totally in new code.

We are not a C++ shop where every contributor can be expected to stay up to date on the latest trends in C++.

The clang-tidy checking is only applied with C++17, which is 6 years ago and most of suggestions date back to 12 years ago. I know 12 years in the physics (science) community means nothing. But it is like a whole new era for (software) engineering community.

This policy was not good for clang-format, and it will be disastrous for clang-tidy

Yes, I agree. For the legacy code, there is very little we can do. Clang-tidy only checks the changed lines. So we could set the bar a little bit higher for the new committed code.

The check is bugprone rather than a matter of taste.

I would say most checks in clang-tidy are either for bugprone or readability. A matter of taste is something like east const or const west, which I also feel unnecessary to prefer either one of them.

clang-tidy probably will not check macros

It won't check whether the C macros are used correctly. It just disallows the macro usage at all with cppcoreguidelines-macro-usage, which I totally agree. See this post for many setbacks of macros. I haven't seen any case where macro cannot be replaced by other proper functions or templates.

There are still a lot to do with our CI test, like adding more compiler options and compiling against both gcc and clang. More checks are alway preferred. Also the CI test should require fresh compilation and test of R3BRoot rather than compiling on previous built binaries.

Btw, did you write the hard coded check-options in .clang-tidy one by one? And for many options, I can't even find them in clang-15 documents.

klenze commented 1 year ago

I am still unconvinced with regard to trailing return types. auto SetRange(int range) -> void; does not seem to increase readability. From my understanding, the core guidelines do not have a position on trailing return types?

I just looked randomly at trending C++ projects on github, and it seems to me that neither googletest, doxygen or rapidjson are using trailing return types everywhere. So the leading return type did not generally fall out of favor like the K&R function declaration did.

I would suggest that in a particular header file, it is probably a good idea to be consistent. (Also, good luck convincing our maintainers. I remember making the case for having dev as the default branch, which was a lot stronger ("ROOT develops on the default branch, five random big-name open source projects do so") and still did not convince our maintainers for years.)

When I was talking about macros, I meant ROOT macros, not preprocessor macros (which are also bad but less common in our codebase).

Clang-tidy only checks the changed lines. So we could set the bar a little bit higher for the new committed code.

I think we should have two configurations of clang-tidy.

YanzhaoW commented 1 year ago

@klenze I think it's ok to just to write void SetRange(int range) and clang-tidy won't complain about it. But it's also ok for me to disable return trailing type. It's just a matter of syntax after all.

When I was talking about macros, I meant ROOT macros, not preprocessor macros (which are also bad but less common in our codebase)

Ok, then I will disable the clang-tidy check on ROOT macros ( by disabling checks on any .C file).

I think we should have two configurations of clang-tidy.

Well, I'm kind of reluctant on making checks or changes for the old code. First, there are just too many. Second, the current code design makes the program very unstable and any small innocent change could be very dangerous. For many times when I fixed some bugs, I was also surprised that the code could even run successfully before.

I agree the two things you mentioned are important. But for now, I would only wish the new code to be committed could uphold some standard suggested by clang-tidy.

What do you think about the factory function for ROOT owned objects? If you agree, what kind of name should we give it, "UseRoot", "CreateRoot" or "RootNew" etc?

YanzhaoW commented 1 year ago

@klenze @jose-luis-rs

Maybe we can discuss the magic numbers here. Currently checks on all floating-point values are disabled. And checks on integers, "0, 1, 2, 3, 4", are also disabled.

So should we disable all magic number checking? Or should we add more integers on the exception list?

Following are the options we can manipulate about clang-tidy magic number checking.

IgnoredIntegerValues Semicolon-separated list of magic positive integers that will be accepted without a warning. Default values are {1, 2, 3, 4}, and 0 is accepted unconditionally.

IgnorePowersOf2IntegerValues Boolean value indicating whether to accept all powers-of-two integer values without warning. Default value is false.

IgnoredFloatingPointValues Semicolon-separated list of magic positive floating point values that will be accepted without a warning. Default values are {1.0, 100.0} and 0.0 is accepted unconditionally.

IgnoreAllFloatingPointValues Boolean value indicating whether to accept all floating point values without warning. Default value is false.

IgnoreBitFieldsWidths Boolean value indicating whether to accept magic numbers as bit field widths without warning. This is useful for example for register definitions which are generated from hardware specifications. Default value is true.

IgnoreTypeAliases Boolean value indicating whether to accept magic numbers in typedef or using declarations. Default value is false. IgnoreUserDefinedLiterals Boolean value indicating whether to accept magic numbers in user-defined literals. Default value is false.

YanzhaoW commented 1 year ago

@inkdot7 If you have any issues about clang-tidy, you can report here.

It's very strange that clang-tidy gives an error while the gcc compiler doesn't. I will look into this later today.

klenze commented 1 year ago

I was just browsing through the clang-tidy messages of @AndreaLagni PR here.

Some comments.

Stuff I disagree with:

Stuff that should have an auto-apply option (like our format script --autofix):

Stuff which should give warnings, but not block merging:

Stuff which should block CI:

Stuff which should only be included after we fix the existing code base:

YanzhaoW commented 1 year ago

Thanks for the comments. I won't object most of your suggestions. Considering the possibility that clang-tidy can have different configurations under different folders, I assume your suggestions are meant for the file R3BRoot/.clang-tidy.

Stuff that should have an auto-apply option (like our format script --autofix):

I generally don't like the idea that CI process could automatically change the submitted code. Instead CI should just be checking and testing without altering anything. If people want to use the autofix feature of clang-tidy to fix the warnings quickly, they could do it by themselves using some tools before submitting their code.

Stuff which should give warnings, but not block merging:

I'm not sure how we could do this in practice. Of course we could use some python scripts to do some filtering of those warning messages. But personally I lack the motivation because I would still prefer fix the warnings as much as we can. If someone has this motivation, please submit a pull request.

Stuff which should block CI:

There are other vicious things apart from the C-style casting: like shallow copy, narrowing conversion (such as double to int in the TofD PR), uninitialised variables etc , which do affect our final analytic results. For me, the more, the better. But I'm ok if most of people want less checking from the root configuration (I would probably add them back under neuland related folder ;D).

This looks misleading, there is no gsl::owner in sight. If you manage to convert R3BRoot to smart pointers and it passes for everything, you can include it.

I think this warning is the only one that doesn't have a very clear message for C++ novices. But personally I think getting rid of "new" and "delete" is very important if we want less segmentation fault. And I'm sceptical that we could ever fix the existing code while we let more new code without any object ownership keep merging into our code base.

Nevertheless, if you think the restrictions should be lessened or altered, feel free to submit a PR to change the setting.

inkdot7 commented 1 year ago

There are clang-format errors reported for #712 for parts of the file which the commit does not touch.

https://github.com/R3BRootGroup/R3BRoot/actions/runs/4898110038/jobs/8746889182?pr=712

The PR only touches these lines:

https://github.com/R3BRootGroup/R3BRoot/commit/e16e9d534a8ffd0d66a22eb196deef7010eaf478

Is it the intention that commits are littered with unrelated white-space changes?

YanzhaoW commented 1 year ago

@inkdot7 Have you rebased your PR to the HEAD of the dev branch?

inkdot7 commented 1 year ago

@YanzhaoW yes, I believe so.

inkdot7 commented 1 year ago

I generally don't like the idea that CI process could automatically change the submitted code. Instead CI should just be checking and testing without altering anything. If people want to use the autofix feature of clang-tidy to fix the warnings quickly, they could do it by themselves using some tools before submitting their code.

I very much agree that CI should not automatically change submitted code. In fact, if it is able to modify code inside e.g. a inkdot7-owned repository, I would be deeply unhappy with github! That would be a very severe security issue!

CI must be read-only w.r.t. all repositories.

inkdot7 commented 1 year ago

Just found (my repo):

under Settings -> Code and automation -> Actions -> General -> Workflow permission -> 'Read and write permissions'

! Yikes !

Changed to 'Read repository ...'

This had happened by some default it seems.... github - you just lost a lot of trust!

jose-luis-rs commented 1 year ago

There are clang-format errors reported for #712 for parts of the file which the commit does not touch.

https://github.com/R3BRootGroup/R3BRoot/actions/runs/4898110038/jobs/8746889182?pr=712

The PR only touches these lines:

e16e9d5

Is it the intention that commits are littered with unrelated white-space changes?

But the problem is related to the clang-format, just apply clang-format-15 to the modified file

inkdot7 commented 1 year ago

But the problem is related to the clang-format, just apply clang-format-15 to the modified file

I did not modify those parts of the files.

Generally, when making a pull request, I would expect it should be as concentrated as possible? So as to not distract the review process.

Should I first make a PR that cleans up any whitespace changes in the files the 'real' PR needs to touch?

inkdot7 commented 1 year ago

I think the reasonable (i.e. user-friendly) way this will work is if the entire repository is 'clean' w.r.t. the CI tests. Clean = pass. Then the CI can enforce formatting rules, as only new code can break it.

Asking user to gradually clean up code here and there is not good.

So for the moment I would then suggest to put the equivalent of an || /bin/true in the formatting CI test, until the repository is cleaned. When that is done, CI can start to enforce things. Trying to force other users to do the cleanup job is not productive. Those who want those formatting rules first need to push them (i.e. all the actual formatting changes) through the usual PR process for the entire repository.

jose-luis-rs commented 1 year ago

Upps, we didn't apply the clang-format-15 to all the files. Well I can take care of this but, @YanzhaoW, I will not solve all the clang-tidy issues.

YanzhaoW commented 1 year ago

I very much agree that CI should not automatically change submitted code. In fact, if it is able to modify code inside e.g. a inkdot7-owned repository, I would be deeply unhappy with GitHub! That would be a very severe security issue!

No, I don't think Github can do this unless we use a permissive Github token in the workflow.

I think the reasonable (i.e. user-friendly) way this will work is if the entire repository is 'clean' w.r.t. the CI tests. Clean = pass. Then the CI can enforce formatting rules, as only new code can break it.

@inkdot7 My personal opinion is if we don't enforce the rule that the new code must have the correct format or style, the entire repository could never be clean.

@jose-luis-rs ./util/clang-format-check.sh should check the format of only changed lines. Maybe there is something wrong in the shell script?

jose-luis-rs commented 1 year ago

I very much agree that CI should not automatically change submitted code. In fact, if it is able to modify code inside e.g. a inkdot7-owned repository, I would be deeply unhappy with GitHub! That would be a very severe security issue!

No, I don't think Github can do this unless we use a permissive Github token in the workflow.

I think the reasonable (i.e. user-friendly) way this will work is if the entire repository is 'clean' w.r.t. the CI tests. Clean = pass. Then the CI can enforce formatting rules, as only new code can break it.

@inkdot7 My personal opinion is if we don't enforce the rule that the new code must have the correct format or style, the entire repository could never be clean.

@jose-luis-rs ./util/clang-format-check.sh should check the format of only changed lines. Maybe there is something wrong in the shell script?

The script only looks for the changed files since users should apply the clang-format over the full file, not only the modified line.

inkdot7 commented 1 year ago

Please have a look here:

https://github.com/R3BRootGroup/R3BRoot/actions/runs/4902871432/jobs/8754973368?pr=859

so clang-format takes code which is white-space formatted for readability and turns in into junk. (Adding exceptions I will not spend time on.)

klenze commented 1 year ago

I generally don't like the idea that CI process could automatically change the submitted code. I agree with that, and that was not my suggestion.

My personal workflow is roughly like this:

0. change stuff
1. make a commit 
2a. do a ./util/clang-format-check.sh --autofix
2b. check the git diff, if reasonable then
2c. do a git commit --amend
3a-c. do 2a-2c for clang-tidy
4. do a push & make a PR

In the step 3a, I could tolerate being asked if I want to make something const and other nitpicky-put-potentially-correct things. (Technically, I should probably check vs clang-tidy, then check vs the tests, then check vs clang-format in that order, but you get my point.)

I think the reasonable (i.e. user-friendly) way this will work is if the entire repository is 'clean' w.r.t. the CI tests.

This seems obvious to me. The way to clean up a codebase is one check at a time (like I did for unsafe casts).

Also, badness (like evilness) of code is a non-trivial semantic property, so Rice's theorem applies. Meaning that no program (or human) can generally decide if a PR contains bad code or not. :-)

More seriously, while I think that some serious errors can (and should) be caught by clang-tidy, this should not distract from the fact that plenty of errors appear much earlier. From a maintainability point of view I think the more serious flaws actually happen in the design phase, in particular with regard to the data processing stages. Perhaps at some point we will have clang-llm-review to get good auto-generated code reviews, but for now LLMs are not quite there yet.

Hammering our code into good shape with clang-tidy alone is likely not going to work better than repairing a car engine by hammering away its brokenness.

ryotani commented 1 year ago

I need your help concerning #866. I made several small modifications in the code but then it requires me to re-write the code due to the clang-tidy. I can understand the importance to maintain codes in good condition, but I feel it's too much for requiring users to satisfy all the conditions by clang-tidy beyond the lines they made modifications. For instance, I am a bit too much against "cognitive complexity" since it requires us to modify the structure of the algorithm completely. I don't think it's a good idea to apply these requirements for any modifications in the existing and running codes. Another problem is that I don't find a proper running script to check the clang-format and clang-tidy before making PR. Even though I execute util/clang-format-check.sh --autofix in lxir136, the position of parenthesis doesn't match with the requirements in the GitHub. Also, I don't find any script as clang-tidy-check.sh. Unless such utilities are ready to run in local environments (at least in lxir computers), I don't think we should not require them for the contributers.

YanzhaoW commented 1 year ago

Hi @ryotani

"cognitive complexity" is used to prevent people from creating functions with hundreds of lines of code. The more if and for statements there are, the more "cognitive complexity" you get. Those functions are neither maintainable nor readable. And we have already a lot in our existing code base. But if you didn't create the whole function and just modify the signature(, which enables clang-tidy to give it a check), I suggest you could do

void yourfunction(...) // NOLINT(readability-function-cognitive-complexity)

Then it will suppress clang-tidy to give rise to this type of error for this line of code.

Yes, there is a way to know the clang-tidy warnings while you do the code editing. I introduced an IDE once in ECR meeting. It's called 'NeoVim' (very similar to Vim) with clangd embedded. And it works everywhere. Please have a check on this.

If you need more help, please contact me via discord.

klenze commented 1 year ago

So you added a reasonable if statement to make R3BCalifa less noisy, which resulted in clang-format complaining about the indentation of a lambda expression in a different part of the file, which you fixed. Then clang-tidy saw that as new code and bitterly complained about the length of some variable names. Sounds like CI-hell.

In the end, clang-tidy should reflect a consensus of the devs. Ideally, it would simply reflect some features of our design guidelines, but apparently nobody wants to talk about design guidelines (#838).

I have argued for having two clang-tidy instances running. One in "enforcement mode" (passing over all the code base): if code is downcasting using C-style casts, it should not be merged. The other as "advice mode", where it can nag about complexity, but where output does not block merging.

First adding clang-tidy checks and then adding NOLINTs seems to be a clear case of https://xkcd.com/2677/ . The easier way to go about it would be to directly disable it in the clang tidy configuration. Our present configuration does not exactly represent an armistice compromise achieved after years of struggle, with open conflict looming whenever anything is changed, but mostly is what @YanzhaoW felt would be good to have enabled when he implemented the clang-tidy CI task.

Even though I execute util/clang-format-check.sh --autofix in lxir136, the position of parenthesis doesn't match with the requirements in the GitHub

That is unfortunate. Apparently our version of clang-format is different from the CI version. I have installed a more recent version in /u/land/fake_cvmfs/10/llvm/inst_15.07/bin. Can you try if util/clang-format-check.sh --autofix /u/land/fake_cvmfs/10/llvm/inst_15.07/bin/clang-format works better? There is also a clang-tidy in there, btw. (I also tried to make the clang-format on CI generate a diff file, you can still see parts of the output. Originally I uploaded the output to some pastebin, but this has been disabled?)

YanzhaoW commented 1 year ago

readability-identifier-length should be highly context dependent. What is ok in a small scope might not be reasonable in a larger scope.

Adding two more letters doesn't really hurt and it doesn't do any harm but only make the code more readable.

readability-function-cognitive-complexity is not wrong but not really actionable. "Your method is to long, split it up in multiple methods" is not as helpful as "this block of code only depends on two outside variables, have you considered moving it to its own method?" is.

Yeah, agree. But I have never known there is way to change the warning message. If you know how to do this, please submit a PR.

hicpp-braces-around-statements,readability-braces-around-statements is imho wrong, the whole point of putting '{' in its own line is that we can use block ifs and command ifs because they are visually distinct.

What is the difference between block ifs and command ifs? Sorry I've never heard about this terminology.

cppcoreguidelines-pro-bounds-constant-array-index and cppcoreguidelines-avoid-c-arrays are also misleading. Cascading std::array is a terrible way to implement multi-dimensional arrays, even more so if you use cascaded range-based for loops instead of indices.

Again, I agree the warning message is not super clear. Another example is using new operator for the heap memory allocation. But I wouldn't say we should disable the check because the warning message is not clear.

I think our best option is to have the compiler generate runtime boundary checks for the subscript operator.

Runtime boundary checks hurt the runtime speed, which doesn't matter too much in a CI workflow. Then you have to force people, who are not even willing to add few letters to the variable names, to additionally write the whole unit test for it.

While I would wish people switched to boost MultiArray

My opinion is the code with multidimensional arrays already shows a bad design.

One simple example can be how to refer to a PMT in Neuland? You could just do like this:

const auto pmt = NeulandPMT[plane_id][bar_id][pmt_id];

Or you could have a better OOP design:

class NeulandBar{
   private:
     std::pair<NeulandPMT> pmts;
};

class NeulandPlane{
   private:
      std::vector<NeulandBar> bars_;
};

class NeulandDet{
   private:
      std::vector<NeulandPlane> planes_;
}

And then add member functions to retrieve specific members, perform certain tasks and so on (and using iterators!). you get the ideas. Of course, you need to take additional care to make this a little bit cache friendly (a.k.a data-oriented way) to reach the maximal efficiency. I could be totally wrong, but we can talk about this more.

In the end, clang-tidy should reflect a consensus of the devs.

Always keep in mind that you can have different clang-tidy settings for different folders. (yes I add trailing return type back in /neuland/ :D ) Yeah, we can have a consensus of which checks must be enforced throughout the whole program. And every one/group can make their own decision about the rest of checking.

First adding clang-tidy checks and then adding NOLINTs seems to be a clear case of https://xkcd.com/2677/ .

I wouldn't say so. The reason of doing this is because clang-tidy is just a static-analysis tool, not an AI assistant. I can't tell clang-tidy that "Hey, if this is a c struct, don't check on the member variable initialisation." or "If someone just changes the signature of the function, don't perform the complexity checking." or "if it's just clang-format change, please do nothing.". Maybe in the future, llvm could come up with some more intelligent checking tools. But for now, I would just save my time and either fix the warnings or put NOLINTs behind some special cases.

Our present configuration does not exactly represent an armistice compromise achieved after years of struggle, with open conflict looming whenever anything is changed, but mostly is what @YanzhaoW felt would be good to have enabled when he implemented the clang-tidy CI task.

My idea was to use the default settings given by clang-tidy (everything enabled). And if you feel some checks are not necessary, you can disable them under your own folder.