Clang-formatting - what should be included, what should be removed, future additions

R3BRootGroup / R3BRoot

Framework for Simulations and Data Analysis of R3B Experiment

https://github.com/R3BRootGroup/R3BRoot/wiki

GNU General Public License v3.0

18 stars 104 forks source link

Clang-formatting - what should be included, what should be removed, future additions #845

Open ajedele opened 1 year ago

ajedele commented 1 year ago

The clang-tidy is currently used to check the code submitted to GitHub. There are several features that are of great benefit and some that are more cumbersome or above the skill-level of average R3BRoot user.

In lieu of the conversation from last Thursday, the point of this post is to start the conversation around the clang-format.

In my opinion, I think there are a few easy things that make sense:

removing the Root nomenclature for variables (double vs. Double_t)
using c++ libraries when possible (for example std::vector vs. TVector, array vs TArray)

When it comes to Root classes, we need to be highly realistic. Root will be the staple of our coding until CERN stops supporting it. Therefore, the clang-tidy should not 'punish' users for using TF1s, TH1Ds, etc.

YanzhaoW commented 1 year ago

Thanks for raising this issue.

or above the skill-level of average R3BRoot user.

There are R3BRoot users and there are R3BRoot developers. We shouldn't require the users to have high-level skills, or in other words, software should be easy to use. But like in almost all cases, the lower skill is required to use the software, the higher skill is needed to develop it. Of course we can make it very easy to develop and very hard to use. That's a matter of the choice.

Clang-tidy just helps developers to keep that level not too low.

Therefore, the clang-tidy should not 'punish' users for using TF1s, TH1Ds, etc.

I don't think clang-tidy 'punish'es developers for using those classes. Actually those classes are pretty useful when it comes to data visualisation in C++. The biggest reason why it complains is those classes are 'created' using new operator, such as:

auto* th1 = new TH1D("hist", "hist", 10, 0., 10.);

This way is pretty bad as it specifies no ownership at all. And the ownership of an object is very important if we want to get rid of segmentation fault everywhere in our program. I have talked about this for more than an hour in the R3B analysis meeting. You could check my slides if you missed that meeting. (I attached my slides with Valerri's announcement email of my talk.) Instead you are suggested to do:

#include <R3BShared.h>
auto* th1 = r3b::root_owned<TH1D>("hist", "hist", 10, 0., 10.);

With this, we can get rid of new and delete once for all.

akelic commented 1 year ago

I agree on this topic with @ajedele

@YanzhaoW : " There are R3BRoot users and there are R3BRoot developers. We shouldn't require the users to have high-level skills, or in other words, software should be easy to use. But like in almost all cases, the lower skill is required to use the software, the higher skill is needed to develop it. Of course we can make it very easy to develop and very hard to use. That's a matter of the choice."

In my opinion, what @YanzhaoW wrote would be correct if the code is a correctly (from the physics point of view) working "black box", where user just gives input files and the rest is done by the code. But, this is not the case. The "problem" is that developpers (which I admire all for their knoweledge, work and efforts!) have only experience with limitted number of detectors (namely, mostly Califa and NeuLAND). Consequently, changes they make on other detectors cannot be tested by them and may even not be the best option. Testing has to be done by users who are not on the same level of programming skills as developpers, but have much more knowledge and experinece about functioning and performance of their detectors and associated electronics. We have just now an excellent example of this situation with the TOFD detector. We had a code which was maybe not fulfilling the highest standard of the coding, but was fulfilling physics standards of data anaylisis. Now, we have a code which should fulfill the standards of coding, but doesn't have the same quality level when it comes to physics and data analysis of TOFD.

Thus, in the case of R3BRoot, in my humble opinion, the standard concept of developpers and users cannot be applied at the present. We could use this concept, only, and only if developpers sit down and invest lot of time and effort to learn how each detector and each electronic readout is working, how their signals are created, collected, analysed. After that, and only after that, they can be clasified as developpers and make highl-level coded anaylsis tool which we the rest (clasified as only user) would use as a "black box", and developpers will be the one who will implement and test and prove any changes needed to be done in the code. But, this is totally not realistic, and thus it is not possible to make this distinction between developpers and users in our case. And this means, that suggestion of @ajedele is very reasonable and have to be taken into account.

ajedele commented 1 year ago

Could you please define developer vs. user? In my opinion, these terms are used rather ambiguously by different people in our collaboration.

In regards to the new operator, I think you need to be willing to compromise there. I understand the latter example you gave is more correct and should be used if possible. However, if I want to use an example I find on the Root forums, they all have the new operator. This issue is a higher-up problem.

Another issue that was raised to me was numbers vs variables. For example, clang-tidy would flag auto* th1 = new TH1D("hist", "hist", 10, 0., 10.); 4 times! 1st for the new, and 3 times for 10., 0., 10. since these number are hard-coded and not variables. In the case of the Tofd, this is actually fairly well done with the histograms. The majority of the code has hard-coded values corresponding to the range of the ToT values based on the TDC range. For histograms that are zoomed-in, the range is set to fTotHigh(Low)Value (or similarly named). This should be the preferred way

YanzhaoW commented 1 year ago

For example, clang-tidy would flag auto* th1 = new TH1D("hist", "hist", 10, 0., 10.); 4 times! 1st for the new, and 3 times for 10., 0., 10. since these number are hard-coded and not variables.

Actually it's only 2 times. One for the new and one for the 10. All floating-point numbers are not considered as "magic numbers".

YanzhaoW commented 1 year ago

But, this is not the case. The "problem" is that developpers (which I admire all for their knoweledge, work and efforts!) have only experience with limitted number of detectors (namely, mostly Califa and NeuLAND)

In my opinion, it's totally fine for a developer to have experience only with limited number of detectors. According to the "division of labour", as the most fundamental principle in any collaboration, corporation or society, one person only need to know and do one part of the whole. The same philosophy can also be found in an old UNIX dogma:

Do one thing and do it well
Work together with other components

For example, I only work with NeuLAND detector. And that's enough for all my implementations on this detector and they do no have any influence on other detectors. In the meanwhile, it should also be easy for some other people to use these implementations together with those of other detectors. You could find this style everywhere in our softwares, like FairRoot and ucesb. Even in R3BRoot, different detectors are separated into different folders with individual implementations. But you could still make them work together for either the simulation or the experimental data deserialisation and analysis. For those people who need to analyse the data from multiple detectors, they should just know how to use those implementations and stay away from the implementation details.

Testing has to be done by users who are not on the same level of programming skills as developpers, but have much more knowledge and experinece about functioning and performance of their detectors and associated electronics.

Yes, that's why we should we together instead of letting everyone do everything.

We had a code which was maybe not fulfilling the highest standard of the coding, but was fulfilling physics standards of data anaylisis.

Personally I think the "highest" standard of the coding we could afford is:

It works right now
Easy to change
Modulisation
High coerence
Low coupling
Optimal implementation detail
Large coverage of unit test

The PR regarding TofD only satisfies the first of 7 and clang-tidy is just forcing it to touch the bare minimum of the 4th. I'm not quite sure about the "physics standard of data analysis". Could you elaborate more on that?

Thus, in the case of R3BRoot, in my humble opinion, the standard concept of developpers and users cannot be applied at the present.

The concept of developers and users is always there for every software even if they are the same group of people. I could save the time for the development but some other people will pay it back (or even more) whenever they start to use it and change it further.

YanzhaoW commented 1 year ago

Could you please define developer vs. user? In my opinion, these terms are used rather ambiguously by different people in our collaboration.

In case of NeuLAND, the users are those people who use R3BNeulandReader, R3BNeulandCal2Hit, R3BNeulandCal2HitPar, etc in their root.C files. The developers are those who wrote those classes. They could be the same group of people. I don't how different it's for other detectors.

I understand the latter example you gave is more correct and should be used if possible.

They are both correct. The example I gave you just makes the coding much simpler. With it, you don't need to think about where you should delete it or not. With new operator, you have to think about where you should delete them and whether you should delete them to avoid program crashing. Compared to wasting the time on whether I should delete it or not every time, it's much more efficient to just learn the ownership of the object. (of course, I could also ignore all of it and let other people waste their time to fix the crashing in the future.)

However, if I want to use an example I find on the Root forums, they all have the new operator.

I think those code are meant for marco (.C) files rather than source file (.cxx) for a library. Clang-tidy doesn't check on .C files.

This issue is a higher-up problem.

ROOT has a lot of bad designs. But that doesn't mean we have to pay the price of its bad designs until they are fixed upstream. We could get around it downstream by putting those bad designs we have to use in a wrapper and use that wrapper instead. That's what r3b::root_owned wrapper is for.

ajedele commented 1 year ago

Dear Yanzhao,

While, in principal, your ideas sound great and fall within the idea of 'division of labor', I would refer back to the comments of Aleksandra.

Take the tracking detector people as an example. If we said the experts will concentrate on 1 detector only and we will split these people into developers and users, then I hope you are ok with running experiments with the Tofd and LOS only. There would be no development on future fiber detectors, a PAS replacement or other upstream options.

CALIFA and NeuLAND have the personnel to divide up tasks. You have people who do simulations only, testing the detector and hardware only, experimental data analysis only. The tracking detectors are lucky if they have someone who is free enough to do a simulation for a detector. And by the time they become an expert, they have left.

For the tracking detectors, the designers, builders, operators, simulators and analyzers of the detectors/data are the same people. And now you want them to pause their work and spend at least 6 months learning proper C++? This is where the compromise comes in.

My other concern is if you would like to force people to use clang-tidy and program to a high-standard, you have to be willing to sit with people, learn their detector and teach them how to code. You have to assist with making sure the skill-level of the 'developer' is at your satisfaction and if that's not possible, you have to be ready to fix the code for them. Think about how much time you want to invest as well.

akelic commented 1 year ago

@YanzhaoW I aologize for repeating mysef, but at this stage we cannot have deveoppers and users, we only have developpers. Only very small amount of our detectors are ready. Most of them are still being built or tested; the same is for electronics. As a consequence, each person analysing data from some detector is a developper, as in each experiment we go with different detectors and/or different electronics. And most of these deveoppers (inlcuding myself) are well below your coding skills. Imposing rules with e.g clang-tidy, people are forced instead of working on detectors data to solve warnings. Some are very reluctant to submit any (and often important) changes, as their coding skills are not up to new demands. Please don't get me wrong, I agree that we have to have high-quality code, but this is not the time to do such beautifications nor to force people to do them. We have several experiments performed in last years, and we have lot of student working on analysis. As PhD students, they all have limited time to finilize their work. But, instead of working on their data, they have to follow new-imposed rules and solve asssociated problems which takes a lot of their time. Also, as the code is waiting to be accepted until the warnings are solved, this means that also other students who analyse the same experiment have to sit, wait and lose their precious time. Now is not good timing for such changes. Wait until we start to move into the R3B cave at FAIR. Hopefully, till then we will have most of detectors ready and thus will be able to have stable verisons of analysis, which will not be needed to be changed from one experiment to another. And then, you can start with your developper-user "division". But not now. Concerning TOFD, if I have on one side code which is working for any experiment and on another side code which is working only for two experiments, I think everything is said. By the way, this is not a critic to Enes, in fact I admire him for all the effort he made to make once working code working again, and I am sorry for the time he had to spant on these technical issues instead of working on his analysis.

YanzhaoW commented 1 year ago

@ajedele Thanks for your reply.

And by the time they become an expert, they have left.

Because of this issue and the shortage of the manpower for many detector, I would say, we need to do things more efficiently. And therefore a higher standard of the implementation code should be more desired.

For the tracking detectors, the designers, builders, operators, simulators and analyzers of the detectors/data are the same people. And now you want them to pause their work and spend at least 6 months learning proper C++?

Than's very unfortunate. Sorry that I didn't realise it.

My other concern is if you would like to force people to use clang-tidy and program to a high-standard, you have to be willing to sit with people, learn their detector and teach them how to code.

I wouldn't like to force people to have clang-tidy checking on the implementation code for their detectors. And most of the warnings (if not all) given by clang-tidy can be solved by googling. Actually it's also possible to lower the standard or even disable the clang-tidy checking for a folder, like R3BRoot/tofd, by creating another .clang-tidy file under that folder. I haven't tried this but it should be possible.

For example, if I create a config file R3BRoot/tofd/.clang-tidy, inside which:

Checks: -*

It will disable all the warnings for all the files under the folder R3BRoot/tofd/. But I strongly disagree we should do this on the folders like r3bbase, r3bsource, r3bgen and r3bdata which are used by everyone ( ofc except those detector specific folders under r3bsource and r3bdata).

Like I said, a "half-done" work (Bad implementation, no modularisation, no unit test, etc) cost less time in the begin and but cost much more time in the future when people want to change it. In the meanwhile, a "done" work cost more time in the beginning but much less time in total. I prefer the latter personally.

YanzhaoW commented 1 year ago

Hi, @akelic sorry to hear that you have a very limited time for so many things. As I would probably mention again in the next R3B analysis meeting, you have 3 ways to ignore the clang-tidy warnings:

Ignore warnings for a single line:
```
int i; // NOLINT
```

Ignore warnings for a block of code:

//NOLINTBEGIN
int i;
auto* j = new double{};
//NOLINTEND

Ignore the warnings for all the files in a folder: Suppose you want to ignore warnings for all the files in R3BRoot/tofd/, create a file R3BRoot/tofd/.clang-tidy under that folder. Inside the file, write:
```
Checks: -*
```
I haven't tried the third option. Please let me know if it doesn't work.