R3BRootGroup / R3BRoot

Framework for Simulations and Data Analysis of R3B Experiment
https://github.com/R3BRootGroup/R3BRoot/wiki
GNU General Public License v3.0
18 stars 102 forks source link

System of units #783

Open klenze opened 1 year ago

klenze commented 1 year ago

ROOT says:

Selecting the System of Units in ROOT Historically the system of units in ROOT was based on the three basic units centimeters, seconds and GigaElectronVolts. For the LHC era in Geant4 collaboration decided that a basic unit system based on millimeters, nanoseconds and MegaElectronVolts was better suited for the LHC experiments. All LHC experiments use Geant4 and effectively adopted this convention for all areas of data processing: simulation, reconstruction and data analysis.

I think following the lead of the LHC experiments here and adopt the Geant4 units. This has the additional advantage that we can use Geant4 unit prefixes.

I think writing

(Of course, it what would be even better would be dimensionality checking at compile time (so that 20*cm+1*MeV or hist->Fill(hit->GetEnergy()) will not even compile, but this would require more work.

klenze commented 1 year ago

Of course, it what would be even better would be dimensionality checking at compile time

See here for my proof of concept implementation, or here for a version which is much more likely to end up in the C++ standard :-)

klenze commented 1 year ago

Also consider CALIFA: experimental calibrations always seem to be keV, while simulations are GeV.

inkdot7 commented 1 year ago

My 25 cents (value+unit) would be to use the Geant4 style of doing units. It is not too hard to work with, and not very verbose (only if one would start to spell things out, like kiloelectronvolt instead of keV).

It also provides small inline comments about the unit, without comment delimiters. And since quantities do have both a value and a unit (unless dimensionless), bare-word 20 is actually incomplete. 20*MeV reads better. (20 MeV of course reads even better...)

land02 uses some compile-time type-system for units (very crude though). That turned out to be very clumsy, and a real hindrance all over the place. Based on that experience, I'd recommend to go for the not-perfect (i.e. no static compile-time checking) Geant4-style, which still should get things much closer to easily checked code than having no unit specifiers at all. At least it is a starting point, much better than bare-word unitless values, and should be hard to argue against introducing.

Another nice thing with the Geant4 style is that it does not really hard-code the basic units used internally. E.g. 1 mm could be changed to 1 m without having to change anything but the definition. All other code just needs a recompile. It also does not force a particular base in print-out or plotting. Since one anyhow has to (or at least should!) do / keV or / MeV, then even things like / (10 * MeV) work and the user-visible units are decoupled from the internal representation.

And the Geant4-style should work rather well in a C-code as well, given a suitable header.

hapol commented 1 year ago

Geant4-style units are cool for me. It would be better to have them in the framework (with external maintenance) instead in a flavour (our r3b flavour). Any volunteer to perform the modification? It would require the revision of several thousands of code lines to adapt, in many classes ...

inkdot7 commented 1 year ago

It can hopefully be done in pieces - as long as thing stay consistent. One problem with not having the compiler nag about things, is that the chance for mistakes is not much reduced. And even increased during the transition, which may take a long time.

One way to at least hint someone reading code would be to use some typedefs:

typedef Double_L_t Double_t;  // A length.
typedef Double_T_t Double_t;  // A time.
typedef Double_E_t Double_t;  // An energy.
typedef Double_M_t Double_t;  // A mass.
typedef Double_angle_t Double_t;  // An angle.
// Etc...

This would not catch any errors, but at least act as a note to a user that the variable is to be handled with the CLHEP units.

A way to try to make consistent changes is to do just start with one or a few variables at a time, and then also whatever things interact with it. Interactions can be found by temporarily (i.e. during edit+compile cycle, but not committing) changing names of the affected variable and function, e.g. Double_t fDeltaAzimuthal; becomes Double_t fDeltaAzimuthalxx. Then the compiler would at least help to find the uses. And then change the names back before committing. That is a awful lot of changes, but probably worth the effort. Will not help with code which is not compiled, like macros. Unless they are all tested as well... If the CI tests would do some .L macro.C+ of all macro files, then perhaps such things would be managable as well.

YanzhaoW commented 1 year ago

One way to at least hint someone reading code would be to use some typedefs

There is user defined suffix operator in C++ (check this). And we can write values like: 1_m, 2_ns, 3_MeV

klenze commented 1 year ago

My 25 cents (value+unit) would be to use the Geant4 style of doing units. It is not too hard to work with, and not very verbose (only if one would start to spell things out, like kiloelectronvolt instead of keV).

At least SI units do not suffer that much from inflation :-)

I meant "verbose" (at the level in my original comment) a good thing, actually. I agree that having to spell out keV would be overly verbose.

I think it would be helpful to distinguish enforcing dimensional analysis within classes/functions and at interfaces.

Within a class/function, the spatial, temporal and interpersonal difference between the setting a double in some unit and using it is typically small:

void f()
{
    double x=42.5; // length in mm
    ...
    (use of x)
   ...
};

Also, there are all sorts of ways to calculate the wrong result which would not be detected by dimensional analysis, so the function has to be manually checked anyhow. Forcing devs to convince the compiler that all the units match will waste a lot of time for little benefit.

On the other hand, the distance between the caller of a method and the method can easily span multiple software projects and decades. The person writing the call might not even have met the author of the called code. Enforcing units in such a context seems like a sensible precaution.

To that end, I would propose having two namespaces. strong_unit::length would be the type of mm, parsec and so on. The other one would work like this:

namespace weak_unit
{
template<<strong_unit::length base=mm>
using length=...;
};

strong_unit types would be implicitly convertible to corresponding weak_unit types (with the value getting automatically adjusted by division through base).

weak_unit types would be implicitly convertible to double. This means that we could write

auto f(weak_unit::energy<MeV> en, weak_unit::length<cm> dist)
{
   // some calculations using en and dist as ordinary double values
   double res=...;
   return res*keV; // return strong_unit::energy
}

This would allow us to:

There is user defined suffix operator in C++ (check this). And we can write values like: 1_m, 2_ns, 3_MeV

(Mostly tangential rambling following)

I have to say that I am bitterly disappointed that the suffix has to be explicitly spelled out in the definition. The very least they could have done is to allow template<SomeClass str, OtherClass suffix > ThirdClass operator "" "" (); and have the user take care of parsing literal and suffix. As it is, we would have to define all of the SI (prefix, unit) combinations by hand. This shyness does not become C++, which has traditionally answered the question "would you like to buy this feature at the cost of greatly increased complexity?" with "we will take three versions". (The saving grace is that C++ 20 at least it allows (limited) template support for literal conversion operators, which means that SFINAE type variable conversion such as

template<Number x>
std::enable_if_t<(x%2)==0, EvenNumber> operator ""x_(){return x;} // and opposite for OddNumber
static_assert(std::is_same_v<decltype("100"_x), EvenNumber>);

works. This opens up the possibility of building a _pdg prefix. Instead of having to refer to a class proton by name we could just use the literal 2212_pdg. )

Just kidding. Using 42.0_m over 42.0*m has the advantage that the result does not depend on a variable called m being in scope or not, so it is probably a good idea. We can simply use the preprocessor to generate all the SI prefixes. Not.

YanzhaoW commented 1 year ago

@klenze

Do you think following usage is good?

auto length = 1_m;
auto speed = 1_m/1_s;

auto area = length*length;
auto area2 = 1_m*1_m;

With suffix operator, I don't think we can do something like "1_m<2>".

jose-luis-rs commented 1 year ago

This should be a proposal for FairRoot, we could then use it directly in R3BRoot.