Closed YanzhaoW closed 3 months ago
Hi @kresan, could you review this PR regarding build.py, a build script for R3BRoot, and the use of parameters without parameter factories?
Package Manager
From my understanding, the conan dependencies would not be automatically installed if people continued to run cmake, which is why you added the build.py.
I would be reluctant to wrap the top level of the build process. cmake and make both take a zillion optional flags, so it is worthwhile for users to know how to run them manually so that they have the option to pass some of these flags.
I am also doubtful that our lead developers will be easily sold on adding an extra step in the build process. Fetching external packages from cmake, while not particularly elegant, seems more likely to succeed.
External dependencies
One of the main benefits of adding external dependencies in cmake is that it is painful, which dissuades people from adding too many. I don't know much about C++ json libraries (I am more of a yaml fan), but it seems to me that we already get a json lib with boost. I have not used boost json; it could be an utter horror to work with for all I know. But if it is reasonable, I would much prefer to use a boost library over an external dependency, even if it is less shiny. The reason is that someone might have to compile this code in a decade, and I am not sure if nlohmann_json or conan will still be maintained by then. (If boost is not maintained by then, that is FairSoft's problem.)
With gtest, my understanding is that FairSoft ships with a version, but you would like a more recent version?
Class Design
I had a look at MappedDataReader. I appreciate you trying to separate the ugly FairRoot glue code from code which does actual useful work, but I don't think I am convinced your model can scale into a general solution to the problem of handling FairRootManager. (Of course, you never claimed that, so this is a bit like "I just bought some groceries" - "You fool, that will never scale to solve world hunger".)
In theory, that class could be templated over MappedData to generate a reader for any r3b data container. However, this won't work too well. The main problem is that as soon as you need a task which reads from two different data containers, you would need multiple inheritance. Of course, an instance of Reader<T> cannot provide a method Get##T, because only the preprocessor can concatenate token strings, so it would at most provide a method T GetContainer(T* dummy). This in turn will fail if a class needs two instances (with different runtime string names in FRM) of the same data type, say WRTS from both MainDAQ and Neuland.
I guess if one found a way to smuggle strings into template parameters (lambda expressions do work), one could use template packs to do something like
class MainNeulandWRTSDifference :
    public TemplateReader<DataInstance<WRTSData, "MainWRTS">, DataInstance<WRTSData, "NeulandWRTS">>
{
    void Exec(...)
    {
        auto mainWrts = GetContainer<DataInstance<WRTSData, "MainWRTS">>();
        auto nlWrts = GetContainer<DataInstance<WRTSData, "NeulandWRTS">>();
        ...
    }
};
What I would instead prefer is the following:
class MainNeulandWRTSDifference : public BetterTask
{
    auto fWrtsMain = FetchData<WRTSData>("MainWrts");
    auto fWrtsNL = FetchData<WRTSData>("NeulandWrts");
    auto fDiff = RegisterOutput<WrtsDifferenceData>("MainNLWrtsDiff");
    auto fSomePar = FetchParameter<WrtsClock>("WrtsClock");
    ...
};
Because the objects cannot actually be fetched when the class members are initialized, accessing them will involve an additional level of indirection. (The alternative is the ugly syntax of auto fWrtsMain = FetchData<WRTSData>(fWrtsMain, "MainWrts"); which would allow BetterTask to store a reference to fWrtsMain and assign it at an appropriate time.)
The implementation of BetterTask::FetchData will of course be a bit tricky. I would use a

std::vector<std::function<void(FairRootManager*)>> fOnInit;

template <class T>
T* FetchData(T*& target, std::string_view name)
{
    fOnInit.push_back(
        [name, &target](FairRootManager* frm)
        {
            target = frm->InitObjectAs<T>(name);
            // error handling
        });
    return nullptr;
}
(This is the ugly fast version without the indirect reference; the other version would require more memory. Of course, it is probably possible to have a simple pointer-wrapping class which overloads operator= and the copy constructor and tells BetterTask where the pointer it needs to update is stored. That would be actually neatly byzantine.)
Perhaps I will get around to writing a proof of concept for that at some point.
From my understanding, the conan dependencies would not be automatically installed if people continued to run cmake, which is why you added the build.py.
Not really. The conan dependencies only need to be installed the first time, with:
conan install . --output-folder=build --build=missing
And then when you run cmake, you need to add a cmake flag to let cmake know where the packages are:
cmake .. -DCMAKE_TOOLCHAIN_FILE=Dir/conan_toolchain.cmake
For the second or third run, you could just run cmake ..
since the package path is already cached. This is what is really nice about conan: it's not intrusive. You don't need to add or change anything in your cmake script to use conan. If conan is not supported anymore (which is highly unlikely), we could just use another package manager without changing the cmake script. The reason why there is a build.py is that I want to simplify the process when the project is built for the first time. Actually, it's also very simple after the first build.
But you are right about this point. Using ./build.py -a
is different from the traditional workflow mkdir build && cd build && cmake .. && make
. However, a build script for a large C++ project is indeed very common. Some people choose a make script, some a bash script and some a python script. All of this is just to make the build process user friendly.
That being said, there is ongoing development for conan2 (see this), which allows cmake to automatically install all conan packages and cache the paths. It's already done for conan1 and now they are trying to make it compatible with conan2.
I am also doubtful that our lead developers will be easily sold on adding an extra step in the build process. Fetching external packages from cmake, while not particularly elegant, seems more likely to succeed.
I wouldn't say they are fetched from cmake. There is a FetchContent function in cmake, which everyone in the C++ community complains about. Conan, on the other hand, is similar to Spack, which they are using for FairSoft. The only difference is that Conan is much, much nicer to integrate with cmake than Spack. With Spack, you basically need to set the PATH env variable and tell cmake to go find the Config.cmake files using those paths. If someone sets PATH differently, then it's screwed. But with Conan, cmake just needs to use conan_toolchain.cmake, which specifies the package paths and the cmake targets directly. It's hardly surprising, since Spack is more like dnf or apt, which can install packages in many languages system-wide, whereas Conan is a package manager only for C++.
I don't know much about C++ json libraries (I am more of a yaml fan), but it seems to me that we already get a json lib with boost. I have not used boost json, it could be an utter horror to work with for all I know.
Yes, I have heard about the boost JSON library. There is also another JSON library called simdjson. Among these three libraries, I personally think nlohmann_json is probably the easiest to use. It has a super nice API and its json data structure behaves much like the STL containers. Maybe I will also try boost.json; with a package manager, switching to a different library is very easy.
I wouldn't say yaml is very suitable for storing data. It's more for configuration than data storage. Actually, I tried to use yaml to configure the neuland executable using yaml-cpp (see here). But since then I keep thinking: maybe python or lua is better for configuring a C++ project? Anyway, the C++ JSON libraries are much more mature than the yaml ones, and if you look at this reddit post, most people use JSON.
I appreciate you trying to separate the ugly FairRoot glue code from code which does actual useful work, but I don't think I am convinced your model can scale into a general solution to the problem of handling FairRootManager.
Well, there was no MappedDataReader in the beginning. Later I found there was a lot of duplicated code between R3BNeulandMapped2Cal and R3BNeulandMapped2CalPar. Both need to read the mapped data, one for the calculation of parameters and the other for the calculation of cal level data. Trying to follow best practice, duplication is "evil". So in the end I just created a class to eliminate all the duplication. This class is definitely not meant to solve the boilerplate problem of FairRootManager.
Because the objects can not actually be fetched when the class members are initialized, accessing them will involve an additional level of indirection.
I don't know how data is read from ucesb. But with TTree, all it does in each loop is change the value of a variable that has already been initialised. I have no idea why it's so complicated with FairRootManager (I still need some time to figure out what's going on there).
class MainNeulandWRTSDifference : public BetterTask
{
    auto fWrtsMain = FetchData<WRTSData>("MainWrts");
    auto fWrtsNL = FetchData<WRTSData>("NeulandWrts");
    auto fDiff = RegisterOutput<WrtsDifferenceData>("MainNLWrtsDiff");
    auto fSomePar = FetchParameter<WrtsClock>("WrtsClock");
    ...
};
I think this way is the best. The only change I would make is to assign the fetched data to a member variable rather than a local variable. And I also think it's best to just let tasks own their fetched data, instead of registering all the data somewhere else and deleting it in the end.
RE neatly byzantine: https://gist.github.com/klenze/34907b55b9bcf1b1401a9af1005a5064 (I would consider this moderately evil. Real evilness would be to rely on return value optimization). Also, I have decided that FairerTask is a nicer name than BetterTask :-)
I understand what Conan package manager can do, but why do you need this for R3BRoot? To handle 4 dependencies?
Concerning the container factories, can you boil down your 3.400 lines of changes to a short description of how you access parameter containers without factories?
I understand what Conan package manager can do, but why do you need this for R3BRoot? To handle 4 dependencies?
I would say to handle dependencies easier and more flexible. It's very necessary to have a package manager for the project itself rather than waiting for the library updates from upstream (in our case, FairSoft). If we have a package manager locally, using a new library becomes super easy. Just add a line in conanfile.txt and it's done immediately. The other option is wait for FairSoft to add the library, which could take quite a long time.
Concerning the container factories, can you boil down your 3.400 lines of changes to a short description of how you access parameter containers without factories?
Sure, please look at the function MappedDataReader::SetParContainers. Here I just use addContainer from FairRuntimeDb (sorry, I actually meant FairRuntimeDb when I said FairRootManager previously):
if (auto* rtdb = FairRuntimeDb::instance(); rtdb != nullptr)
{
    calibrationPar_ = std::make_unique<TCalPar>(calibrationParName_.c_str()).release();
    calibrationTrigPar_ = std::make_unique<TCalPar>(calibrationTrigParName_.c_str()).release();
    rtdb->addContainer(calibrationPar_);
    rtdb->addContainer(calibrationTrigPar_);
    if (calibrationPar_ == nullptr and calibrationTrigPar_ == nullptr)
    {
        throw R3B::runtime_error("calibration parameters are still nullptr!");
    }
    SetExtraPar(rtdb);
}
Here is the function definition of addContainer:
Bool_t FairRuntimeDb::addContainer(FairParSet* container)
{
    auto name = container->GetName();
    if (!containerList->FindObject(name)) {
        containerList->Add(container);
        TIter next(runs);
        FairRtdbRun* run;
        FairParVersion* vers;
        while ((run = static_cast<FairRtdbRun*>(next()))) {
            if (!run->getParVersion(name)) {
                vers = new FairParVersion(name);
                run->addParVersion(vers);
            }
        }
        // cout << "-I- RTDB entries in list# " << containerList->GetEntries() <<"\n" ;
        return kTRUE;
    }
    Warning("addContainer(FairParSet*)", "Container %s already exists!", name);
    return kFALSE;
}
Here you can see it has nothing to do with any container factory. All it does is add the container to the containerList.
And what we usually do with a parameter (a container) is to use another function called getContainer(), which internally checks the name against the container factories, creates the container and calls addContainer(). Of course, in this case, the real name of the container is not the string we give to getContainer but that string concatenated with another string called "context". I certainly don't think this is completely useless. But in many cases, when people just need to create a single instance of a parameter, using a container (aka a parameter) factory is unnecessary.
Re FairerTask, it turns out that C++ disallows auto for member variables. I have no idea why. The best I can do is
Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData=AddInput(fInputName);
Even that is tricky. You cannot match templates on return types. However, you are allowed to have a class which defines a templated type conversion operator. Through that operator, one can learn the desired return type.
This is of course a hack. I don't think there is a very neat solution: Either you replace the variable declaration by a preprocessor call, or duplicate the type declaration, or abuse type conversion operators. What C++ generally lacks is a high level metalanguage. The preprocessor (which works on strings) is terrible. I think Rust is better in this regard.
Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData=AddInput(fInputName);
Maybe you could default-initialise the variable first and then AddInput to FairRootManager in the Init() function:
private:
    Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData;

    auto Init()
    {
        AddInput(fInputName, fCalifaMappedData);
    }
With this the type read from the tree can be deduced by the input parameter type.
Or a better way could be:
private:
    Ptr<const R3BCalifaCrystalCalData::container_t> fCalifaMappedData = "default_input_string";

    auto Init()
    {
        fCalifaMappedData.Init();
    }
@YanzhaoW Sure, if you create parameter containers and add them to the list yourself, then you do not need container factory.
If you decide to handle external software within R3BRoot and to (partially) replace FairSoft with a package manager, you (R3B) will need to maintain it.
This PR will be divided into multiple PRs with smaller sizes.
This PR contains lots of new features. Here is the list:
build.py to manage all the build processes for R3BRoot. To install the conan packages, use ./build.py -p. To configure cmake, use ./build.py -c. And to build the project, use ./build.py -b. To do all of them: ./build.py -a. For more options, check ./build.py -h.
Mapped data is stored as std::vector<R3B::PaddleTamexMappedData> and trigger mapped data is stored as std::map<unsigned int, R3B::PaddleTamexTrigMappedData>.
neuland/calibration/calLevel/
TODO:
There is no hurry to merge this PR to the dev branch. Feel free to leave comments below if there are any suggestions. In the meantime, I will try to finish the energy calibration rewriting in a few weeks.
Checklist:
dev branch