AIDASoft / podio

PODIO
GNU General Public License v3.0

[WIP] Add a c++ implementation for `podio-dump` #620

Open tmadlener opened 2 months ago

tmadlener commented 2 months ago

BEGINRELEASENOTES

ENDRELEASENOTES

This is an attempt at making podio-dump quicker after several complaints (e.g. https://github.com/key4hep/EDM4hep/issues/312). After some "profiling" it turns out that the slowest part of the python implementation is the loop over all the collections, which can be sped up significantly by moving to c++. In my local timings the current (python based) podio-dump is roughly eight times slower in wall-clock time than this (c++ based) podio-dump-cpp when dumping the example_frame.root file from the tests (times via time):

| | podio-dump | podio-dump-cpp |
|---|---|---|
| real | 12.393s | 1.513s |
| user | 8.522s | 1.251s |
| sys | 3.823s | 0.296s |
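
As a quick sanity check on these numbers, the slowdown factors can be computed directly from the timings. This is only a minimal sketch: the dictionary layout and the `speedup` helper are illustrative names and not part of podio.

```python
# Timings in seconds, copied from the measurements above:
# (python based podio-dump, c++ based podio-dump-cpp)
timings = {
    "real": (12.393, 1.513),
    "user": (8.522, 1.251),
    "sys": (3.823, 0.296),
}


def speedup(python_s: float, cpp_s: float) -> float:
    """Return how many times slower the python tool is than the c++ one."""
    return python_s / cpp_s


# One slowdown factor per timing category
ratios = {kind: speedup(py, cpp) for kind, (py, cpp) in timings.items()}

for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.1f}x")
```

For this single measurement the factors come out at roughly 8.2x (real), 6.8x (user) and 12.9x (sys), so the gain is largest in system time.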

The main disadvantages of the c++ implementation are that we need quite a bit of boilerplate for things that are trivial in python, e.g.:

Since dumping the datamodel would require quite a bit of work in c++, I would be in favor of keeping that in python in a separate tool, while the other functionality could be covered by the c++ implementation.

TODO:

Zehvogel commented 2 months ago

I wonder how an RDataFrame-based python version (with pre-compiled functions) would fare on a performance vs. comfort scale.