tomeichlersmith opened this issue 5 months ago
But isn't this a feature of uproot? Like, if you did an event loop in `check.py`, it should work for the `ana-events.root`, no?
No. I originally found this issue with a subsequent standalone ldmx-sw processor and only used uproot as a double check (proof below).
To be very clear: this is only an issue with basic-type branches that are top-level ("root") branches, not with the sub-branches that are generated when we create a branch from a more complicated parent class (e.g. `std::vector<double>` does not have this issue). That is why we haven't observed this before: most data is passed around inside these more complicated classes.
```
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire prod.py
---- LDMXSW: Loading configuration --------
Processor source file /home/tom/code/ldmx/1369-copy-obs-basic-types/Produce.cxx is newer than its compiled library /home/tom/code/ldmx/1369-copy-obs-basic-types/libProduce.so (or library does not exist), recompiling...
done compiling /home/tom/code/ldmx/1369-copy-obs-basic-types/Produce.cxx
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
---- LDMXSW: Event processing complete --------
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire ana.py
---- LDMXSW: Loading configuration --------
Processor source file /home/tom/code/ldmx/1369-copy-obs-basic-types/Ana.cxx is newer than its compiled library /home/tom/code/ldmx/1369-copy-obs-basic-types/libAna.so (or library does not exist), recompiling...
done compiling /home/tom/code/ldmx/1369-copy-obs-basic-types/Ana.cxx
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
81.4724
90.5792
12.6987
91.3376
63.2359
9.75404
27.8498
54.6881
95.7507
96.4889
---- LDMXSW: Event processing complete --------
tom@nixos:~/code/ldmx/1369-copy-obs-basic-types$ denv fire reana.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
-6.38457e+35
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
-2.53744e-11
---- LDMXSW: Event processing complete --------
```
where `prod.py` and `ana.py` (and their referenced C++ files) are the same as in the original description. `reana.py` is copied below for completeness; it just runs the same `Ana` processor over the copied events file.
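```python
# reana.py: re-run the same Ana analyzer over ana-events.root, the file copied by ana.py
from LDMX.Framework import ldmxcfg

p = ldmxcfg.Process('ana')
p.inputFiles = [ 'ana-events.root' ]
p.sequence = [ ldmxcfg.Analyzer.from_file('Ana.cxx') ]
```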
For any readers who didn't participate in writing the `Bus`, which holds the event objects (i.e. everyone else), the handling of these in-memory objects is rather opaque. With this in mind, I'm going to lay out a potential source of the issue.
ROOT does not expose a common interface for setting the addresses of branches.[^1] I eventually settled on using `TBranchElement::SetObject` for all "higher-level" branches and `TBranch::SetAddress` for the lower-level (BSILFD[^2]) branches. This is written into the `attach` method of `Bus::Passenger`, which uses argument type deduction to delegate to `attachBasic` for BSILFD types and to `attach` for everything else. You'll notice that this is exactly the separation we are observing: once we wrap a `float` (for example) in a more complicated class, the issue goes away.
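For readers unfamiliar with this pattern, the delegation can be sketched in C++17 as a compile-time dispatch on the deduced type. This is a minimal sketch, not the actual `Bus::Passenger` code: the trait `is_bsilfd_v` and the helper `attachObject` are hypothetical names, the exact set of basic types is an assumption, and the stand-in functions return the name of the ROOT call they would make instead of touching a real `TBranch`.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: true for the BSILFD basic types
// (bool, short, int, long, float, double). The exact type list is an assumption.
template <typename T>
inline constexpr bool is_bsilfd_v =
    std::is_same_v<T, bool> || std::is_same_v<T, short> ||
    std::is_same_v<T, int> || std::is_same_v<T, long> ||
    std::is_same_v<T, float> || std::is_same_v<T, double>;

// Stand-in for the basic-type path; the real code would call TBranch::SetAddress.
template <typename T>
std::string attachBasic(T& /*obj*/) {
  return "TBranch::SetAddress";
}

// Hypothetical stand-in for the object path; the real code would call
// TBranchElement::SetObject.
template <typename T>
std::string attachObject(T& /*obj*/) {
  return "TBranchElement::SetObject";
}

// T is deduced from the argument, so the path is chosen at compile time.
template <typename T>
std::string attach(T& obj) {
  if constexpr (is_bsilfd_v<std::remove_cv_t<T>>) {
    return attachBasic(obj);
  } else {
    return attachObject(obj);
  }
}
```

Because the choice is driven purely by the deduced type, wrapping a `float` inside a class moves it onto the `SetObject` path, which matches the observation that wrapped values are copied correctly.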
This makes me want to investigate the `attachBasic` implementation to see whether we need to modify it, perhaps by adding a `branch->SetAutoDelete(false)` call, which is what is done in `attach`.
[^1]: Caveat: I could revisit this and look at using `TBranch::SetAddress` everywhere. At the time, I could not get that to work, so here we are.
[^2]: Bool Short Int Long Float Double
**Describe the bug**
When writing an output file while reading an input file, basic-type branches that are observed by the process are not copied into the output file properly. This is difficult to explain, so it is best understood by looking through the example under To Reproduce.
**To Reproduce**
I originally observed this with some actual simulation data I'm working with, but I condensed it into a smaller reproducible example here. I reference a few C++ source and Python files that I've copied here since they are small. The TL;DR on those files is that I produce two branches, `data` and `unobs`, which are both simple `float` branches. The "analysis" only looks at the `data` branch while creating a new file into which both branches are copied. The resulting events file has a `data` branch containing only corrupted-looking data, while the `unobs` branch has data matching what was originally produced.

Files for Reproducing
### Produce.cxx
```cpp
#include "Framework/EventProcessor.h"
#include
```

1. `denv init ldmx/pro:v4.0.1`
2. `denv fire prod.py`
3. `denv fire ana.py`
4. `check.py`
**Desired behavior**
I would really like the Framework to be able to copy all types of branches, whether or not they have been observed. While it sounds funny, being observed does trigger a different method of copying within the Framework: if no processor needs a branch, we optimize the copy by letting ROOT copy the buffers directly instead of loading them into memory. This is not an issue with how the simple data types are being read, as can be seen from the printouts within the "analysis" processor.
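To make the two copy paths concrete, here is a minimal C++ sketch of the decision described above, applied to the two branches from the reproducer. `BranchInfo`, `CopyPath`, and `chooseCopyPath` are hypothetical names invented for illustration; they are not ldmx-sw or ROOT API, and the real Framework logic is not shown in this issue.

```cpp
#include <string>

// Hypothetical model of an input branch; `observed` is true when some
// processor in the sequence reads it (like `data` in the reproducer).
struct BranchInfo {
  std::string name;
  bool observed;
};

// The two copy paths described above (illustrative names only).
enum class CopyPath {
  DirectBufferCopy,  // no processor needs it: ROOT copies the buffers as-is
  ThroughMemory      // a processor reads it: the in-memory object is written out
};

// Pick the copy path for a branch based only on whether it is observed.
CopyPath chooseCopyPath(const BranchInfo& branch) {
  return branch.observed ? CopyPath::ThroughMemory : CopyPath::DirectBufferCopy;
}
```

In the reproducer, `data` takes the through-memory path and comes out corrupted, while `unobs` takes the direct buffer copy and comes out intact.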
**Environment**
Output of `denv config`: