Add PandoraLArRecoNDBranchFiller

jback08 commented 3 weeks ago

Created the PandoraLArRecoNDBranchFiller class to store the reconstruction information from Pandora's LArRecoND package (used for the DUNE ND). It requires a ROOT file created by the HierarchyAnalysisAlgorithm, which uses Pandora's Hierarchy Tools.

The filled reco branches are rec.nd.lar.pandora.tracks and rec.nd.lar.pandora.showers. No distinction is made (yet) between tracks and showers, so the same 3D-cluster Particle Flow Objects (PFOs) are used for both. Here is the list of reco variables used:

Start position = cluster vertex point or the first hit position if no vertex is available
End position = cluster end position
Energy = charge Q
Length = primary principal axis length
Quality = number of 3D hits
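
As a rough illustration of the mapping listed above, here is a minimal Python sketch. The `ClusterSummary` record, its field names and the `reco_variables` helper are hypothetical and exist only for this example; they are not the LArRecoND or CAF data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class ClusterSummary:
    """Hypothetical per-PFO record; field names are illustrative only."""
    vertex: Optional[Point]        # None when no vertex is available
    hits: Sequence[Point]          # 3D hit positions, in stored order
    end: Point                     # cluster end position
    charge: float                  # charge Q
    principal_axis_length: float   # primary principal axis length


def reco_variables(cluster: ClusterSummary) -> Dict[str, object]:
    """Map a cluster summary onto the reco variables listed above."""
    # Start position: the vertex if present, otherwise the first hit position.
    start = cluster.vertex if cluster.vertex is not None else cluster.hits[0]
    return {
        "start": start,
        "end": cluster.end,
        "energy": cluster.charge,
        "length": cluster.principal_axis_length,
        "quality": len(cluster.hits),  # number of 3D hits
    }
```

The fallback to the first hit is the only branching step; every other variable is copied straight from the cluster summary.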

Any changes needed here can mostly be made in the LArRecoND package, since the CAF just retrieves the output stored by Pandora's hierarchy algorithm.

The MC truth matching uses Pandora's Hierarchy Tools and not TruthMatcher. Pandora requires all MC particles to have a unique ID (even if they originate from different neutrinos), and so the following offsets are applied to the original (ndlar-flow) MC Ids:

nuId = orig_nuId + 10^8
mcId = orig_mcId + nuIndex * 10^6

where nuIndex = 0 to N-1 for an event with N neutrinos, and the orig_mcId number range restarts from zero for each neutrino. This ensures all IDs are unique, for up to 100 neutrino interactions per event, each containing up to 10^6 MC particles.
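
The offsets above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the code used in LArRecoND or the conversion script. The function and constant names are hypothetical, and the range checks assume non-negative original IDs and the stated limits of 100 neutrinos and 10^6 IDs per neutrino.

```python
from typing import Tuple

NU_ID_OFFSET = 10**8   # added to every neutrino ID
MC_ID_STRIDE = 10**6   # ID range reserved for each neutrino's MC particles
MAX_NEUTRINOS = NU_ID_OFFSET // MC_ID_STRIDE  # 100


def offset_ids(orig_nu_id: int, orig_mc_id: int, nu_index: int) -> Tuple[int, int]:
    """Return (nuId, mcId) after applying the offsets described above."""
    # Assumed limits: outside these ranges the IDs could collide.
    if not 0 <= orig_mc_id < MC_ID_STRIDE:
        raise ValueError("orig_mc_id must lie in [0, 10^6)")
    if not 0 <= nu_index < MAX_NEUTRINOS:
        raise ValueError("nu_index must lie in [0, 100)")
    nu_id = orig_nu_id + NU_ID_OFFSET
    mc_id = orig_mc_id + nu_index * MC_ID_STRIDE
    return nu_id, mc_id
```

Within these limits the largest mcId is 99 * 10^6 + 999999 = 99999999, which stays below 10^8, so an mcId cannot coincide with a nuId as long as orig_nuId is non-negative.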

These ID offsets are reversed when the Pandora CAF truth information is filled using the best reco-MC match achieved with Hierarchy Tools, so the mcIds should then have values consistent with the equivalent input ndlar-flow files. The filled truth information corresponds to:

truth.ixn = orig_nuId
truth.part = orig_mcId (best match)
truth.type = primary, secondary or other
truthOverlap = completeness (best match)
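
A minimal Python sketch of reversing the ID offsets before filling the truth information is given below. It assumes orig_mcId is always below 10^6, so that integer division recovers nuIndex; the actual branch filler may instead subtract a known nuIndex. The function and constant names are hypothetical.

```python
from typing import Tuple

NU_ID_OFFSET = 10**8   # offset that was added to the neutrino ID
MC_ID_STRIDE = 10**6   # ID range reserved for each neutrino's MC particles


def restore_ids(nu_id: int, mc_id: int) -> Tuple[int, int, int]:
    """Return (orig_nuId, orig_mcId, nuIndex) from the offset Pandora IDs."""
    orig_nu_id = nu_id - NU_ID_OFFSET
    # mcId = orig_mcId + nuIndex * 10^6, so divmod splits it back apart
    # (valid only under the assumption 0 <= orig_mcId < 10^6).
    nu_index, orig_mc_id = divmod(mc_id, MC_ID_STRIDE)
    return orig_nu_id, orig_mc_id, nu_index
```

For example, `restore_ids(100000004, 1000005)` gives `(4, 5, 1)`: the values that would be stored in truth.ixn and truth.part, plus the neutrino index within the event.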

H5 input files are first converted to ROOT using h5_to_root_ndlarflow.py before they are used by LArRecoND & Pandora, which in turn creates the hierarchy analysis output file that can be used to make the equivalent CAFs.

For the trigger, the event run numbers are propagated from the input ndlar-flow ROOT files. Also, the start time is currently set using the ts_start variable, but this needs to be updated (in LArRecoND) to use the correct time and units.

Also updated ndcaf_setup.sh and build.sh to use a consistent environment.

noeroy commented 3 weeks ago

I think that the truth part of it could be an issue, as the truth branches are shared by all reconstructions.

Particles are uniquely identified by two IDs: one identifies the interaction within a given spill (vertex_id in the flow files), and the other identifies the particle within a given interaction, namely the Geant4 trajectory ID (traj_id in the flow files, I believe). MLReco and MINERvA share these IDs, so that truth particles are matched to the same ID. Would it be possible to have a match between orig_nuId and vertex_id, and between orig_mcId and traj_id?

noeroy commented 3 weeks ago

I wonder whether it would make more sense to fill only the tracks rather than storing the same information twice, especially for analysers who wouldn't know that, for now, the track and shower objects here are the same thing. It also doesn't add any information, right?

DUNE / ND_CAFMaker

Add PandoraLArRecoNDBranchFiller #76