Output format update/discussion

mvicenzi commented 4 months ago

The default output format defined in AnalysisManager is becoming inadequate now that the simulation no longer contains just FLArE. For example, by default no information is saved from the other sensitive volumes. This runs counter to our push to make this toolkit available to, and usable by, the entire FPF community.

We should revisit our plans for the output file format so that it covers all detectors.
We should revisit our use of sensitive detectors: how are we planning to implement digitization? I would push to have it in an independent/downstream package.
We should revisit our "reconstruction" code and variables. My suggestion would be to save the recorded G4hits in the output (perhaps split by SD?) and move every reco script outside of Geant4.

WenjieWu-Sci commented 3 months ago

I completely agree. It has been on my mind for a while, but I hadn't really thought through how to implement it: there are many alternative detector configurations, so the output variables need to be very flexible. I thought about saving the G4hits, but at these high energies there are so many of them that the file size becomes very large. From some tests I did earlier, it seems impossible to save all the hits. But maybe we could merge geometrically nearby hits?
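One simple way to merge geometrically nearby hits is to bin them into cubic cells and keep a single energy-weighted hit per cell. The C++ sketch below is only an illustration, not existing FLArE code: the `Hit` struct (position plus deposited energy) and the `mergeHits` function are hypothetical, and the cell size is assumed to be chosen by the user.

```cpp
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical minimal hit record; a real G4 hit class would carry more
// fields (track ID, PDG code, time, ...).
struct Hit {
  double x, y, z;  // position in mm
  double edep;     // deposited energy in MeV
};

// Merge hits that fall into the same cubic cell of side `cellSize` (mm).
// Each merged hit sits at the energy-weighted centroid of its inputs and
// carries their summed energy.
std::vector<Hit> mergeHits(const std::vector<Hit>& hits, double cellSize) {
  using Key = std::tuple<long, long, long>;  // integer cell indices
  struct Acc { double wx = 0, wy = 0, wz = 0, e = 0; };
  std::map<Key, Acc> cells;

  for (const Hit& h : hits) {
    Key k{static_cast<long>(std::floor(h.x / cellSize)),
          static_cast<long>(std::floor(h.y / cellSize)),
          static_cast<long>(std::floor(h.z / cellSize))};
    Acc& a = cells[k];
    a.wx += h.x * h.edep;
    a.wy += h.y * h.edep;
    a.wz += h.z * h.edep;
    a.e += h.edep;
  }

  std::vector<Hit> merged;
  merged.reserve(cells.size());
  for (const auto& [key, a] : cells) {
    if (a.e <= 0) continue;  // drop cells with no deposited energy (avoids 0/0)
    merged.push_back({a.wx / a.e, a.wy / a.e, a.wz / a.e, a.e});
  }
  return merged;
}
```

The cell size directly trades output size against spatial resolution, so it would probably need to be configurable per detector.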

mvicenzi commented 3 months ago

Yes, it's not easy to create a flexible system given all the configurations and their possible output variables. My feeling is that it's better to keep things simple by outputting low-level info (G4hits), so that hit merging and digitization can be performed afterwards by detector-specific tools. However, it's true that it may not be possible to save everything... From your tests, which detector is the most challenging? For example, the FLArE output could end up being just the pixelated projections (time and charge for each pixel) instead of the full 3D set of hits.
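As a rough illustration of the pixelated-projection idea, the sketch below collapses 3D hits onto a square pixel grid, summing deposited energy as a stand-in for charge and keeping the earliest arrival time per pixel. Everything here is an assumption for illustration: the `Hit`/`PixelSignal` types, the anode at z = 0, and a constant drift speed `vDrift`. A realistic readout would store a time series per pixel and model effects such as recombination and diffusion.

```cpp
#include <cmath>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Hypothetical hit record with a time stamp.
struct Hit {
  double x, y, z;  // position in mm
  double t;        // hit time in ns
  double edep;     // deposited energy in MeV (stand-in for collected charge)
};

// Summary kept for each pixel.
struct PixelSignal {
  double charge = 0.0;                                            // summed edep
  double firstArrival = std::numeric_limits<double>::infinity();  // ns
};

// Project 3D hits onto an x-y pixel grid of side `pitch` (mm), assuming the
// anode is at z = 0 and charge drifts to it at constant speed vDrift (mm/ns).
std::map<std::pair<long, long>, PixelSignal> projectToPixels(
    const std::vector<Hit>& hits, double pitch, double vDrift) {
  std::map<std::pair<long, long>, PixelSignal> pixels;
  for (const Hit& h : hits) {
    std::pair<long, long> id{static_cast<long>(std::floor(h.x / pitch)),
                             static_cast<long>(std::floor(h.y / pitch))};
    PixelSignal& p = pixels[id];
    p.charge += h.edep;
    // Arrival time at the anode = hit time + drift time.
    double arrival = h.t + std::fabs(h.z) / vDrift;
    if (arrival < p.firstArrival) p.firstArrival = arrival;
  }
  return pixels;
}
```

Since the drift time encodes the third coordinate, a per-pixel time series keeps most of the 3D information while the number of stored entries scales with the number of hit pixels rather than the number of G4 steps.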

Maybe we can start by adding some infrastructure and options to save hits only from specific sensitive volumes? Dumping all of them might turn out to be impossible, but this way anyone interested in a specific subdetector could do that more easily.
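A minimal sketch of such an option, assuming the hits of each event are stored in a map keyed by sensitive-detector name and the user supplies a space-separated list of SD names (for instance through a macro command; the command itself, `HitsBySD`, `parseSDList` and `selectHits` are all hypothetical names, not existing code):

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical hit record.
struct Hit { double x, y, z, t, edep; };

// Per-event hits keyed by sensitive-detector name (hypothetical layout).
using HitsBySD = std::map<std::string, std::vector<Hit>>;

// Parse a space-separated list of SD names, e.g. the string value of a
// hypothetical macro command selecting which SDs to save.
std::set<std::string> parseSDList(const std::string& value) {
  std::set<std::string> names;
  std::istringstream in(value);
  std::string name;
  while (in >> name) names.insert(name);  // extraction skips repeated spaces
  return names;
}

// Keep only the SDs the user asked for; an empty selection keeps nothing.
HitsBySD selectHits(const HitsBySD& all, const std::set<std::string>& wanted) {
  HitsBySD out;
  for (const auto& [sd, hits] : all)
    if (wanted.count(sd)) out.emplace(sd, hits);
  return out;
}
```

Whether an empty selection should keep nothing or everything is a design choice; keeping nothing makes the default output small, and users opt in to the subdetectors they care about.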

I was looking at the edep-sim output format described here. For each event, there are three objects:

Primaries: The GEANT4 primary particles (a vector of TG4PrimaryVertex)

Trajectories: The GEANT4 particle trajectories (a vector of TG4Trajectory)

SegmentDetectors: The energy deposition information (a map keyed by sensitive detector name, containing a vector of TG4HitSegment).
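For comparison, a per-event record loosely modelled on that layout could look like the sketch below. These are simplified stand-ins with hypothetical names and fields, not the actual edep-sim classes (TG4PrimaryVertex, TG4Trajectory, TG4HitSegment). Storing primaries as a vector of vertices, each with its own particles, is one way to accommodate multiple interaction vertices per event.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins loosely modelled on edep-sim; NOT the edep-sim API.
struct PrimaryParticle { int pdg; double energy; };  // energy in MeV

struct PrimaryVertex {
  double x, y, z, t;                       // vertex position (mm) and time (ns)
  std::vector<PrimaryParticle> particles;  // particles leaving this vertex
};

struct Trajectory { int trackId, parentId, pdg; };  // optional, e.g. for display

struct HitSegment {
  int primaryTrackId;           // primary ancestor of the depositing track
  std::array<double, 4> start;  // (x, y, z, t) at the segment start
  std::array<double, 4> stop;   // (x, y, z, t) at the segment end
  double edep;                  // energy deposited along the segment (MeV)
};

struct Event {
  std::vector<PrimaryVertex> primaries;   // one entry per interaction vertex
  std::vector<Trajectory> trajectories;   // may be left empty
  std::map<std::string, std::vector<HitSegment>> segmentDetectors;  // by SD name
};

// Total energy deposited in one sensitive detector (0 if it was not saved).
double totalEdep(const Event& ev, const std::string& sd) {
  auto it = ev.segmentDetectors.find(sd);
  if (it == ev.segmentDetectors.end()) return 0.0;
  double sum = 0.0;
  for (const HitSegment& s : it->second) sum += s.edep;
  return sum;
}

// Number of primary particles summed over all vertices.
std::size_t countPrimaries(const Event& ev) {
  std::size_t n = 0;
  for (const PrimaryVertex& v : ev.primaries) n += v.particles.size();
  return n;
}
```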

We already have a vector of primaries in place, although I'm not sure how it would handle multiple vertices. edep-sim saves the hits as a map keyed by sensitive detector, so we could replicate something similar (specifying which SDs to save via a macro parameter?). As for trajectories, they're most likely not needed unless we want to do some fancy visualization.

WenjieWu-Sci/FLArE, issue #48