eic / EICrecon

EIC Reconstruction - JANA based
https://eic.github.io/EICrecon
GNU Lesser General Public License v3.0

JANA based event merger #1592

Open simonge opened 2 months ago

simonge commented 2 months ago

In order to properly understand the physics performance of the ePIC detector, we need to include hits from background sources in our reconstruction workflow. While the HEPMC_Merger exists, requiring mixed events to be passed through the simulation is far from an ideal solution. Instead, merging events between the simulation and digitization stages allows the same simulated events to be reused in different studies, such as investigating how reconstruction is affected by luminosity, amongst many others.

Requirements of the merger

Main issues with any approach

This feels like it should be possible with the event folder, folding lots of physics events into a timeframe but from what I've seen that isn't quite right.

Potential approach (full of pitfalls)

veprbl commented 2 months ago

This feels like it should be possible with the event folder, folding lots of physics events into a timeframe but from what I've seen that isn't quite right.

Can you expand on that?

In a new tree/frame create new renamed collections for each source which contains a reference to the simulation hit, time offset and event number

There is no need to "rename" collections. We just need to add functionality to the PODIO source so that it supports modified tags that are not simply the original names of the collections in the frames.

simonge commented 2 months ago

A more end-user-friendly alternative is probably to copy everything and do manual bookkeeping of the associations, so it can all be kept in the same file.

veprbl commented 2 months ago

DDG4 doesn't produce any associations. We only need to copy hits from different collections to a single one, plus modify their timestamps.
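
A minimal sketch of the copy-and-shift step described in this comment, assuming each source carries a single time offset. `SimHit`, `SourceHits` and `mergeHits` are hypothetical stand-ins written for illustration; they are not the edm4hep, podio or EICrecon types, and relations to MCParticles are deliberately left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a simulated hit; not the real edm4hep::SimTrackerHit.
struct SimHit {
    unsigned long long cellID; // detector cell identifier
    double time;               // hit time in ns, relative to its own event
};

// One input source (e.g. signal or one background type): its hits plus the
// time at which its event is placed on the merged timeline.
struct SourceHits {
    std::vector<SimHit> hits;
    double timeOffset; // ns
};

// Copy the hits of every source into one collection, shifting each hit's time
// by the offset of the source it came from.
std::vector<SimHit> mergeHits(const std::vector<SourceHits>& sources) {
    std::size_t total = 0;
    for (const auto& source : sources) {
        total += source.hits.size();
    }

    std::vector<SimHit> merged;
    merged.reserve(total);
    for (const auto& source : sources) {
        for (const auto& hit : source.hits) {
            merged.push_back({hit.cellID, hit.time + source.timeOffset});
        }
    }

    // Sanity check: nothing was dropped or duplicated.
    assert(merged.size() == total);
    return merged;
}
```

Calling `mergeHits` on a signal source at offset 0 and a background source at offset 100 ns would give one collection in which the background hits sit 100 ns later; the input collections are left untouched, so the same simulated events could be reused with other offsets.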

simonge commented 2 months ago

DDG4 doesn't produce any associations. We only need to copy hits from different collections to a single one, plus modify their timestamps.

Sorry, I meant the relations rather than associations, OneToOneRelations from SimTrackerHits/CaloHitContribution to MCParticles and the OneToManyRelations between SimCalorimeterHit and CaloHitContribution.

simonge commented 2 months ago

This feels like it should be possible with the event folder, folding lots of physics events into a timeframe but from what I've seen that isn't quite right.

Can you expand on that?

Despite the many presentations on the concept from @nathanwbrei I haven't been able to conceptually follow what happens to the data when moving between timeframe/event/sub-event. All of the examples start with a higher level and break it into smaller ones, which means they can be processed as a conventional event read and just have tags saying what event/subevent it belongs to within a larger collection. I would be more than happy for my understanding to be wrong and have it rewritten.

veprbl commented 2 months ago

(Replying to the last comment)

I might be wrong, but the division between timeframe, event and sub-event, while currently hardcoded (I believe some constants are defined), does not imply an actual hierarchy in the number of JEvents being passed. What I mean is that you could "fold" from timeframe to event and produce more events than timeframes, or "unfold" from timeframe to event and produce fewer events than timeframes. You can also do anything in between. And, as I remember, @nathanwbrei mentioned that the levels could, in principle, be created by the users with arbitrary names (we could imagine "signal", "bg1", "bg2" instead), but for now we could (mis-)appropriate the existing defined levels for our needs.

simonge commented 2 months ago

Folding/unfolding actually changing the number of JEvents rather than tagging subsets of the event was my original impression. Keeping track of relations/associations between levels still sounds tricky.

My understanding is that an object in a podio collection can only ever belong to one collection; if you want to include it in another collection, you need to either use a subset collection or copy it. Creating a subset collection which points to one or several different JEvents wouldn't work (as far as I can see, at least the objectID wouldn't work when writing out), while copying it will break any associations to it.

Here you'd want to have the object owned/shared by collections in JEvents on different levels. That way, when saving to an output file with many frames/trees representing different levels, both can contain the same objects with their relations intact.

This might of course be what's happening in which case, fantastic, I'm slowly catching up.

veprbl commented 2 months ago

I don't think we can do without rewriting collection IDs and indices. Objects will have to be copied in any case, as we need to modify time fields in most of them.

simonge commented 2 months ago

The alternative would be having an event data type with a time offset (along with some tag which says what the event source was) and adding a bunch of association collections between the hits and particles to that.

Will something like that not be needed to trace back through the event levels anyway?

simonge commented 2 months ago

Current thoughts on a process which as far as I know would work but needs access to some additional JANA features:

For each event source j

  1. Create a new edm4eic::EventSource with a time offset and identifier; this should have either a one-to-many or a one-to-one relation with the MCParticles.
  2. Copy the MCParticles, offsetting the times by the timeline time and creating an association with the original MCParticle.
  3. Loop over the associations updating the parent/daughter fields of the new MCParticles.
  4. Copy the SimTrackerHits, offsetting the time and updating the relation based on the association.
  5. Copy the CaloHitContribution, offsetting the time and updating the relation based on the association. Create an association to the original CaloHitContribution.
  6. Copy the SimCalorimeterHits, updating their relation based on the contribution associations.
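
Steps 2 to 4 above can be sketched with plain structs as follows. This is only an illustration under stated assumptions: `Particle`, `TrackerHit`, `Source`, `Merged` and `appendSource` are hypothetical names, relations are modelled as plain indices rather than podio relations, and the calorimeter steps (5 and 6) are omitted since they would follow the same pattern with one more association.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the EDM4hep types; relations are
// stored as plain indices into the particle vector of the same source.
struct Particle {
    double time;              // creation time in ns
    std::vector<int> parents; // indices of parent particles
};

struct TrackerHit {
    double time;  // hit time in ns
    int particle; // index of the particle that produced the hit
};

struct Source {
    std::vector<Particle> particles;
    std::vector<TrackerHit> hits;
};

struct Merged {
    std::vector<Particle> particles;
    std::vector<TrackerHit> hits;
    std::vector<int> sourceOfParticle; // source identifier of each merged particle
};

// Append one source to the merged output, shifting all times by timeOffset.
void appendSource(Merged& out, const Source& src, int sourceId, double timeOffset) {
    // Association from original particle index to merged particle index.
    std::vector<int> newIndex(src.particles.size());

    // Step 2: copy the particles, shift their times, record the association.
    for (std::size_t i = 0; i < src.particles.size(); ++i) {
        newIndex[i] = static_cast<int>(out.particles.size());
        out.particles.push_back({src.particles[i].time + timeOffset, {}});
        out.sourceOfParticle.push_back(sourceId);
    }

    // Step 3: rebuild the parent links so that they point at the copies.
    for (std::size_t i = 0; i < src.particles.size(); ++i) {
        for (int parent : src.particles[i].parents) {
            assert(parent >= 0 && static_cast<std::size_t>(parent) < newIndex.size());
            out.particles[newIndex[i]].parents.push_back(newIndex[parent]);
        }
    }

    // Step 4: copy the hits, shift their times, redirect the particle relation.
    for (const auto& hit : src.hits) {
        assert(hit.particle >= 0 && static_cast<std::size_t>(hit.particle) < newIndex.size());
        out.hits.push_back({hit.time + timeOffset, newIndex[hit.particle]});
    }
}
```

Calling `appendSource` once per source, each with its own identifier and time offset, fills a single `Merged` object in which every hit still points at the copy of the particle that produced it. The `newIndex` vector plays the role of the association to the original MCParticle mentioned in step 2.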

Bringing the event sources together is a bit simpler and could probably be handled in a number of different ways, depending on whether it is attached to the digitization, attached to the reconstruction, or run standalone. Probably easiest would be a standalone merger (or at least a separate JANA plugin, if that's how they're meant to work) where each source has 4 collections with a source tag, and the originally named collections are then just subset collections merging those.

Merging of the metadata/checking for conflicts is another kettle of fish.

simonge commented 3 weeks ago

@nathanwbrei would you be able to comment on this and direct me on how to get started?

nathanwbrei commented 3 weeks ago

@simonge Let's schedule a call!