DUNE / ND_CAFMaker

Code for making CAFs from ND inputs (whether 2x2+MINERvA prototype or full ND simulation)
Apache License 2.0

Missing reco--truth matches for both particles and interactions #70

Closed cuddandr closed 1 month ago

cuddandr commented 2 months ago

There are cases of zero true particle matches for a given reconstructed particle when parsing CAFs. In these cases the sr->common.ixn.dlp[ixn].part.dlp[ipart].truth vector is empty (also truthOverlap is empty). The reconstructed particles have (apparently) valid values for their momentum/energy/etc.

Similarly, the reco--true interaction matching also has instances where reconstructed interactions will have no corresponding truth matches. The sr->common.ixn.dlp[ixn].truth and truthOverlap vectors are empty.
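For illustration, the check described above amounts to counting reco objects whose truth vector is empty. The following is a minimal sketch only: the classes `RecoParticle` and `RecoInteraction` are hypothetical stand-ins that mirror the `truth` / `truthOverlap` fields of the CAF branches, not the actual StandardRecord classes, and reading a real CAF file is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoParticle:
    # Hypothetical stand-in for sr->common.ixn.dlp[ixn].part.dlp[ipart]
    truth: List[int] = field(default_factory=list)
    truthOverlap: List[float] = field(default_factory=list)


@dataclass
class RecoInteraction:
    # Hypothetical stand-in for sr->common.ixn.dlp[ixn]
    truth: List[int] = field(default_factory=list)
    truthOverlap: List[float] = field(default_factory=list)
    particles: List[RecoParticle] = field(default_factory=list)


def count_unmatched(interactions: List[RecoInteraction]) -> Dict[str, int]:
    """Count reco interactions and particles whose truth vector is empty."""
    counts = {"ixn_total": 0, "ixn_unmatched": 0,
              "part_total": 0, "part_unmatched": 0}
    for ixn in interactions:
        counts["ixn_total"] += 1
        if not ixn.truth:  # empty vector: no truth interaction match
            counts["ixn_unmatched"] += 1
        for part in ixn.particles:
            counts["part_total"] += 1
            if not part.truth:  # empty vector: no truth particle match
                counts["part_unmatched"] += 1
    return counts
```

A matched particle can sit inside an unmatched interaction and vice versa, so the two counters are kept independent.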

This is present in both MiniRun5 beta1 and beta2a files, and a couple of examples can be found below with the files and spill numbers identified.

caf_bug_output_20240626.txt

YifanC commented 2 months ago

Hey @cuddandr, how often does this happen? Also, do you mind providing some visual examples? Francois reckons it's due to noise, but it seems suspicious to me.

cuddandr commented 2 months ago

Running over the available MR5 Beta2a CAFs, I get the following numbers:

No truth interaction match: 17473 / 261835 ~ 6.67% of reco interactions have no truth match.
No truth particle match: 62772 / 742172 ~ 8.46% of reco particles have no truth match.

Then for the MR5 Beta1 CAFs, it's even worse:

No truth interaction match: 134926 / 412304 ~ 32.7% of reco interactions have no truth match.
No truth particle match: 373233 / 1005966 ~ 37.1% of reco particles have no truth match.
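As a cross-check of the arithmetic, the percentages can be recomputed from the quoted counts. This is a small sketch; the helper name `unmatched_percent` and the dictionary layout are illustrative choices, and only the counts themselves come from the numbers above.

```python
def unmatched_percent(n_unmatched: int, n_total: int) -> float:
    """Return the percentage of reco objects that have no truth match."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_unmatched / n_total


# (unmatched, total) counts as quoted in this comment
COUNTS = {
    "Beta2a interactions": (17473, 261835),
    "Beta2a particles": (62772, 742172),
    "Beta1 interactions": (134926, 412304),
    "Beta1 particles": (373233, 1005966),
}

for label, (n_unmatched, n_total) in COUNTS.items():
    pct = unmatched_percent(n_unmatched, n_total)
    print(f"{label}: {n_unmatched} / {n_total} = {pct:.2f}%")
```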

cuddandr commented 1 month ago

I have new information (but no event display, still a work in progress). Looking at some beta1 flow files, it turns out a number of events/spills have no hits (or ADC counts) and no corresponding MC truth information (or even some reconstructed information).

Looking at the charge/raw_events/data or charge/events/data datasets and trying to dereference them with the h5flow dereference function returns entirely masked arrays, i.e. no valid data, when trying to find the corresponding truth interaction or even prompt hits. The charge/events/data dataset also reports zero hits and zero ADC counts for the given spill/event.

So it looks like in the CAFs for reconstructed interactions that have no truth interaction, the cause is missing info at the nd-flow (or larnd-sim) stage. (Or at least this explains some of the missing reco--truth matches).

YifanC commented 1 month ago

Can you provide a file and event number?

cuddandr commented 1 month ago

Yes. Here are a few dozen examples. Reading files from the 2x2 productions stored at NERSC:

/global/cfs/projectdirs/dune/www/data/2x2/simulation/productions/MiniRun5_1E19_RHC/MiniRun5_1E19_RHC.flow.beta1/FLOW/0000000/

File names and the id field for charge/events/data:

File: MiniRun5_1E19_RHC.flow.0000010.FLOW.hdf5
IDs : 27,37,67,77,86,108,126,128,150,182,185

File: MiniRun5_1E19_RHC.flow.0000013.FLOW.hdf5
IDs : 7,8,15,57,71,78,137,148,160,163,176,184

File: MiniRun5_1E19_RHC.flow.0000017.FLOW.hdf5
IDs : 6,16,19,27,33,50,79,84,88,91,92,96,105,107,114,134,143,147,150,156,191

File: MiniRun5_1E19_RHC.flow.0000020.FLOW.hdf5
IDs : 19,24,43,45,46,50,51,52,65,66,77,90,104,118,123,131,144,145,151,153,156,161,168,184

sindhu-ku commented 1 month ago

Alright, I am closing this issue as it is not CAF related. The interaction-level matching issue is at the flow level, as Andrew pointed out, and there is also the reason below. Those reco "particles" are actually just noise coming from the simulation, not actual particles (after talking to Francois and checking the event displays). This also leads to reco interactions that don't have a truth match. Entry 4 in the first file (the points in the middle):

[image]

I don't know why these hits are associated with one of the two interaction vertices in flow (this needs a more detailed look inside the flow file; I'll leave it to the experts).

YifanC commented 3 weeks ago

Hi @cuddandr @sindhu-ku @francois-drielsma, sorry for circling back to this after a while. I looked at a few of the events that Andrew pointed out. They have no hits, and their corresponding true segments are neutrons, photons, nuclei, and very short electrons. In these cases, it makes sense to me that there is no truth match. However, I haven't looked at the CAF files myself, so I wonder: how long are these particles, how many particles do these interactions have, and how much energy do they deposit? I'm not convinced that so many reconstructed particles come from noise in the simulation. Have we checked that these hits really have no backtracked segments? How they are associated with an interaction is a downstream question.

sindhu-ku commented 3 weeks ago

From what I checked, they have backtracked segments, but those correspond to the other vertices. It's been a while, though, so I don't remember exactly. This is entry 4 in the first file if you want to take a look.

YifanC commented 2 weeks ago

Moving some of the Slack conversation here to conclude: this behaviour is expected, and with the later version the number of unmatched particles is low.

FD: I looked at the first flow2supera LArCV file from MR5 beta2a that Yifan provided me with and found 6 particles out of 643 (~1%) which do not have a truth match (this is using SPINE, the only package that will be maintained moving forward, which had several bug fixes from the legacy lartpc_mlreco3d repo). I looked at all 6 particles and here's what they look like (entries [27, 79, 117, 133, 171, 174] in that file).

YC: Given the above checks I think it's fair to say the observed behaviour is tolerable. The changes in these numbers came from the updates in the heuristic cuts in supera.