Better definition for the "fraction" attribute of "mc_packets_assn"

drinkingkazu commented 1 year ago

The current version of the code a071fd4 computes the fraction by the following code piece in the fee.py L428-430:

if tot_backtracked != 0:
    for itrk in range(current_fractions.shape[2]):
        current_fractions[ip][iadc][itrk] /= tot_backtracked

As such, the "fraction" is normalized within the recorded set of segments. The number of segments to be recorded, N, is fixed while the code runs disregard of the number of segments that actually contribute to the subject packet. For instance, the current default is N=5.

The actual number of segments contributing to the subject packet (say M in this example) can exceed N. Consider a specific case where the packet carries value 10 contributed from M=6 segments with segment index numbers 1 to 6. The configuration sets N=5. Say the contributions in the respective order are 0.01, 0.02, 0.03, 0.04, 0.05, 0.85. In this case, the current code records the following values.

"track_ids": [1,2,3,4,5]
"fraction": [0.067, 0.133,  0.200, 0.267, 0.333]

Intuitively, the "fraction" is useful to infer how much fraction of packet value (current, charge) each segment contributed. However, because the fraction is normalized within the recorded set of the segments, this becomes impossible when M>=N. Note the case where M=N is included in this situation since we cannot tell whether M=N or M>N from the given set of information.

Specifically, this leads to three issues:

Issue A: We cannot detect the fact that there is unaccounted segments contributing to the packet
Issue B: We cannot quantify the unaccounted contributions, nor the actual contribution values from the recorded segments.
Issue C: We could miss more relevant segments with larger contribution value (segment 6 in the above example). This is because the N segments are selected by the order of those segments in the input data array.

I propose two changes.

Compute and store the fraction with respect to the total contributing charge (i.e. do not normalize only within the recorded segments). This addresses the issue A and B.
Select the N segments to be recorded by the fraction value (calculated according to 1) in the descending order. This addresses the issue C.

Change 1 is to simply modify the noted lines of the code with the following:

if q_sum > 0:
    for itrk in range(current_fractions.shape[2]):
        current_fractions[ip][iadc][itrk] /= q_sum

(thanks to @soleti )

Change 2 is to add array sorting in the fee.py. That looks like below (only a part of the code here):

frac_order = np.flip(np.argsort(np.abs(packets_frac),axis=1),axis=1)
ass_track_ids = np.take_along_axis(packets_mc,   frac_order, axis=1)
ass_fractions = np.take_along_axis(packets_frac, frac_order, axis=1)

The complete change can be seen here including some case handling.

drinkingkazu commented 1 year ago

PR 138 addresses these.

soleti commented 1 year ago

I agree with this change, it was indeed a behavior that bugged me and I can't remember why I didn't adopt the solution above from the beginning.

YifanC commented 1 year ago

Hi @soleti @drinkingkazu, I definitely resonate the concept of the changes! One thing that I am troubling with since a week is that by doing current_fractions[ip][iadc][itrk] /= q_sum, even there's one single segment contributing to a pixel readout, the fraction may not be 1. https://github.com/DUNE/larnd-sim/blob/d78d36470eb94b74a640dea6b576c4b544a2584d/larndsim/fee.py#L455

soleti commented 1 year ago

Hi @soleti @drinkingkazu, I definitely resonate the concept of the changes! One thing that I am troubling with since a week is that by doing current_fractions[ip][iadc][itrk] /= q_sum, even there's one single segment contributing to a pixel readout, the fraction may not be 1.

https://github.com/DUNE/larnd-sim/blob/d78d36470eb94b74a640dea6b576c4b544a2584d/larndsim/fee.py#L455

Hi @YifanC, I am a bit rusty with this code, can you explain a bit when/how that happens (a single segment and fraction < 1)?

YifanC commented 1 year ago

Hi @soleti, this is based on the observation of testing this PR. Actually the fraction is not necessarily below 1, could be also above 1. The offset from single segment backtracking for example can be offset by a lot. In the test sample, I have seen ~30% fluctuation. After discussing with @krwood, we think the issue is caused by the noise. If all noise is 0, then the fraction exports what we expect. The main drive is the RESET_NOISE_CHARGE. It is straightforward to bypass that by setting a "true_q" to reassemble "q_sum" without noise (reset noise). Specifically, setting "true_q" to 0 here: https://github.com/DUNE/larnd-sim/blob/d78d36470eb94b74a640dea6b576c4b544a2584d/larndsim/fee.py#L380 and https://github.com/DUNE/larnd-sim/blob/d78d36470eb94b74a640dea6b576c4b544a2584d/larndsim/fee.py#L469. However, DISCRIMINATOR_NOISE and UNCORRELATED_NOISE_CHARGE still leak to the fraction definition through: https://github.com/DUNE/larnd-sim/blob/d78d36470eb94b74a640dea6b576c4b544a2584d/larndsim/fee.py#L411 (among the two, DISCRIMINATOR_NOISE yields larger impact) It's not obvious to me how to have "true_q" uncontaminated by DISCRIMINATOR_NOISE and UNCORRELATED_NOISE_CHARGE without redo this part (https://github.com/DUNE/larnd-sim/blob/d78d36470eb94b74a640dea6b576c4b544a2584d/larndsim/fee.py#L411-L472). I feel the repetitive loops will make it unnecessarily slow.

YifanC commented 1 year ago

Hi @soleti, sorry for pouring text here. Just want to update two things. In short, I think we have a solution.

After talking to @krwood, I realised that it's a bad idea to create a parallel universe for this (https://github.com/DUNE/larnd-sim/blob/21d580fbd604414035ecd478067a485e20b4cdfe/larndsim/fee.py#L413) is a bad idea. As it will cause potential issues with mismatching "real" packets and "backtracking" packets.
As it turned out, it seems resetting the noise in "hit-veto" logic could solve this issue. Although I still think the noise in discriminator still matters a little as when to start the integration, probably the noise resetting gives larger impact. https://github.com/DUNE/larnd-sim/blob/21d580fbd604414035ecd478067a485e20b4cdfe/larndsim/fee.py#L445

I will submit a PR of feature_t0_vertex branch with the proposed change to develop. Anyone who's interested could take a look. Thank you!

soleti commented 1 year ago

Hi @YifanC, thank you for looking into this. Now that you say it, I remembered why I implemented the fraction the way it is right now: exactly because it was complicated to implement "true" fraction, given the noise. I think your solution should work, thanks for the PR.

YifanC commented 1 year ago

Thank you for checking! Appreciated

DUNE / larnd-sim

Better definition for the "fraction" attribute of "mc_packets_assn" #137