eic / EICrecon

EIC Reconstruction - JANA based
https://eic.github.io/EICrecon
GNU Lesser General Public License v3.0
6 stars 27 forks source link

Create Multiple Primary MC-Cluster Associations #1396

Open ruse-traveler opened 4 months ago

ruse-traveler commented 4 months ago

Briefly, what does this PR introduce?

This PR adjusts the logic used in establishing the truth-cluster associations in CalorimeterClusterRecoCoG. Previously, the truth in the truth-cluster associations was defined to be the MCParticle of the 1st contribution stored in the list of contributions to the SimCalorimeterHit corresponding to the highest energy hit in the cluster. This PR changes the logic such that now

  1. Multiple MC-Cluster associations are created,
  2. Associations link back to primary (generator status == 1) MC particles, and
  3. Each association carries a weight of E_{contributed} / E_{cluster}.

This PR addresses #1475 and partially addresses #898. The generator status == 1 condition should be revisited in a subsequent PR: this definition of primary may miss cases, particularly for longer lived neutral resonances such as the K^{0}_{L}, which would be useful information to retain.

What kind of change does this PR introduce?a

Please check if this PR fulfills the following:

Does this PR introduce breaking changes? What changes might users need to make to their code?

While this PR doesn't introduce breaking changes, it does introduce substantial changes that users might need to account for in downstream analysis code (see below).

Does this PR change default behavior?

Yes: MC-Cluster associations (edm4eic::MCRecoClusterParticleAssociations) will now contain multiple associations per clusters, and will link back to primary (status == 1) MC particles.

ruse-traveler commented 4 months ago

@wdconinc I remember you pointing out that this could be an issue for systems with large numbers of contributions (specifically the LFHCal) in the chat today. I'll check to make sure this doesn't cause a drastic slowdown...

github-actions[bot] commented 4 months ago

Capybara summary for PR 1396

veprbl commented 4 months ago

I don't think this is correct. If you have a low-energy electron and a high-energy pion in ECal, your pion can leave a MIP response, while electron will actually contribute most of the energy in the cluster. The code tries to answer the question: which particle is measured by the cluster?

ruse-traveler commented 4 months ago

I don't think this is correct. If you have a low-energy electron and a high-energy pion in ECal, your pion can leave a MIP response, while electron will actually contribute most of the energy in the cluster. The code tries to answer the question: which particle is measured by the cluster?

Ah! I see what you're saying. I was thinking about this in a very "HCal" sort of way... So maybe rather than selecting the highest energy particle, I select the one associated with the largest contribution? That would dovetail nicely with what Wouter suggested above!

veprbl commented 4 months ago

I think the current implication is that the highest energy contribution is at the first index. We should, of course, check that. My personal suspicion is that we might need to calculate total primary particles contributions including their downstream contributions in the shower, and then we pick the maximal one. Basically, I think we need to traverse the tree to the nodes with status=1.

veprbl commented 4 months ago

And yes, it kind of made sense for HCals, I agree.

ruse-traveler commented 4 months ago

I think the current implication is that the highest energy contribution is at the first index. We should, of course, check that. My personal suspicion is that we might need to calculate total primary particles contributions including their downstream contributions in the shower, and then we pick the maximal one. Basically, I think we need to traverse the tree to the nodes with status=1.

That lines up with what I've been seeing: I don't think I've ever seen a 2nd particle be the highest energy (granted this is for single particle events and we're only considering the highest energy hit in a cluster). But I agree: it would be good to confirm this...

On the status=1, I'm curious: what about situations where a shower particle from an ECal (or possibly from a support structure or the beampipe) creates a cluster in an HCal? My concern would be that the HCal isn't necessarily measuring the primary in this case...

veprbl commented 4 months ago

According to capybara, this doesn't do much for any calorimeter. Maybe, B0 is slightly modified, but we don't even know why.

On the status=1, I'm curious: what about situations where a shower particle from an ECal (or possibly from a support structure or the beampipe) creates a cluster in an HCal? My concern would be that the HCal isn't necessarily measuring the primary in this case...

It depends on what the purpose is. We don't build HCals to look good on beam tests, we need to do physics with those. Matching with primaries is a way to study how well we are doing that. Allegedly we don't store MCParticles outside of the tracking volume, so what would it even associate with.

ruse-traveler commented 4 months ago

Allegedly we don't store MCParticles outside of the tracking volume, so what would it even associate with.

AH! If that's the case then agreed: we should just stick with primaries.

ruse-traveler commented 4 months ago

To see if these changes introduce a dramatic reduction in performance, I tried out these changes on 1000 18x275, Q^2>100 NC DIS events and compared against main run on the same events. The average throughput doesn't seem to be too different (0.201 for this PR vs. 0.196 for main)... main_nEvt1K.log pr1396_withChanges_nEvt1K.log

[BTW I ran this on SDCC]

veprbl commented 4 months ago

This is interesting. Looking at capybara reports, I can't quite see a big change for HCals. Has anyone looked at changes for $E/p$ before and after this?

ruse-traveler commented 4 months ago

This is interesting. Looking at capybara reports, I can't quite see a big change for HCals. Has anyone looked at changes for E/p before and after this?

I did not! But I can quickly check that...

ruse-traveler commented 4 months ago

Here is the E/p (cluster energy / association momentum) for several of the calorimeters in the central detector. In general, there's not much of a difference between main and this PR...

PECal_clustParEP mainVsPR1396 py8s18x275minq100ang0025div1NEvt1K d7m5y2024 PHCal_clustParEP mainVsPR1396 py8s18x275minq100ang0025div1NEvt1K d7m5y2024 BECal_clustParEP mainVsPR1396 py8s18x275minq100ang0025div1NEvt1K d7m5y2024 BHCal_clustParEP mainVsPR1396 py8s18x275minq100ang0025div1NEvt1K d7m5y2024 NECal_clustParEP mainVsPR1396 py8s18x275minq100ang0025div1NEvt1K d7m5y2024 NHCal_clustParEP mainVsPR1396 py8s18x275minq100ang0025div1NEvt1K d7m5y2024

veprbl commented 4 months ago

So, if this doesn't do anything, we don't need to try to rush this into the release?

ruse-traveler commented 4 months ago

So, if this doesn't do anything, we don't need to try to rush this into the release?

I'd say so! And I think I'd appreciate a little more time to let this stew...

ruse-traveler commented 3 months ago

Putting this back in draft while I implement changes...

ruse-traveler commented 1 week ago

Quality Gate Failed Quality Gate failed

Failed conditions 15.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

Hmmmm... It looks like all of the duplication warnings are for the plugins where all of the changes are just adding preprocessor commands and swapping out collections where needed 😕

wdconinc commented 1 week ago

It looks like all of the duplication warnings are for the plugins

Because we treat our code as something to store data :-)

ruse-traveler commented 1 week ago

Because we treat our code as something to store data :-)

I'm not sure I follow 😅 Like metadata?

ruse-traveler commented 1 week ago

All right, I think this PR is in a state ready for another review!

If folks are okay with it, I've opted to hold off further investigation into the primary definition until next month. We can follow up in a subsequent PR if we decide that the definition here (status = 1) is insufficient.

sonarcloud[bot] commented 4 days ago

Quality Gate Failed Quality Gate failed

Failed conditions
15.8% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud