HEP-KBFI / hh-multilepton

code and python config files for hh -> 4tau and hh->wwww analyses (all lepton and tau channels)

Do not limit gen photon filter to status = 1 photons #36

Closed ktht closed 3 years ago

ktht commented 3 years ago

At the moment, the GenPhoton collection, which consists of prompt status=1 gen photons and is used to estimate the conversion background, is also used in the gen photon filter that resolves the overlap between e.g. W+gamma and W+jets samples. There is no reason to restrict the veto to stable photons, because we see that the ME photon in W+gamma (on which the current veto is based) can also decay into a pair of leptons. Thus, we should use a different gen photon collection (GenPhotonAll) that includes unstable photons in order to apply the veto effectively.

Since we're still discussing other options to make the veto more effective, I'm currently just creating the issue.

ktht commented 3 years ago

Just noticed that we don't have the GenPhotonAll collection in our Ntuples, probably because I reduced the number of gen collections to save computing time spent on post-production. I'll post-process the W+gamma and W+jets samples shortly to include the missing collection.

edit: need to post-process single top/+gamma, ttbar/+gamma and Z/+gamma samples as well.

ktht commented 3 years ago

Since my last post the situation has developed quite a bit. @kramerto crafted a recipe that makes the merging of X+gamma and X+jets samples more effective (in terms of yields) and accurate (in terms of lepton pT spectrum).

The veto goes as follows: use the event from X+gamma samples if it has a photon passing

The photons considered in the event-level veto are:

In case of X+jets samples, the event-level veto is inverted: if there exists a photon or a photon candidate in the event passing all of the above conditions, the event is vetoed.
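The complementary logic of the two vetoes can be sketched as follows. This is only an illustration: `keep_event` and both of its arguments are hypothetical names, and `has_passing_photon` stands for "the event has a photon or photon candidate passing all of the conditions above" without spelling those conditions out.

```python
def keep_event(is_gamma_sample: bool, has_passing_photon: bool) -> bool:
    """Decide whether an event survives the gen photon filter.

    is_gamma_sample    -- True for X+gamma samples, False for X+jets samples
    has_passing_photon -- True if a photon (or photon candidate) in the event
                          passes all conditions of the event-level veto
    """
    if is_gamma_sample:
        # X+gamma: use the event only if it has a passing photon
        return has_passing_photon
    # X+jets: the veto is inverted, so a passing photon rejects the event
    return not has_passing_photon
```

With this arrangement, a given generator-level configuration is taken from exactly one of the two samples, which is what removes the overlap.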

Our Ntuples currently contain:

The GenPhoton collection is too limited for our purposes, so we need an additional collection of gen photons that does not impose any conditions on the Pythia status code. The problem with the GenLep and GenTau collections is that their mother-daughter relationship is lost.

Thus, we need the following collections added to our Ntuples:

We need the three additional collections only in the relevant (X+gamma and X+jets) samples.

Plan of action:

I'll remove my previous development branch from this repository since it has become irrelevant.

ktht commented 3 years ago

Updated my previous post: we also need to pay attention to the mother-daughter relations when finding the SFOS lepton pairs for reconstructing the photon candidates.

ktht commented 3 years ago

Second edit: I forgot that decays such as mu -> mu + e+ + e-, where the intermediate photon may be dropped, can (and are even more likely to) happen.

abrinke1 commented 3 years ago

> Updated my previous post: we also need to pay attention to the mother-daughter relations when finding the SFOS lepton pairs for reconstructing the photon candidates.

@ktht So does this mean you will add parent/daughter info for GEN particles (or at least GEN leptons) to the Ntuples, which was not there before?

ktht commented 3 years ago

No, not possible. The mother-daughter relationship works by looking up the GenPart_genPartIdxMother branch of a daughter particle to find the position of its mother in the GenPart collection. We drop the GenPart collection during post-processing of the Ntuples for technical and somewhat historical reasons (reading the full collection is expensive in our analysis FW), so this relationship is completely lost. The post-processing step should be rewritten for Run 3.
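For illustration, the lookup works roughly like this while the full GenPart collection is still available. This is a minimal sketch on plain Python lists: the helper names `mother_index` and `ancestry` are made up for this example, and it assumes the NanoAOD convention that a particle without a stored mother has index -1.

```python
from typing import List, Optional


def mother_index(gen_part_idx_mother: List[int], daughter: int) -> Optional[int]:
    """Return the position of the mother in GenPart, or None if there is none."""
    idx = gen_part_idx_mother[daughter]
    return idx if idx >= 0 else None


def ancestry(gen_part_idx_mother: List[int], daughter: int) -> List[int]:
    """Follow the mother indices upwards and return them, closest ancestor first."""
    chain = []
    idx = mother_index(gen_part_idx_mother, daughter)
    while idx is not None:
        chain.append(idx)
        idx = mother_index(gen_part_idx_mother, idx)
    return chain


# Toy GenPart_genPartIdxMother branch: particle 0 has no mother,
# particle 1 comes from 0, particles 2 and 3 come from 1.
toy_idx_mother = [-1, 0, 1, 1]
```

The indices only make sense relative to the full GenPart collection, which is why they cannot be used once a subset of particles has been copied into a separate collection and GenPart itself has been dropped.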

So, for this short-term task, part of the logic for determining good candidates for photon reconstruction has to be done in the post-processing stage. It's a bit risky because the price of making a mistake there is higher than at the analysis level (due to the longer turnaround times of post-production).

There are some corner cases that make this task somewhat non-trivial. For instance, there could be a decay like e+ -> e+ + e+ + e-, where any of the three decay products goes through further generator/showering steps before it becomes a final state lepton. To resolve this, something like this could work:

  1. find all SFOS prompt lepton pairs that have Pythia status = 1;
  2. for every lepton in each pair, find the oldest parent (call it the "true parent") that has the same PDG ID as the lepton itself and has at most one daughter. These conditions must hold for the "true parent" as well as for every one of its descendants. By this definition the "true parent" can be the lepton itself, in which case the particle has no intermediate states before it becomes a final state particle;
  3. calling the parent of the "true parent" the "grandparent", keep the lepton pair if either of the following is true:
     a. the "grandparent" of either lepton does not exist (ie the pair is likely an electron-positron pair coming from an ISR/FSR photon that is dropped during MiniAOD production);
     b. the "grandparent" of both leptons is the same charged lepton (ie the pair is likely coming from a photon emitted by this charged lepton);
  4. if there are multiple pairs with the same "grandparent", keep the pair that has lower energy;
  5. from every remaining pair reconstruct proxy photons.
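The steps above could be sketched in Python along these lines. This is a toy illustration, not the actual post-processing code: the `GenParticle` class, its fields and all function names are hypothetical, particles are assumed to carry a mother index (-1 if none) into the same list, and step 4 is only applied to pairs that have a common "grandparent" (pairs kept because a "grandparent" is missing are all retained, which is an assumption of this sketch).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple

P4 = Tuple[float, float, float, float]  # (px, py, pz, E)


@dataclass
class GenParticle:
    pdg_id: int
    status: int
    mother: int       # index of the mother in the same list, -1 if none
    is_prompt: bool
    p4: P4


def n_daughters(parts: List[GenParticle], idx: int) -> int:
    return sum(1 for p in parts if p.mother == idx)


def true_parent(parts: List[GenParticle], idx: int) -> int:
    """Step 2: oldest ancestor with the same PDG ID and at most one daughter."""
    pdg_id, cur = parts[idx].pdg_id, idx
    while True:
        mom = parts[cur].mother
        if mom < 0 or parts[mom].pdg_id != pdg_id or n_daughters(parts, mom) > 1:
            return cur
        cur = mom


def grandparent(parts: List[GenParticle], idx: int) -> Optional[int]:
    mom = parts[true_parent(parts, idx)].mother
    return mom if mom >= 0 else None


def proxy_photons(parts: List[GenParticle]) -> List[P4]:
    # Step 1: prompt, status = 1 electrons and muons
    leptons = [i for i, p in enumerate(parts)
               if abs(p.pdg_id) in (11, 13) and p.status == 1 and p.is_prompt]
    no_grandparent = []   # pairs kept by condition 3a
    best = {}             # grandparent index -> (energy, pair), condition 3b
    for i, j in combinations(leptons, 2):
        if parts[i].pdg_id != -parts[j].pdg_id:
            continue      # not a same-flavor opposite-sign pair
        gi, gj = grandparent(parts, i), grandparent(parts, j)
        if gi is None or gj is None:
            no_grandparent.append((i, j))
        elif gi == gj and abs(parts[gi].pdg_id) in (11, 13, 15):
            # Step 4: among pairs sharing a grandparent keep the softest one
            energy = parts[i].p4[3] + parts[j].p4[3]
            if gi not in best or energy < best[gi][0]:
                best[gi] = (energy, (i, j))
    pairs = no_grandparent + [pair for _, pair in best.values()]
    # Step 5: the proxy photon is the sum of the two lepton four-momenta
    return [tuple(a + b for a, b in zip(parts[i].p4, parts[j].p4))
            for i, j in pairs]


# Toy e+ -> e+ + e+ + e- decay: two SFOS pairs share the grandparent (index 0)
toy_event = [
    GenParticle(-11, 23, -1, True, (0.0, 0.0, 50.0, 50.0)),
    GenParticle(-11, 1, 0, True, (0.0, 0.0, 30.0, 30.0)),
    GenParticle(-11, 1, 0, True, (0.0, 0.0, 5.0, 5.0)),
    GenParticle(11, 1, 0, True, (0.0, 0.0, 15.0, 15.0)),
]
```

In the toy event both e+ e- combinations have the initial e+ as their "grandparent", so only the lower-energy pair (indices 2 and 3) is turned into a proxy photon. The sketch sums the final state lepton momenta; it does not implement the alternative of using the "true parents" for the reconstruction.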

At the analysis level, when applying the veto, we read genuine photons, proxy photons and particles that are part of the hard process.

edit: if my memory serves right and the intermediate steps don't conserve 4-momentum for some odd reason, then we should use the "true parents" to reconstruct the proxy photons.

ktht commented 3 years ago

I'm wondering how we should treat the ttbar samples that were produced with different top masses, widths, tune variations, etc in this context. We don't need these samples in the HH-multilepton analysis, but they become relevant in analyses such as bbWW DL where the irreducible ttbar background is huge (hence motivating additional NP on the ttbar background).

I tend to think that we should apply the same veto for the extra ttbar samples as well because the associated systematic uncertainties are derived wrt the nominal ttbar background that is subject to the veto.

ktht commented 3 years ago

I'll probably move these lines to the analysisConfig.py base class: https://github.com/HEP-KBFI/hh-multilepton/blob/42931681d687f5e14166a8ecacb0a6403a5f9385/python/samples/reclassifySamples.py#L107-L119 because not all analyses in this repository or in the HH-bbww repository (or any of the ttH analyses) have the gen photon filter implemented. So this feature should be enabled on a per-channel basis, since I have no plans to update another 20+ executables.

ktht commented 3 years ago

The gen photon filter has now been implemented in our analysis FW. I also added a validation workflow that basically reproduces the plots that @kramerto showed us for W+jets in his studies. The plots are available here: testGenPhotonFilter.pdf

A few comments:

From these plots it's apparent that the pT spectrum of the "extra" lepton in each case is smooth if the gen photon filter is properly applied.

The Ntuple post-processing will likely finish tomorrow (unless some jobs exceed the 48h mark like some 2018 jobs did); reskimming shouldn't take more than a day.