Do not limit gen photon filter to status = 1 photons

ktht commented 3 years ago

At the moment, the gen photon filter that resolves the overlap between, e.g., the W+gamma and W+jets samples uses the same GenPhoton collection that we use to estimate the conversion background, and that collection contains only prompt status = 1 gen photons. There is no reason to restrict the veto to stable photons, because we see that the ME photon in W+gamma (on which the current veto is based) can also decay into a pair of leptons. Thus, to apply the veto effectively, we should use a different gen photon collection (GenPhotonAll) that also includes unstable photons.

Since we're still discussing other options to make the veto more effective, I'm currently just creating the issue.

ktht commented 3 years ago

Just noticed that we don't have the GenPhotonAll collection in our Ntuples, probably because I reduced the number of gen collections to save computing time in post-production. I'll post-process the W+gamma and W+jets samples shortly to include the missing collection.

edit: need to post-process single top/+gamma, ttbar/+gamma and Z/+gamma samples as well.

ktht commented 3 years ago

Since my last post the situation has developed quite a bit. @kramerto crafted a recipe that makes the merging of X+gamma and X+jets samples more effective (in terms of yields) and accurate (in terms of lepton pT spectrum).

The veto goes as follows: use the event from X+gamma samples if it has a photon passing

pT > 20 GeV;
|eta| < 2.5;
isolation cut of deltaR = 0.06 wrt isHardProcess electrons, muons, taus, quarks (except for tops) and gluons, i.e. particles with |pdgId| = 11, 13, 15, 21 or < 6, with no requirements on their Pythia status (except that quarks and gluons with Pythia status equal to 21 are not considered in the veto).

The photons considered in the event-level veto are:

isPrompt photons with no requirements on their Pythia status;
photon candidates constructed from isPrompt SFOS lepton pairs, where
- the electrons and muons are required to have Pythia status of 1 and the tau leptons (|pdgId| = 15) have Pythia status of 2;
- the electrons, muons and tau leptons are either parentless or originate from the same lepton parent (for example, the electron-positron pair in mu -> mu + e+ + e-);
- and if there are multiple such pairs in the event, pick the combination that has the lowest energy as the photon candidate.

In case of X+jets samples, the event-level veto is inverted: if there exists a photon or a photon candidate in the event passing all of the above conditions, the event is vetoed.
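
A minimal Python sketch of the event-level decision described above. The `GenParticle` class and all function names are hypothetical and do not come from our analysis FW. The sketch also assumes that "isolated" means the photon is farther than deltaR = 0.06 from every qualifying hard-process particle; the recipe above does not spell out the direction of the cut.

```python
import math
from dataclasses import dataclass


@dataclass
class GenParticle:  # hypothetical container, not a class from the analysis FW
    pt: float
    eta: float
    phi: float
    pdg_id: int = 22
    status: int = 1


def delta_r(a, b):
    # wrap the azimuthal difference into [-pi, pi]
    d_phi = math.remainder(a.phi - b.phi, 2.0 * math.pi)
    return math.hypot(a.eta - b.eta, d_phi)


def is_isolation_reference(p):
    """Hard-process particle that the photon is checked against."""
    abs_id = abs(p.pdg_id)
    if abs_id in (11, 13, 15):
        return True  # charged leptons: no status requirement
    if abs_id == 21 or 1 <= abs_id <= 5:
        return p.status != 21  # quarks (no tops) and gluons, status 21 skipped
    return False


def passes_photon_selection(photon, hard_process,
                            min_pt=20.0, max_abs_eta=2.5, min_dr=0.06):
    if photon.pt <= min_pt or abs(photon.eta) >= max_abs_eta:
        return False
    # assumption: isolated = farther than min_dr from every reference particle
    return all(delta_r(photon, p) > min_dr
               for p in hard_process if is_isolation_reference(p))


def keep_event(photons, hard_process, is_gamma_sample):
    """X+gamma samples keep events with a selected photon; X+jets drop them."""
    has_photon = any(passes_photon_selection(g, hard_process) for g in photons)
    return has_photon if is_gamma_sample else not has_photon
```

Here `photons` would hold both the genuine prompt photons and the photon candidates built from lepton pairs, and `hard_process` the isHardProcess particles. Because the X+jets decision is the exact negation of the X+gamma one, every event ends up in exactly one of the two samples.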

Our Ntuples currently contain:

GenPhoton: isPrompt photons with Pythia status = 1 (because we use them in the estimation of conversion background);
GenLep: isPrompt or isDirectPromptTauDecayProduct electrons and muons with additional isLastCopy condition and Pythia status = 1 (because we use them to match to reconstructed leptons);
GenTau: all generator-level tau leptons, regardless of their Pythia status codes or flags.

GenPhoton collection is too limited for our purposes, so we need an additional collection of gen photons that does not impose any conditions on the Pythia status code. The problem with GenLep and GenTau collections is that their mother-daughter relationship is lost.

Thus, we need the following collections added to our Ntuples:

isPrompt electrons, muons (with Pythia status = 1) or tau leptons (with Pythia status = 2) that are either parentless or come from the same lepton parent (GenPhotonCandidates);
isPrompt photons with no requirement on the Pythia status ~(GenPromptPhotons)~ (edit: we'll use the same GenPhoton collection but select only the status = 1 photons for gen matching and histogram filling at the analysis level);
isHardProcess particles with |pdgId| = 11, 13, 15, 21 or < 6, and with no requirements on their Pythia status (~GenFromHardProcess~ GenIsHardProcess).

We need the three additional collections only in the relevant (X+gamma and X+jets) samples.
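
As a rough illustration of the photon and hard-process selections, here is a sketch that filters plain lists of PDG IDs and status flags. The function names are hypothetical, and the bit positions (0 for isPrompt, 7 for isHardProcess) are assumed to follow the NanoAOD `GenPart_statusFlags` convention; they should be checked against the actual Ntuple documentation before use.

```python
# Assumed bit positions in GenPart_statusFlags (NanoAOD convention)
IS_PROMPT = 1 << 0
IS_HARD_PROCESS = 1 << 7


def select_prompt_photons(pdg_ids, status_flags):
    """Indices of isPrompt photons, with no cut on the Pythia status."""
    return [i for i, (pid, flags) in enumerate(zip(pdg_ids, status_flags))
            if pid == 22 and flags & IS_PROMPT]


def select_hard_process(pdg_ids, status_flags):
    """Indices of isHardProcess leptons, gluons and quarks other than tops."""
    selected = []
    for i, (pid, flags) in enumerate(zip(pdg_ids, status_flags)):
        abs_id = abs(pid)
        wanted = abs_id in (11, 13, 15, 21) or 1 <= abs_id <= 5
        if wanted and flags & IS_HARD_PROCESS:
            selected.append(i)
    return selected
```

Neither function looks at the Pythia status, which matches the "no requirements on their Pythia status" part of the definitions; the status = 21 exclusion for quarks and gluons is left to the veto itself.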

Plan of action:

implement the three additional collections in our post-processing modules (2-4h with testing);
- it's worth checking whether we can replace an existing collection, e.g. GenPhoton, with a broader one and make the Pythia status condition more explicit when estimating conversions (edit: we're going with it).
rerun post-processing and skimming of the relevant samples (1-2 days);
implement the veto in ttH, hh-multilepton and hh-bbww repositories in a development branch (< 1 day).

I'll remove my previous development branch from this repository since it has become irrelevant.

ktht commented 3 years ago

Updated my previous post: we also need to pay attention to the mother-daughter relations when finding the SFOS lepton pairs for reconstructing the photon candidates.

ktht commented 3 years ago

Second edit: I forgot that decays such as mu -> mu + e+ + e-, where the intermediate photon may be dropped, can (and are even more likely to) happen.

abrinke1 commented 3 years ago

> Updated my previous post: we also need to pay attention to the mother-daughter relations when finding the SFOS lepton pairs for reconstructing the photon candidates.

@ktht So does this mean you will add in parent / daughter info for GEN particles (or at least GEN leptons) into the NTuples that was not there before?

ktht commented 3 years ago

No, that's not possible. The mother-daughter relationship works by looking up the GenPart_genPartIdxMother branch of a daughter particle to find the position of its mother in the GenPart collection. We drop the GenPart collection during post-processing of the Ntuples for technical and somewhat historical reasons (reading the full collection is expensive in our analysis FW), so this relationship is completely lost. The post-processing step should be rewritten for Run 3.
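
To illustrate the lookup, here is a small sketch that walks up the mother index. The helper name is hypothetical, and it assumes that a negative index marks a particle without a mother, as in NanoAOD.

```python
def ancestors(idx, mother_idx):
    """Indices of the mother, grandmother, ... of particle `idx`.

    `mother_idx` plays the role of GenPart_genPartIdxMother: entry i is the
    position of particle i's mother in the same collection. A negative
    value is assumed to mean "no mother".
    """
    chain = []
    m = mother_idx[idx]
    while m >= 0:
        chain.append(m)
        m = mother_idx[m]
    return chain
```

The indices only make sense relative to the full GenPart collection, which is why a filtered collection such as GenLep or GenTau cannot be used to recover the ancestry afterwards.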

So, for this short-term task, the logic of determining good candidates for photon reconstruction has to be partially done in the post-processing stage. It's a bit risky because the price of making a mistake there is higher (due to longer turnaround times spent on post-production) compared to analysis level.

There are some corner cases that need to be addressed that make this task somewhat non-trivial. For instance, there could be a decay like e+ -> e+ + e+ + e-, where any of the three decay products undergoes further generator/showering steps before they become final state leptons. To resolve this, something like this could work:

find all SFOS prompt lepton pairs that have Pythia status = 1;
for every lepton in each pair, find the oldest ancestor (call it the "true parent") that has the same PDG ID as the lepton itself and at most one daughter. These conditions must hold for the "true parent" as well as for every one of its descendants. By this definition the "true parent" can be the lepton itself, in which case the particle has no intermediate states before it becomes a final-state particle;
calling the parent of the "true parent" the "grandparent", we keep the lepton pair if either of the following is true:
- the "grandparent" of either lepton does not exist (ie the pair is likely an electron-positron pair coming from an ISR/FSR photon that is dropped during MiniAOD production);
- the "grandparent" of both leptons is the same charged lepton (ie the pair is likely coming from a photon emitted by this charged lepton);
if there are multiple pairs with the same "grandparent", keep the pair that has lower energy;
from every remaining pair reconstruct proxy photons.
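
The steps above could be sketched roughly as follows. This is only an illustration: the `Gen` class and the function names are hypothetical, the leptons are limited to electrons and muons (taus never have status = 1), and "the grandparent does not exist" is read as "neither lepton has a grandparent", which is an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Gen:  # hypothetical stand-in for one GenPart entry
    pdg_id: int
    status: int
    is_prompt: bool
    mother: int  # index of the mother, -1 if there is none
    p4: tuple    # (E, px, py, pz)


def true_parent(i, parts, n_daughters):
    """Oldest ancestor with the same PDG ID reached via single-daughter steps."""
    while True:
        m = parts[i].mother
        if m < 0 or parts[m].pdg_id != parts[i].pdg_id or n_daughters[m] > 1:
            return i
        i = m


def proxy_photons(parts):
    n_daughters = [0] * len(parts)
    for p in parts:
        if p.mother >= 0:
            n_daughters[p.mother] += 1

    leptons = [i for i, p in enumerate(parts)
               if abs(p.pdg_id) in (11, 13) and p.status == 1 and p.is_prompt]

    best = {}  # grandparent index (-1 if none) -> summed p4 of the kept pair
    for i, j in combinations(leptons, 2):
        if parts[i].pdg_id != -parts[j].pdg_id:
            continue  # not same flavour, opposite sign
        gp_i = parts[true_parent(i, parts, n_daughters)].mother
        gp_j = parts[true_parent(j, parts, n_daughters)].mother
        if gp_i != gp_j:
            continue  # different grandparents (or only one is parentless)
        if gp_i >= 0 and abs(parts[gp_i].pdg_id) not in (11, 13, 15):
            continue  # shared grandparent must be a charged lepton
        p4 = tuple(a + b for a, b in zip(parts[i].p4, parts[j].p4))
        # among pairs with the same grandparent keep the lower-energy one
        if gp_i not in best or p4[0] < best[gp_i][0]:
            best[gp_i] = p4
    return list(best.values())
```

The proxy photon is built here from the final-state leptons; using the four-momenta of the "true parents" instead would only change the line that sums `p4`.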

At the analysis level, when applying the veto, we read genuine photons, proxy photons and particles that are part of the hard process.

edit: if my memory serves right and the intermediate steps don't conserve 4-momentum for some odd reason, then we should use the "true parents" to reconstruct the proxy photons.

ktht commented 3 years ago

I'm wondering how we should treat, in this context, the ttbar samples that were produced with different top masses, widths, tune variations, etc. We don't need these samples in the HH-multilepton analysis, but they become relevant in analyses such as bbWW DL, where the irreducible ttbar background is huge (hence motivating additional NP on the ttbar background).

I tend to think that we should apply the same veto for the extra ttbar samples as well because the associated systematic uncertainties are derived wrt the nominal ttbar background that is subject to the veto.

ktht commented 3 years ago

I'll probably move these lines to the analysisConfig.py base class: https://github.com/HEP-KBFI/hh-multilepton/blob/42931681d687f5e14166a8ecacb0a6403a5f9385/python/samples/reclassifySamples.py#L107-L119 because not all analyses in this repository or in the HH-bbww repository (or any of the ttH analyses) have the gen photon filter implemented. So this feature should be enabled on a per-channel basis, since I have no plans to update another 20+ executables.

ktht commented 3 years ago

The gen photon filter has now been implemented in our analysis FW. I also added a validation workflow that basically reproduces the plots that @kramerto showed us for W+jets in his studies. The plots are available here: testGenPhotonFilter.pdf

A few comments:

the plots are based on all available 2016 MC statistics with no event selection whatsoever (other than the minimum lepton multiplicity requirement to draw the pT distributions);
the plots show generator-level final state lepton pT that is either prompt or from a tau that itself is prompt;
in the comparisons of single-top/t+gamma, W+jets/W+gamma and semi-leptonic decays of ttbar+jets/ttbar+gamma, pT of the subleading lepton is shown;
in the comparisons of DY/Z+gamma and dileptonic decays of ttbar+jets/ttbar+gamma, pT of the third lepton is shown;
in the comparison of single-top/t+gamma, only the t-channel contribution of the single-top process is considered because the t+gamma simulation is based on t-channel single-top, with extra photon + a b-quark (that shouldn't affect the lepton pT spectrum, aside from maybe the normalization through cross section);
W+jets is done with both inclusive and exclusive samples stitched together while in the case of DY only the inclusive sample is used. It's safe to assume that adding the exclusive samples doesn't change the conclusion.

From these plots it's apparent that the pT spectrum of the "extra" lepton in each case is smooth if the gen photon filter is properly applied.

The Ntuple post-processing will likely finish tomorrow (unless some jobs exceed the 48h mark, as some 2018 jobs did); reskimming shouldn't take more than a day.

HEP-KBFI / hh-multilepton

Do not limit gen photon filter to status = 1 photons #36