Apply lepton ID SF only to prompt leptons

ktht commented 4 years ago

As discussed in today's HH multilepton meeting, the problem is that we don't have explicit checks in place that would prevent the lepton ID SF from being applied to non-prompt or fake leptons. This is not a problem in the signal region because the signal leptons are always required to pass the gen matching conditions, but it is a problem in the fake CR and MC closure regions because there we apply the reco-to-loose ID SF to all selected leptons, including fakes and non-prompt ones. For hadronic taus, this has already been taken care of (it is explicitly required by the interface of the Tau ID SF package provided by the Tau POG).

The solution is to:

1) update the interface of Data_to_MC_CorrectionInterface* such that we don't propagate individual lepton parameters but instead pass pointers to the lepton objects. This saves us from potential trouble in the future if we need to know yet another lepton variable when evaluating the lepton ID SF;
2) update Data_to_MC_CorrectionInterface* such that it computes the lepton ID SF only for the prompt leptons (= leptons gen-matched to gen leptons);
3) update the interface of the class that performs queries to the Tau ID SF package such that it takes the list of hadronic tau objects as input, as opposed to the list of individual parameters of the selected hadronic taus. The evaluation logic remains untouched.
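A rough sketch of what the first two points above could look like. All names here (RecoLeptonSketch, leptonIdSF, eventLeptonIdSF) and the SF values are made up for illustration and are not the actual Data_to_MC_CorrectionInterface* API:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a reconstructed lepton object.
struct RecoLeptonSketch {
  double pt;
  double eta;
  bool isGenMatched;  // true if matched to a generator-level lepton (= prompt)
};

// Hypothetical per-lepton SF lookup; the real code would read the measured SF maps.
// The values below are placeholders, not measured scale factors.
double leptonIdSF(const RecoLeptonSketch& lepton) {
  return lepton.pt < 25. ? 0.5 : 0.75;
}

// Takes pointers to lepton objects instead of individual lepton parameters,
// and multiplies the SFs of the prompt (gen-matched) leptons only.
double eventLeptonIdSF(const std::vector<const RecoLeptonSketch*>& leptons) {
  double sf = 1.;
  for (const RecoLeptonSketch* lepton : leptons) {
    assert(lepton != nullptr);
    if (lepton->isGenMatched) {
      sf *= leptonIdSF(*lepton);
    }
  }
  return sf;
}
```

Because the function receives the lepton objects themselves, adding another variable to the SF lookup later would not change its signature.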

I will implement it in the coming days.

The changes need to be propagated to all analyses, including the ttH and HH->bbWW analyses. We probably need to inform the bbWW people and re-synchronize with them. @saswatinandan @veelken

veelken commented 4 years ago

Hi Karl,

I will take care of updating the HH->bbWW code once you have implemented the new interface in the Data_to_MC_CorrectionInterface*.

Cheers,

Christian

ktht commented 4 years ago

The feature is implemented in all three repositories (including bbWW). The only obstacle was the charge flip measurement code, in which the electron pT is shifted up/down depending on the choice of systematics. The recomputed electron pT was then used to evaluate the lepton ID SFs. I've changed it such that it uses the nominal electron pT to compute the SFs instead. This inconsistency still needs to be fixed, though. The shifts in pT cannot be moved to RecoElectronReader because the resolution systematics require knowledge of the generator-level electrons, which is not available in advance in RecoElectronReader. The easiest approach is probably to implement a setter in RecoElectron and in its base classes that recomputes all variables derived from the 4-momentum, and to modify the 4-momentum at the event level, given the systematics.
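To make the setter idea concrete, here is a minimal sketch. RecoElectronSketch and its members are hypothetical and do not mirror the real RecoElectron class hierarchy; the point is only that every quantity derived from the 4-momentum is recomputed in one place:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical electron class; not the actual RecoElectron.
class RecoElectronSketch {
public:
  RecoElectronSketch(double pt, double eta, double phi, double mass) {
    set_p4(pt, eta, phi, mass);
  }

  // Single entry point for changing the 4-momentum: all derived variables
  // are recomputed here so they cannot go out of sync with pt/eta/phi/mass.
  void set_p4(double pt, double eta, double phi, double mass) {
    assert(pt >= 0.);
    pt_ = pt;
    eta_ = eta;
    phi_ = phi;
    mass_ = mass;
    absEta_ = std::fabs(eta);
    px_ = pt * std::cos(phi);
    py_ = pt * std::sin(phi);
    pz_ = pt * std::sinh(eta);
    energy_ = std::sqrt(px_ * px_ + py_ * py_ + pz_ * pz_ + mass * mass);
  }

  // Event-level systematic shift, expressed here as a simple pT scale factor
  // (an assumption; the real shift depends on the chosen systematics).
  void scale_pt(double factor) { set_p4(pt_ * factor, eta_, phi_, mass_); }

  double pt() const { return pt_; }
  double absEta() const { return absEta_; }
  double px() const { return px_; }
  double energy() const { return energy_; }

private:
  double pt_, eta_, phi_, mass_;
  double absEta_, px_, py_, pz_, energy_;
};
```

With something like this, the charge flip code could shift the electron at the event level and any SF evaluated afterwards would see a consistent pT.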

ktht commented 4 years ago

So that we won't lose track of what we discussed in the meeting this morning: there's an issue with the logic implemented in all analyses. When we enter the MC closure region, we still apply the loose-to-tight lepton ID SF to leptons of one flavor that pass the fakeable cuts if leptons of the other flavor are required to pass the tight cuts: https://github.com/HEP-KBFI/hh-multilepton/blob/64d42e541a120b53f3200a77a3aa35b46063a4ff/bin/analyze_hh_3l.cc#L1731-L1738 Instead, what we should do in this case is apply the loose-to-tight lepton ID SF using leptons that are prompt and that pass the tight cuts, and do this only if we require tight leptons of this particular flavor in the event selection. Conceptually, the SFs applied to the event depend on the requirements imposed in the event selection, and not on arbitrary cuts that are not part of the event selection. Did we agree to fix this?

We also concluded that, at least in the long term, we need to rethink how the data/MC corrections should be treated in the context of fake rate measurement and application.

veelken commented 4 years ago

Hi Karl,

please don't change the code right now. I think it is better if we first think about the correct approach of how the application of lepton data/MC corrections "interferes" with the fake-rate method. I think the best place to start is with the equations in Section 7.5 of AN-2013/159 v12.

I agree that for the MC closure the loose-to-tight SF should be applied to leptons of the flavor for which the tight lepton selection cuts are applied (muons in the MC_closure_e case, electrons in the MC_closure_mu case). The SFs need to match the lepton selection cuts that are applied (if someone develops a new supertight electron selection that has a data/MC SF of 0.1, this SF should never be used anywhere in our analysis, since we don't apply a cut on supertight electrons anywhere in our analysis).
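For reference, the rule in the paragraph above could be sketched roughly as follows. The enum, struct, and function names are invented for this illustration and the per-lepton SF is simply stored on the lepton; none of this is the actual analysis code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical names, for illustration only.
enum class LeptonSelection { Tight, MCClosureE, MCClosureMu };
enum class Flavor { Electron, Muon };

// In MC_closure_e the electrons are relaxed to fakeable, so only muons are
// required to be tight; in MC_closure_mu it is the other way around.
bool requiresTight(LeptonSelection selection, Flavor flavor) {
  switch (selection) {
    case LeptonSelection::Tight:       return true;
    case LeptonSelection::MCClosureE:  return flavor == Flavor::Muon;
    case LeptonSelection::MCClosureMu: return flavor == Flavor::Electron;
  }
  return false;
}

struct LeptonSketch {
  Flavor flavor;
  bool isPrompt;
  bool passesTight;
  double looseToTightSF;  // placeholder for a looked-up data/MC SF
};

// Applies the loose-to-tight SF only to prompt, tight leptons of a flavor
// for which the event selection actually requires the tight cuts.
double looseToTightWeight(const std::vector<LeptonSketch>& leptons,
                          LeptonSelection selection) {
  double weight = 1.;
  for (const LeptonSketch& lepton : leptons) {
    assert(lepton.looseToTightSF > 0.);
    if (lepton.isPrompt && lepton.passesTight &&
        requiresTight(selection, lepton.flavor)) {
      weight *= lepton.looseToTightSF;
    }
  }
  return weight;
}
```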

Possible outcomes of thinking this through may be: 1) the "interference" between lepton data/MC corrections and the fake-rate method may be relevant for the MC closure test only and not affect the fakes background estimate obtained from data (fakes_data); 2) we may need to alter the MC closure test; 3) we may need to measure data/MC SF for fakeable leptons in the future; 4) something else (?)

ktht commented 4 years ago

Just so that we have a point of reference when looking back at our code: we decided to apply the loose-to-tight lepton ID SF to all prompt leptons that pass the tight cuts, even if we don't necessarily require the fakeable leptons to pass the tight cuts. Here are the slides for reference. The logic was implemented with this commit in the ttH repository.

@siddhesh86 please close the issue if you think it's resolved.

HEP-KBFI / hh-multilepton

Apply lepton ID SF only to prompt leptons #18