HEP-KBFI / hh-multilepton

code and Python config files for hh -> 4tau and hh -> wwww analyses (all lepton and tau channels)

Update prompt lepton MVA cut #14

Closed by ktht 3 years ago

ktht commented 4 years ago

There's an example implemented in the ttZctrl analysis of the ttH repository that can be propagated to all analyses in this repo; it avoids changing the C++ source code of the core FW. This is important because we use the same FW in another (bbWW) analysis as well. Will implement it this afternoon.

ktht commented 4 years ago

The feature is now implemented in all analyses. To test it, add -L hh_multilepton to the command that runs the workflow, for instance:

$CMSSW_BASE/src/hhAnalysis/multilepton/test/hhAnalyzeRun_0l_4tau.py \
  -e 2017 -v 2020Aug13 -m default -L hh_multilepton

The choice of these alternative WPs can be made permanent, e.g. here: https://github.com/HEP-KBFI/hh-multilepton/blob/e61150e338a9c2648d1adb5c7901e807ed7b0f03/test/hhAnalyzeRun_0l_4tau.py#L25. However, it's not yet possible to run the analysis with these alternative WPs, because the corresponding FRs are missing.

@rdewanje You can run the measurement in a similar fashion:

$CMSSW_BASE/src/tthAnalysis/HiggsToTauTau/test/tthAnalyzeRun_LeptonFakeRate.py \
  -e 2017 -v 2020Aug13 -m default -L hh_multilepton

The resulting FR files should be copied to data/FR_lep_ttH_mva_{2016,2017,2018}.root in this repository (see the logic implemented here).

siddhesh86 commented 4 years ago

Incorporated the prompt lepton MVA cut and the whole new lepton selection (Tight and Fakeable) into the hh_multilepton analysis as the default. https://github.com/HEP-KBFI/hh-multilepton/commit/a30ae750ce59ba9258c8ec5b88534bb64ed2479b

ktht commented 4 years ago

It was easier to recompute the tight lepton ID SF on the fly than to mess around with dedicated TH2 histograms, especially since we want to keep the previous nuisances associated with the tight lepton ID SF and introduce a second source of uncertainty, derived from 50% of the difference between the old and recomputed SF (relevant LOCs):

const double sf_recomp_tmp   = 1. - (1. - sf_tmp) * recompSF;           // recomputed SF
const double sf_recomp_shift = 0.5 * std::fabs(sf_tmp - sf_recomp_tmp); // 50% of old-vs-new difference
const double sf_recomp       = sf_recomp_tmp + error_shift * sf_recomp_shift;

The new SF are implemented in all HH multilepton analysis channels (except for the code that tests SVfit), but only for the 2017 era, because the multiplicative factors shown in this presentation were given only for 2017. @siddhesh86, please edit the 2016 and 2018 data/MC interfaces in the ttH repository similarly to how it's done in the 2017 data/MC interface (reference):

recompTightSF_el_ = (1. - 0.883) / (1. - 0.755);
recompTightSF_mu_ = (1. - 0.981) / (1. - 0.882);

(I assume that the corrective factors will be determined separately for each era.)
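The recomputation above is simple enough to sanity-check by hand. Below is a minimal Python transcription of the C++ lines quoted earlier (the actual analysis code is C++; the input SF value 0.95 is a made-up example):

```python
def recompute_tight_sf(sf_tmp, recomp_factor, error_shift=0.0):
    """Recompute the tight lepton ID SF and apply the extra nuisance.

    error_shift = 0 gives the new nominal value; +1/-1 give the up/down
    variations of the additional uncertainty, taken as 50% of the
    difference between the old and recomputed SF.
    """
    sf_recomp_tmp = 1.0 - (1.0 - sf_tmp) * recomp_factor
    sf_recomp_shift = 0.5 * abs(sf_tmp - sf_recomp_tmp)
    return sf_recomp_tmp + error_shift * sf_recomp_shift

# 2017 multiplicative factors, as quoted from the data/MC interface above
recompTightSF_el = (1.0 - 0.883) / (1.0 - 0.755)  # ~0.478
recompTightSF_mu = (1.0 - 0.981) / (1.0 - 0.882)  # ~0.161

nominal = recompute_tight_sf(0.95, recompTightSF_el)        # recomputed SF
up      = recompute_tight_sf(0.95, recompTightSF_el, +1.0)  # +1 sigma
down    = recompute_tight_sf(0.95, recompTightSF_el, -1.0)  # -1 sigma
```

Note that because both correction factors are below 1, the recomputed SF moves closer to unity than the original, and the new nuisance spans half the gap between the two.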

ktht commented 4 years ago

@siddhesh86 I moved the initialization of recompTightSF_el_ and recompTightSF_mu_ to the base class because it's more convenient (see these lines). Last time I didn't notice that the base class had an era_ member variable.

siddhesh86 commented 4 years ago

@ktht I have updated the SF correction values for all three eras [1]. The SF correction for 2017 changed slightly w.r.t. the value reported earlier, after fixing a bug in the SF correction computation code.

[1] https://github.com/HEP-KBFI/tth-htt/commit/ec3c83e5f44ff2c41a9db81c9d175e2145b7dc11

ktht commented 4 years ago

Thanks. Is there anything else planned here with regard to the new lepton definition? Do you see any point in trying to derive systematic uncertainties for the new fake rates, or do you think that the systematic uncertainties determined from the MC closure are enough for our purposes?

rdewanje commented 4 years ago

I think the MC derived uncertainties should be enough.

siddhesh86 commented 4 years ago

I am doing a check in a ttbar control region enriched in non-prompt leptons for the Data/MC closure. I think this might be interesting before signing off on the new lepton definition.

ktht commented 4 years ago

It just occurred to me that by default the analysis runs on skimmed samples (unless you explicitly disable it with -p 0 when submitting, just like in BDT training). We used the fakeable lepton definition of the ttH analysis in the skimming, which means that the samples (both signal and background) need to be reskimmed specifically for this analysis. I propose that you use unskimmed Ntuples for everything until the samples are reskimmed.

ktht commented 4 years ago

Made the unskimmed samples the default for now.

rdewanje commented 4 years ago

That means I will have to run the analysis in default mode again to get the latest yields. I ran with -p True in this round.

ktht commented 4 years ago

As agreed in the last meeting, we decided to derive the relative uncertainties on the FRs measured with the ttH lepton definition and apply them to the measurement done with the relaxed lepton ID. This is done with scripts/apply_FR_relErrors.py in the following way:

for era in 2016 2017 2018; do \
  apply_FR_relErrors.py \
    -i data/FR_lep_mva_hh_multilepton_${era}_KBFI_2020Oct27_woTightCharge.root \
    -r ../../tthAnalysis/HiggsToTauTau/data/FR_lep_ttH_mva_${era}_CERN_2019Jul08.root \
    -o data/FR_lep_mva_hh_multilepton_${era}_KBFI_2020Oct27_woTightCharge_wSysUnc.root; \
done
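For illustration, the error-transfer step described above boils down to the following per-bin arithmetic. This is a hypothetical sketch, not the script's actual implementation (which operates on ROOT histograms); the function name and toy numbers are made up:

```python
def transfer_rel_errors(new_vals, ref_vals, ref_errs):
    """Return absolute uncertainties for new_vals, bin by bin.

    Each new central value gets the relative uncertainty of the
    corresponding reference bin (ttH lepton definition).
    """
    out = []
    for new_val, ref_val, ref_err in zip(new_vals, ref_vals, ref_errs):
        rel_err = ref_err / ref_val if ref_val != 0.0 else 0.0
        out.append(new_val * rel_err)
    return out

# toy per-bin fake rates: relaxed-ID measurement vs. ttH reference
new_fr     = [0.10, 0.20]
ref_fr     = [0.08, 0.16]
ref_fr_err = [0.02, 0.04]  # absolute errors on the reference FRs

new_fr_err = transfer_rel_errors(new_fr, ref_fr, ref_fr_err)
```

With these toy inputs, both reference bins carry a 25% relative uncertainty, so the transferred absolute errors scale with the new central values.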

The only remaining item here is the skimming of the Ntuples with the relaxed lepton ID.

ktht commented 3 years ago

Both the signal and background samples are now skimmed based on the multiplicity of leptons that pass the relaxed fakeable lepton definition.