HEP-KBFI / tth-htt

code and python config files for ttH, H -> tautau analysis with matrix element techniques

STXS binning in other Higgs processes #161

Closed: ktht closed this issue 3 years ago

ktht commented 4 years ago

Apparently, the combination guys want the other Higgs processes to be split into STXS bins as described here: https://twiki.cern.ch/twiki/bin/viewauth/CMS/HiggsWG/SignalModelingTools

We'll probably have to rerun Ntuple production for the other single-H processes in order to get the HTXS 1.2 branches. Also, we cannot use the master branch anymore because of the changes that have accumulated over the past months due to the HH analysis efforts.

ktht commented 4 years ago

This one's a bit better at explaining the categories, but we don't know which jet definition (pT > 25 or pT > 30) should be used for the classification: https://twiki.cern.ch/twiki/bin/view/LHCPhysics/LHCHXSWGFiducialAndSTXS

The second problem is reproducing identical yields from the samples. Assuming that we really need to split only the ggH, qqH and VH processes, the relevant samples that we need to reprocess are:

The solution is to create a lumi mask from the existing Ntuples, rerun Ntuple production on those samples, and apply the lumi mask.
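A minimal sketch of building such a mask, assuming NanoAOD-style run and luminosityBlock branches in the Events tree (the file names here are illustrative):

```python
# Sketch: build a CMS-style lumi mask ({run: [[lumi_lo, lumi_hi], ...]}) from
# the (run, lumi) pairs present in the existing Ntuples, so that the
# reprocessed samples can be restricted to exactly the same events.
import json
import uproot

def build_lumi_mask(ntuple_paths, tree_name="Events"):
    pairs = set()
    for path in ntuple_paths:
        with uproot.open(path) as f:
            arr = f[tree_name].arrays(["run", "luminosityBlock"], library="np")
            pairs.update(zip(arr["run"].tolist(), arr["luminosityBlock"].tolist()))
    mask = {}
    for run, lumi in sorted(pairs):
        ranges = mask.setdefault(str(run), [])
        if ranges and lumi == ranges[-1][1] + 1:
            ranges[-1][1] = lumi  # extend the last contiguous lumi range
        else:
            ranges.append([lumi, lumi])
    return mask

if __name__ == "__main__":
    # 'old_ntuple.root' is a placeholder for the existing Ntuples
    with open("lumi_mask.json", "w") as out:
        json.dump(build_lumi_mask(["old_ntuple.root"]), out)
```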

ktht commented 4 years ago

This comment is very useful: https://github.com/HEP-KBFI/tth-htt/issues/142#issuecomment-661956268

ktht commented 3 years ago

Just so that this information won't be lost in a Skype convo: the processes that we need to split are qqH, ggH and VH; we should use jets with pT > 30, and the coarse pT binning.

ktht commented 3 years ago

Unfortunately, the HTXS branches produced in CMSSW 10_2_10 are stage 1.0, but we need stage 1.2. It is not possible to recover this information from the existing Ntuples, nor is it possible to update CMSSW in the existing code base so that it adds the missing stage 1.2 branches, due to external dependencies (in particular on the Rivet tool).

The easiest way out, in my opinion, is to

The classification codes can be found in the header file of this project; a rough sketch of the splitting logic is given below.
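For the record, roughly how the coarse splitting could look for ggH from generator-level quantities. The bin boundaries follow the public STXS stage 1.2 scheme with pT > 30 jets, but the category labels and the merging of the boosted and >= 2-jet regions are illustrative; the actual codes should be taken from the header file:

```python
# Illustrative sketch only: coarse STXS stage 1.2 classification of ggH events.
# The real category codes live in the header file referenced above; the labels
# below are placeholders.
def classify_ggH_stage1p2(higgs_pt, njets30):
    """higgs_pt in GeV; njets30 = number of jets with pT > 30 GeV."""
    if higgs_pt > 200.0:
        # the boosted bins (200-300, 300-450, 450-650, > 650) are merged here,
        # in the spirit of the coarse pT binning agreed on above
        return "GG2H_PTH_GT200"
    if njets30 == 0:
        return "GG2H_0J_PTH_0_10" if higgs_pt < 10.0 else "GG2H_0J_PTH_GT10"
    if njets30 == 1:
        if higgs_pt < 60.0:
            return "GG2H_1J_PTH_0_60"
        if higgs_pt < 120.0:
            return "GG2H_1J_PTH_60_120"
        return "GG2H_1J_PTH_120_200"
    # >= 2 jets: the full scheme splits further by mjj and pT(Hjj);
    # merged into a single bin here for brevity
    return "GG2H_GE2J"
```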

With this method, we don't have to use the new Ntuples that I just had produced. Therefore, there's no need to mess around with a lumi mask to recover the original statistics of the samples that we had at the time of the last datacard production.

The plan is to repeat the analysis only on the ggH, qqH and VH samples, and merge the STXS histograms with the prepareDatacard ROOT files before feeding them to combine.
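A minimal sketch of the merging step with PyROOT; the file names and the assumption that the STXS shapes sit in a flat directory are illustrative, not the actual layout:

```python
# Sketch: copy the per-STXS-bin shape histograms into an existing
# prepareDatacard ROOT file so that combine picks them up as separate
# processes. File names and the flat directory layout are assumptions.
import ROOT

src = ROOT.TFile.Open("stxs_shapes.root", "READ")
dst = ROOT.TFile.Open("prepareDatacard.root", "UPDATE")
dst.cd()  # make dst the current directory so Write() lands there
for key in src.GetListOfKeys():
    obj = key.ReadObj()
    obj.Write(key.GetName(), ROOT.TObject.kOverwrite)
src.Close()
dst.Close()
```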

ktht commented 3 years ago

The STXS-split cards are produced and hosted in one of my private repositories on GitLab. I won't go into too much detail, but quite a bit more work had to be done at the datacard level that isn't directly related to the STXS splitting yet affected the final fit results. Since I stopped referencing this issue in my commits long ago, I'll just say that the relevant changes are in the dcard_production branch of this repository and of the hh-multilepton, hh-bbww and tth-nanoAOD-tools repositories, and in the HIG-19-008-backup branch of our CombineHarvester fork. These features are incompatible with the master branch, and given that the two branches have diverged a lot, I have no plans to merge the feature branch.

The necessary inputs (the contents of stxs_rescaled.tar.gz) are created with rescale_stxs.sh from the analysis results. All analysis results are archived in my local archives directory on /hdfs, in the files ttHAnalysis_stxs_*.tar.*z:

The final datacards are produced with stxs_cmds.sh. There's also the possibility to reproduce the STXS-inclusive cards with inclusive_cmds.sh. The combination, fitting, plotting etc. are carried out separately in CMSSW_10_2_13.

AFAICS, the only open item is related to the STXS migration uncertainties, but the functionality for applying them is already implemented. Given that there's not much else to do from my side anymore, I'll close the issue.