HEP-KBFI / tth-htt

code and python config files for ttH, H -> tautau analysis with matrix element techniques

STXS binning in other Higgs processes #161

Closed: ktht closed this issue 3 years ago

ktht commented 4 years ago

Apparently, the combination guys want the other Higgs processes to be split into STXS bins as described here: https://twiki.cern.ch/twiki/bin/viewauth/CMS/HiggsWG/SignalModelingTools

We'll probably have to rerun Ntuple production for the other single-H processes in order to get the HTXS 1.2 branches. Also, we cannot use the master branch anymore because of the changes that have accumulated over the past months due to the HH analysis efforts.

ktht commented 4 years ago

This one's a bit better at explaining the categories, but we don't know which jet definition (pT > 25 or pT > 30) should be used for the classification: https://twiki.cern.ch/twiki/bin/view/LHCPhysics/LHCHXSWGFiducialAndSTXS

The second problem is reproducing identical yields from the samples. Assuming that we really need to split only the ggH, qqH and VH processes, the relevant samples that we need to reprocess are:

The solution is to create a lumi mask from the existing Ntuples, rerun Ntuple production on those samples, and apply the lumi mask.
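A minimal sketch of building such a mask, assuming NanoAOD-style run and luminosityBlock branches in the Events tree (the file names here are illustrative):

```python
# Sketch: build a CMS-style lumi mask ({run: [[lumi_lo, lumi_hi], ...]}) from
# the (run, lumi) pairs present in the existing Ntuples, so that the
# reprocessed samples can be restricted to exactly the same events.
import json
import uproot

def build_lumi_mask(ntuple_paths, tree_name="Events"):
    pairs = set()
    for path in ntuple_paths:
        with uproot.open(path) as f:
            arr = f[tree_name].arrays(["run", "luminosityBlock"], library="np")
            pairs.update(zip(arr["run"].tolist(), arr["luminosityBlock"].tolist()))
    mask = {}
    for run, lumi in sorted(pairs):
        ranges = mask.setdefault(str(run), [])
        if ranges and lumi == ranges[-1][1] + 1:
            ranges[-1][1] = lumi  # extend the last contiguous lumi range
        else:
            ranges.append([lumi, lumi])
    return mask

if __name__ == "__main__":
    # 'old_ntuple.root' is a placeholder for the existing Ntuples
    with open("lumi_mask.json", "w") as out:
        json.dump(build_lumi_mask(["old_ntuple.root"]), out)
```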

ktht commented 4 years ago

This comment is very useful: https://github.com/HEP-KBFI/tth-htt/issues/142#issuecomment-661956268

ktht commented 3 years ago

Just so that this information won't be lost in a Skype convo: the processes that we need to split are qqH, ggH and VH; we should use jets with pT > 30, and the coarse pT binning.

ktht commented 3 years ago

Unfortunately, the HTXS branches produced in CMSSW 10_2_10 are stage 1.0, but we need stage 1.2. It is not possible to recover this information from the existing Ntuples, nor is it possible to update CMSSW in the existing code base so that it adds the missing stage 1.2 branches, due to external dependencies (in particular on the Rivet tool).

The easiest way out, in my opinion, is to

The classification codes can be found in the header file of this project; a rough sketch of the splitting logic is given below.
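For the record, roughly how the coarse splitting could look for ggH from generator-level quantities. The bin boundaries follow the public STXS stage 1.2 scheme with pT > 30 jets, but the category labels and the merging of the boosted and >= 2-jet regions are illustrative; the actual codes should be taken from the header file:

```python
# Illustrative sketch only: coarse STXS stage 1.2 classification of ggH events.
# The real category codes live in the header file referenced above; the labels
# below are placeholders.
def classify_ggH_stage1p2(higgs_pt, njets30):
    """higgs_pt in GeV; njets30 = number of jets with pT > 30 GeV."""
    if higgs_pt > 200.0:
        # the boosted bins (200-300, 300-450, 450-650, > 650) are merged here,
        # in the spirit of the coarse pT binning agreed on above
        return "GG2H_PTH_GT200"
    if njets30 == 0:
        return "GG2H_0J_PTH_0_10" if higgs_pt < 10.0 else "GG2H_0J_PTH_GT10"
    if njets30 == 1:
        if higgs_pt < 60.0:
            return "GG2H_1J_PTH_0_60"
        if higgs_pt < 120.0:
            return "GG2H_1J_PTH_60_120"
        return "GG2H_1J_PTH_120_200"
    # >= 2 jets: the full scheme splits further by mjj and pT(Hjj);
    # merged into a single bin here for brevity
    return "GG2H_GE2J"
```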

With this method, we don't have to use the new Ntuples that I just had produced. Therefore, there's no need to mess around with a lumi mask to recover the original statistics of the samples that we had at the time of the last datacard production.

The plan is to repeat the analysis only on the ggH, qqH and VH samples, and merge the STXS histograms with the prepareDatacard ROOT files before feeding them to combine.
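A minimal sketch of the merging step with PyROOT; the file names and the assumption that the STXS shapes sit in a flat directory are illustrative, not the actual layout:

```python
# Sketch: copy the per-STXS-bin shape histograms into an existing
# prepareDatacard ROOT file so that combine picks them up as separate
# processes. File names and the flat directory layout are assumptions.
import ROOT

src = ROOT.TFile.Open("stxs_shapes.root", "READ")
dst = ROOT.TFile.Open("prepareDatacard.root", "UPDATE")
dst.cd()  # make dst the current directory so Write() lands there
for key in src.GetListOfKeys():
    obj = key.ReadObj()
    obj.Write(key.GetName(), ROOT.TObject.kOverwrite)
src.Close()
dst.Close()
```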

ktht commented 3 years ago

The STXS-split cards are produced and hosted in one of my private repositories on GitLab. I won't go into too much detail, but quite a bit more work had to be done at the datacard level that isn't directly related to the STXS splitting yet affected the final fit results. Since I stopped referencing this issue in my commits long ago, I'll just say that the relevant changes are in the dcard_production branch of this repository and of the hh-multilepton, hh-bbww and tth-nanoAOD-tools repositories, and in the HIG-19-008-backup branch of our CombineHarvester fork. These features are incompatible with the master branch, and given that the two branches have diverged a lot, I have no plans to merge the feature branch.

The necessary inputs (the contents of stxs_rescaled.tar.gz) are created with rescale_stxs.sh from the analysis results. All analysis results are archived in my local archives directory on /hdfs, in the files ttHAnalysis_stxs_*.tar.*z:

The final datacards are produced with stxs_cmds.sh. There's also the possibility to reproduce the STXS-inclusive cards with inclusive_cmds.sh. The combination, fitting, plotting etc. are carried out separately in CMSSW_10_2_13.

AFAICS, the only open item is related to the STXS migration uncertainties, but the functionality for applying them is already implemented. Given that there's not much else to do from my side anymore, I'll close the issue.