Missing systematics - Githubissues

ktht commented 2 years ago

After yesterday's discussion it seems that we're still missing some systematic uncertainties that other groups have implemented:

1. JMS, JMR -- we already have them, just need to re-enable them;
2. Uncertainty associated with correcting the LO W+jets spectrum to NLO using LHE info -- more details in AN2019/229v6, section 4.4.3;
3. ~PS uncertainties on other backgrounds besides ttbar -- will have to check if it's possible to add them, and if not then need to understand why. Correlation still up for debate (imo);~ (see my next post)
4. Uncertainty associated with b-tagging SFs on AK8 subjets.

The first two points are easy enough; the third requires some investigation. The fourth item is the most challenging, because our Ntuples simply lack the information needed to compute these SFs. Will need to discuss what our options are, because I don't think we can just ignore them since they rank relatively high in terms of impact.

ktht commented 2 years ago

Interim update: JMS and JMR are now enabled. I implemented the LO-to-NLO corrections for the W+jets samples. The associated uncertainty is named Vpt_nlo, so there is no need to rename it. One thing that concerns me a bit is that the W+jets background will increase by roughly 50% after this change. Either this background is so small that the difference didn't show up in the data/MC comparison, or there's going to be a problem.

As for the PS weights, after closer inspection it looks like

in 2016, there are no resonant SL or DL samples with PS weights;
in 2017, only half of resonant SL samples have PS weights while none of the resonant DL samples have them;
in 2018, all resonant samples have PS weights.

Thus, I see little point in implementing them for the signal samples. I'll work on subjet b-tagging next week.

ktht commented 2 years ago

So I did a couple of sanity checks before starting to implement the solution for determining hadron flavor of AK8 subjets. The first test was to verify that AK8 subjet hadron flavor can be determined from the number of matching b-hadrons and c-hadrons considered in the ghost clustering in the following way: if the number of matching b-hadrons is greater than zero, assign hadron flavor 5; if the number of matching b-hadrons is zero but the number of matching c-hadrons is greater than zero, then assign 4 as the hadron flavor; otherwise, assign zero as the hadron flavor. I performed the test on a fully hadronic UL ttbar sample (because UL samples have all this info available out-of-the-box) and it seems to work out perfectly. Thus, the information available in NanoAODv7 is enough to determine AK8 subjet b-tagging SFs.
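A minimal sketch of the flavor assignment rule described above. The function and argument names are illustrative (they are not the branch names of any particular NanoAOD version); only the 5/4/0 logic comes from the test itself.

```python
def subjet_hadron_flavor(n_b_hadrons: int, n_c_hadrons: int) -> int:
    """Assign hadron flavor from ghost-matched hadron counts.

    Returns 5 if at least one b-hadron is matched, 4 if there is no
    b-hadron but at least one c-hadron, and 0 otherwise.
    Argument names are hypothetical placeholders.
    """
    if n_b_hadrons > 0:
        return 5
    if n_c_hadrons > 0:
        return 4
    return 0


# Example: a subjet with one b-hadron and two c-hadrons is b-flavored
flavor = subjet_hadron_flavor(n_b_hadrons=1, n_c_hadrons=2)
```

Note that the b-hadron check takes precedence, so a subjet containing both b- and c-hadrons is always labelled as b-flavored.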

However, we're not using NanoAODv7, so we need another proxy to determine the flavor composition of AK8 subjets. Here are some ideas:

1. dR-match each AK8 subjet to quarks in descending order of pT with no ambiguities (i.e. each quark can be matched to an AK8 subjet only once), and use the PDG ID of the quark as the hadron flavor of the AK8 subjet. The minimum dR between the matched particles must be less than 0.4;
2. dR-match each AK8 subjet to an AK4 gen jet in descending order of pT with no ambiguities (i.e. each AK4 gen jet can be matched to an AK8 subjet only once), and use the hadron flavor of the AK4 gen jet as that of the AK8 subjet. The minimum dR between the matched particles must be less than 0.4.
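Both ideas share the same unambiguous dR-matching step, which could be sketched roughly as follows. This is not the code in the repository: the dict-based inputs and the function names are made up for illustration, and it assumes that subjets are processed in descending pT, that each candidate already carries the flavor label to be copied (5, 4 or 0), and that unmatched subjets default to flavor 0.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def match_subjet_flavors(subjets, candidates, max_dr=0.4):
    """Greedy, unambiguous dR matching (illustrative sketch).

    subjets:    list of dicts with keys "pt", "eta", "phi"
    candidates: list of dicts with keys "eta", "phi", "flavor"
                (quarks or AK4 gen jets)
    Returns one flavor per subjet, in the input order of `subjets`.
    """
    flavors = [0] * len(subjets)  # assumption: unmatched means light
    used = set()                  # each candidate is matched at most once
    by_pt = sorted(range(len(subjets)),
                   key=lambda i: subjets[i]["pt"], reverse=True)
    for i in by_pt:
        best, best_dr = None, max_dr
        for j, cand in enumerate(candidates):
            if j in used:
                continue
            dr = delta_r(subjets[i]["eta"], subjets[i]["phi"],
                         cand["eta"], cand["phi"])
            if dr < best_dr:
                best, best_dr = j, dr
        if best is not None:
            used.add(best)
            flavors[i] = candidates[best]["flavor"]
    return flavors
```

Processing the subjets in descending pT means that the hardest subjet gets first pick when two subjets are both within dR < 0.4 of the same candidate.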

Based on NanoAODv7, both options yield comparable results for fully hadronic ttbar. However, for samples like DY the first idea doesn't work at all because the only quarks that are available in our post-processed Ntuples are either from V boson decays or from top decays (including hadronic W decay products). The problem is that the extra jets that are added at the ME level do not descend from either of those, at least not according to the Pythia gen particle listing. The situation is similar for other samples such as W+jets, in which the extra jets are introduced at the ME level. Thus, it really leaves the second option as our only viable way to determine these hadron flavors.

Below are confusion tables determined from ttbar (fully hadronic) and DY events separately using NanoAODv7 samples. Each cell shows how many AK8 subjets with the hadron flavor given by the row are matched to an AK4 gen jet with the hadron flavor given by the column. Numbers in parentheses indicate the classification rate for AK8 subjets with the hadron flavor given by the row. The true positive rate for identifying b-flavored, c-flavored and light-flavored AK8 subjets is 89% (75%), 78% (51%) and 98% (99%) based on ttbar (DY) events. The method is clearly suboptimal in identifying c-flavored AK8 subjets, but I think it's the best we can do with our post-processed Ntuples at the moment.

+-----------------+-----------------------------------------------+
| ttbar           |                 AK4 gen jets                  |
| (168000 events) +---------------+---------------+---------------+
|                 |       5       |       4       |       0       |
+-------------+---+---------------+---------------+---------------+
|             | 5 | 17993 (89.2%) |   336 (1.7%)  |  1840 (9.1%)  |
| AK8 subjets | 4 |   216 (1.5%)  | 11070 (77.8%) |  2948 (20.7%) |
|             | 0 |   621 (1.1%)  |   696 (1.3%)  | 53894 (97.6%) |
+-------------+---+---------------+---------------+---------------+

+-----------------+-----------------------------------------+
| DY              |              AK4 gen jets               |
| (191928 events) +------------+-------------+--------------+
|                 |      5     |      4      |      0       |
+-------------+---+------------+-------------+--------------+
|             | 5 | 45 (75.0%) |   0 (0.0%)  |  15 (25.0%)  |
| AK8 subjets | 4 |  1 (0.5%)  | 108 (51.2%) | 102 (48.3%)  |
|             | 0 |  7 (0.2%)  |  22 (0.7%)  | 3285 (99.1%) |
+-------------+---+------------+-------------+--------------+
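For reference, the percentages in parentheses are row-normalized rates, and the true positive rates are the diagonal entries. A short sketch that reproduces them from the counts in the two tables (the variable and function names are illustrative):

```python
FLAVORS = (5, 4, 0)  # row and column order used in the tables

# Rows: AK8 subjet hadron flavor; columns: matched AK4 gen jet flavor
TTBAR = [
    [17993, 336, 1840],
    [216, 11070, 2948],
    [621, 696, 53894],
]
DY = [
    [45, 0, 15],
    [1, 108, 102],
    [7, 22, 3285],
]


def row_rates(table):
    """Normalize each row to unit sum (classification rate per true flavor)."""
    return [[count / sum(row) for count in row] for row in table]


def true_positive_rates(table):
    """Return the diagonal of the row-normalized table, keyed by flavor."""
    rates = row_rates(table)
    return {flavor: rates[i][i] for i, flavor in enumerate(FLAVORS)}


for name, table in (("ttbar", TTBAR), ("DY", DY)):
    tpr = true_positive_rates(table)
    print(name, {flavor: f"{100.0 * rate:.1f}%" for flavor, rate in tpr.items()})
```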

I'll try to implement the SFs today. If I'm not done today, I'll finish it tomorrow.

ktht commented 2 years ago

I cannot find the place in the code where we require b-tagged subjets to have pT > 30 GeV. It has some (arguably minor) implications in determining the event-level weight from subjet b-tagging SFs.

saswatinandan commented 2 years ago

I guess it is implemented here?

https://github.com/HEP-KBFI/hh-bbww/blob/master/src/RecoJetCollectionSelectorAK8_hh_bbWW_Hbb.cc#L226

From: Karl Ehatäht @.*** Sent: 23 February 2022 17:15 To: HEP-KBFI/hh-bbww Cc: Subscribed Subject: Re: [HEP-KBFI/hh-bbww] Missing systematics (Issue #39)

ktht commented 2 years ago

Thanks, that settles it.

I discussed the details of computing the b-tagging SFs for AK8 subjets with Louvain. The recommendation is to use method 1a in BTV Twiki. This implies that we compute the subjet b-tagging efficiencies from MC in bins of pT, |eta| and hadron flavor of the AK8 subjet. I'll create another workflow in tthProjection.py that determines these efficiencies.
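As I understand method 1a, the event weight is the ratio of the probabilities of observing the given tagged/untagged configuration in data and in MC, built from the per-subjet MC efficiencies and SFs. A minimal sketch under that assumption follows; the function name and the tuple layout are hypothetical, and the efficiency and SF of each subjet are assumed to have been looked up beforehand in bins of pT, |eta| and hadron flavor.

```python
def event_weight_method_1a(subjets):
    """Event weight w = P(data) / P(MC) for one event.

    subjets: iterable of (is_tagged, mc_efficiency, scale_factor) tuples,
             one per selected AK8 subjet (hypothetical layout).
    P(MC)   = prod_tagged eff      * prod_untagged (1 - eff)
    P(data) = prod_tagged SF * eff * prod_untagged (1 - SF * eff)
    """
    p_mc = 1.0
    p_data = 1.0
    for is_tagged, eff, sf in subjets:
        if is_tagged:
            p_mc *= eff
            p_data *= sf * eff
        else:
            p_mc *= 1.0 - eff
            p_data *= 1.0 - sf * eff
    # Guard against division by zero (e.g. an empty efficiency bin);
    # falling back to unit weight is a choice made for this sketch only.
    return p_data / p_mc if p_mc > 0.0 else 1.0


# One tagged and one untagged subjet, both with eff = 0.5 and SF = 0.9
weight = event_weight_method_1a([(True, 0.5, 0.9), (False, 0.5, 0.9)])
```

Only subjets that pass the selection enter the product, which is why the pT > 30 GeV requirement on b-tagged subjets matters for the event-level weight.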

ktht commented 2 years ago

Well... I was wrong in assuming that we have access to the hadron flavors of AK4 gen jets in our post-processed Ntuples. I had this impression because we have a dedicated GenJetReader that could've been easily modified to read the hadron flavor branch -- but I never actually looked at what's inside the post-processed Ntuples. It turns out that we used GenParticleWriter to transcribe the GenJet collection, so the hadron flavor information is completely lost in post-production.

This has two implications:

we have to use generator-level partons that are available in our post-processed Ntuples (GenQuarkFromTop, GenBQuarkFromTop, GenWZQuark and GenHiggsDaughters) and dR-match them to subjets;
we can derive the SFs only for the processes that contain at least one top quark, hadronically decaying vector boson or Higgs. We cannot derive these SFs for e.g. W+jets or DY. To keep things simpler, we'll apply the SFs only to ttbar and to HH signal.

If these SFs cause any problems in dedicated SL fits or in the SL+DL combination, we always have the option to post-process the Ntuples again such that we could use the hadron flavor of matched AK4 gen jets as a proxy for the AK8 subjet flavor. I would delay these efforts unless really necessary.

ktht commented 2 years ago

All corrections and uncertainties are implemented. However, there seems to be a problem with the LHE Vpt reweighting that needs to be investigated on a longer time scale. For now I've disabled the corrections (and associated systematics). I'll keep the thread open until this last item is resolved.

@saswatinandan Please update this repository, the ttH repository and the $CMSSW_BASE/src/PhysicsTools/NanoAODTools repository before submitting any jobs.

ktht commented 2 years ago

Ok so we're not going to apply these corrections to LO samples at all but try out NLO W+jets samples instead. I'll refresh my memory on how to do it all over again and start with the Ntuple production asap.

veelken commented 2 years ago

Thank you very much, Karl!

ktht commented 2 years ago

@saswatinandan samples are ready. You now have to specify in the command line which samples you want to use:

-W lo if you want to use LO samples;
-W nlo if you want to use NLO samples.

HEP-KBFI / hh-bbww

Missing systematics #39