UCATLAS / xAODAnaHelpers

ATLAS Run 2 and Run 3 analysis framework for AnalysisTop and AnalysisBase for proton-smashing physics
https://ucatlas.github.io/xAODAnaHelpers/
Apache License 2.0
42 stars 127 forks source link

Fixing FTAG MCMC SFs #1686

Closed mdhank closed 6 months ago

mdhank commented 7 months ago

Updating MC-MC SFs to match https://ftag.docs.cern.ch/algorithms/activities/mcmc/#release-22-supported-generators and the CDI files (with the CDI files given precedence). Note this requires a new option in BJetEfficiencyCorrector, isRun3, which is set to False by default.

mdhank commented 7 months ago

Dear @tofitsch ,

Could you take a quick look at this? I've tested that it behaves as I expect, but it's good to have a reviewer separate from the author.

Thanks, Michael

tofitsch commented 6 months ago

Looks generally good to me. I trust you with all the numbers and will not check them :D only worry I have is regarding backwards compatibility. E.g. before any file sample name containing H7 got assigned Herwig7, now only if it contains PHH7EG it is Herwig7p2 but if it contains H7 but not PHH7EG it will be assigned Unknown. Not sure if that's ok? Are there no possible sample names that contain H7 but not PHH7EG?

mdhank commented 6 months ago

Looks generally good to me. I trust you with all the numbers and will not check them :D only worry I have is regarding backwards compatibility. E.g. before any file sample name containing H7 got assigned Herwig7, now only if it contains PHH7EG it is Herwig7p2 but if it contains H7 but not PHH7EG it will be assigned Unknown. Not sure if that's ok? Are there no possible sample names that contain H7 but not PHH7EG?

Good idea, I'll check for h7s. There are some such as mc23_13p6TeV.525891.MGH7EG_LO_lj_tchannel_el_150to200.deriv.DAOD_PHYS.e8544_s4162_r14622_p5855, mc23_13p6TeV.522027.aMCH7EG_NNPDF30NLO_H723UE_ttmumu_run3.deriv.DAOD_PHYS.e8558_a910_r14932_p6026, mc20_13TeV.346526.PowhegHerwig7EvtGen_H7UE_NNPDF30ME_ttH125_gamgam.deriv.DAOD_PHYS.e7488_s3681_r13167_p5631, or mc23_13p6TeV.830148.H7EG_H72NNPDF30NLO_jetjet_Cluster_JZ6.deriv.DAOD_PHYS.e8551_s4159_r14799_p5855.

The amc definitely shouldn't fall under this case as it has it's own MC-MC SFs, and I don't think MG should either as the MC-MC SFs (https://ftag.docs.cern.ch/algorithms/activities/mcmc/#run-2) specify they are for Powheg+Herwig7. I believe the one with PowhegHerwig in the name should fall under Herwig7.1. I'm less clear on the dijet samples. There are also datasets like mc23_13p6TeV.902037.QBHPy8EG_QBH_photonjet_n6_Mth7000.deriv.DAOD_PHYS.e8557_s4162_r14622_p6026 and mc23_13p6TeV.603317.PhPy8EG_ggH700W20_tautaulh.deriv.DAOD_PHYS.e8566_s4162_r14622_p6026 which have h7 but should definitely not be included. So I'd lean towards leaving as is to avoid false postives.

It could be worth adding a warning in general that it's good to double-check what values the parsing uses, as it's very easy to miss edge cases, but I have mixed feelings on this.

@tofitsch Does this plan (the PR as is) sound reasonable? I think this is the safest solution- if there's patterns that we should be counting but aren't, they'll give errors and users can either report them (in which case we'll adjust to add them), or manually set which to use. I believe the alternative would count patterns that shouldn't be counted, which I would rather not do as it makes it possible we will process some incorrectly.