Closed mdhank closed 6 months ago
Dear @tofitsch ,
Could you take a quick look at this? I've tested that it behaves as I expect, but it's good to have a reviewer separate from the author.
Thanks, Michael
Looks generally good to me. I trust you with all the numbers and will not check them :D only worry I have is regarding backwards compatibility. E.g. before any file sample name containing H7
got assigned Herwig7
, now only if it contains PHH7EG
it is Herwig7p2
but if it contains H7
but not PHH7EG
it will be assigned Unknown
. Not sure if that's ok? Are there no possible sample names that contain H7
but not PHH7EG
?
Looks generally good to me. I trust you with all the numbers and will not check them :D only worry I have is regarding backwards compatibility. E.g. before any file sample name containing
H7
got assignedHerwig7
, now only if it containsPHH7EG
it isHerwig7p2
but if it containsH7
but notPHH7EG
it will be assignedUnknown
. Not sure if that's ok? Are there no possible sample names that containH7
but notPHH7EG
?
Good idea, I'll check for h7s. There are some such as
mc23_13p6TeV.525891.MGH7EG_LO_lj_tchannel_el_150to200.deriv.DAOD_PHYS.e8544_s4162_r14622_p5855
,
mc23_13p6TeV.522027.aMCH7EG_NNPDF30NLO_H723UE_ttmumu_run3.deriv.DAOD_PHYS.e8558_a910_r14932_p6026
,
mc20_13TeV.346526.PowhegHerwig7EvtGen_H7UE_NNPDF30ME_ttH125_gamgam.deriv.DAOD_PHYS.e7488_s3681_r13167_p5631
, or
mc23_13p6TeV.830148.H7EG_H72NNPDF30NLO_jetjet_Cluster_JZ6.deriv.DAOD_PHYS.e8551_s4159_r14799_p5855
.
The amc definitely shouldn't fall under this case as it has it's own MC-MC SFs, and I don't think MG should either as the MC-MC SFs (https://ftag.docs.cern.ch/algorithms/activities/mcmc/#run-2) specify they are for Powheg+Herwig7. I believe the one with PowhegHerwig in the name should fall under Herwig7.1. I'm less clear on the dijet samples.
There are also datasets like
mc23_13p6TeV.902037.QBHPy8EG_QBH_photonjet_n6_Mth7000.deriv.DAOD_PHYS.e8557_s4162_r14622_p6026
and
mc23_13p6TeV.603317.PhPy8EG_ggH700W20_tautaulh.deriv.DAOD_PHYS.e8566_s4162_r14622_p6026
which have h7 but should definitely not be included. So I'd lean towards leaving as is to avoid false postives.
It could be worth adding a warning in general that it's good to double-check what values the parsing uses, as it's very easy to miss edge cases, but I have mixed feelings on this.
@tofitsch Does this plan (the PR as is) sound reasonable? I think this is the safest solution- if there's patterns that we should be counting but aren't, they'll give errors and users can either report them (in which case we'll adjust to add them), or manually set which to use. I believe the alternative would count patterns that shouldn't be counted, which I would rather not do as it makes it possible we will process some incorrectly.
Updating MC-MC SFs to match https://ftag.docs.cern.ch/algorithms/activities/mcmc/#release-22-supported-generators and the CDI files (with the CDI files given precedence). Note this requires a new option in BJetEfficiencyCorrector, isRun3, which is set to False by default.