CJLST / ZZAnalysis


Bug in ZZ candidate leptons assignment #236

Closed bonanomi closed 7 months ago

bonanomi commented 7 months ago

I believe the getLeptons function has a bug:

Example (EventNumber==4057199 from /store/mc/Run3Summer22EEMiniAODv4/ZHto2Zto4L_M125_TuneCP5_13p6TeV_powheg2-minlo-HZJ-JHUGenV752-pythia8/MINIAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/30000/6fdd2d9e-7213-4815-b37d-b3dfc6a5cdb5.root):

[image] [image]

with the initial collection of Electron and Muon being:

[image]

so we end up selecting the first two leptons (electrons) correctly (ZZCand_Z1l1Idx and ZZCand_Z1l2Idx are 1 and 0, respectively), but the third and fourth leptons (muons) are wrong (ZZCand_Z2l1Idx and ZZCand_Z2l2Idx are 3 and 4, respectively).

Is this a bug with getLeptons or with the assignment of the ZZCand_Z2l1Idx and ZZCand_Z2l2Idx indices?
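To make the question concrete, here is a minimal sketch of how the four candidate indices could be resolved into leptons. It assumes (this is not confirmed in the thread) that the indices point into a single list built by concatenating the Electron collection followed by the Muon collection. The helper `resolve_leptons` and the placeholder lepton labels are hypothetical; only the index values 1, 0, 3, 4 come from the example above.

```python
# Hypothetical event content: placeholder labels, not the real event.
electrons = ["e0", "e1"]
muons = ["mu0", "mu1", "mu2"]


def resolve_leptons(cand, electrons, muons):
    """Return the four leptons of a ZZ candidate, in Z1l1, Z1l2, Z2l1, Z2l2 order.

    ASSUMPTION: indices refer to electrons followed by muons in one list.
    """
    leptons = list(electrons) + list(muons)
    keys = ("Z1l1Idx", "Z1l2Idx", "Z2l1Idx", "Z2l2Idx")
    return [leptons[cand[key]] for key in keys]


# Index values quoted in this issue.
cand = {"Z1l1Idx": 1, "Z1l2Idx": 0, "Z2l1Idx": 3, "Z2l2Idx": 4}
resolved = resolve_leptons(cand, electrons, muons)
print(resolved)
```

Under this assumed convention, indices 3 and 4 select the second and third muons rather than the first two, so a mismatch would appear whenever the two productions being compared picked different ZZ candidates.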

@namapane do you have any suggestion?

bonanomi commented 7 months ago

The issue with this specific example was the different selection of the ZZ candidate (by Dkin vs Z2 pT). Now everything is in agreement. Only two events are still out of sync, and I need to understand why. Closing this issue for now.

namapane commented 7 months ago

Hi Matteo, BTW it would make sense to use the same selection for all analyses. For most published results we used Dkin; can we agree to stick to that? Regarding the remaining differences, at this rate they may be due to the rounding of variables in nano, which can occasionally move a lepton outside acceptance. For electrons there are also rare cases where the rounding moves an electron to a different bin in the BDT cuts. I have debugged many of these differences in the past and I can have a look at these 2 candidates if you wish.
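The rounding effect can be illustrated with a small sketch. It assumes a float stored with a reduced number of mantissa bits, and for simplicity it truncates the low bits instead of rounding them. The 10-bit precision, the 5.0 GeV threshold, and the function names are assumptions chosen for illustration, not values taken from the actual production.

```python
import struct

PT_CUT = 5.0  # hypothetical acceptance threshold in GeV


def reduce_mantissa(value, bits):
    """Keep only the top `bits` of the 23-bit float32 mantissa (truncation)."""
    packed = struct.unpack("<I", struct.pack("<f", value))[0]
    mask = 0xFFFFFFFF & ~((1 << (23 - bits)) - 1)
    return struct.unpack("<f", struct.pack("<I", packed & mask))[0]


def in_acceptance(pt):
    return pt > PT_CUT


full_pt = 5.001                            # value before precision reduction
stored_pt = reduce_mantissa(full_pt, 10)   # value after precision reduction

print(full_pt, in_acceptance(full_pt))
print(stored_pt, in_acceptance(stored_pt))
```

Here the full-precision value passes the cut while the reduced-precision value lands exactly on the threshold and fails it, which is the kind of rare migration described above.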

AlessandroTarabini commented 7 months ago

Hi Nicola, regarding the selection of the best ZZ candidate, we should stick to the "highest pT" for fiducial analyses. The definition of the fiducial phase space should only include cut-based requirements (or at least selections easily reproducible by theoreticians). Since the aim of fiducial analysis is to maximise the overlap between the fiducial phase space and the detector-level, we should use the same selections at both levels.

namapane commented 7 months ago

Ok but this means we have to keep separate productions, etc. with the resulting extra work and confusion. We could also switch back to higherpt for everything, but using Dkin was shown to handle the candidate choice better in associated production. We could recheck that...

(Sent by email in reply to AlessandroTarabini's comment of 4 Apr 2024, 16:52.)

AlessandroTarabini commented 7 months ago

Since everything with NanoAOD is based on indices, we could include two bestCandIdx indices, one for Dkin and one for higherPt. In that case, we have one single production and one single set of CJLST ntuples, so it is left to the single analysis which candidate to choose.
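A minimal sketch of the two-index idea, under simplifying assumptions: each candidate is reduced to a Dkin value and a Z2 pT value, and each criterion picks the candidate with the largest value. The real selections involve more steps, and the names `best_candidate_indices`, `dkin`, and `z2_pt`, as well as the numbers, are hypothetical.

```python
def best_candidate_indices(cands):
    """Return (index of highest-Dkin candidate, index of highest-Z2-pT candidate).

    Returns (-1, -1) when the event has no ZZ candidate.
    """
    if not cands:
        return -1, -1
    indices = range(len(cands))
    best_dkin = max(indices, key=lambda i: cands[i]["dkin"])
    best_z2pt = max(indices, key=lambda i: cands[i]["z2_pt"])
    return best_dkin, best_z2pt


# Hypothetical event with two ZZ candidates where the criteria disagree.
cands = [
    {"dkin": 0.30, "z2_pt": 40.0},
    {"dkin": 0.80, "z2_pt": 25.0},
]
idx_dkin, idx_z2pt = best_candidate_indices(cands)
print(idx_dkin, idx_z2pt)
```

With both indices stored per event, each analysis would read the candidate through the index matching its own criterion, and events where the two indices differ are the ones where the choice matters.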

namapane commented 7 months ago

That is possible but requires some additional code because:

namapane commented 7 months ago

BTW all recent RunIII productions have been done with the Dkin selection, right?

bonanomi commented 7 months ago

Hi, I would also avoid having two flags for two different selection criteria of the ZZ candidates, as it can create confusion in usage and in bookkeeping. Nicola, yes, now that you mention it, I believe that all the productions run so far use Dkin as the selection criterion. I have to check the 125 GeV signal samples, because I may have used the Z2 pT criterion in my private production.

namapane commented 7 months ago

For the time being I opened #238 to keep track of what adding the two sets of indices would imply. I also added a note about rechecking the Dkin criterion and/or finding a common one.

The problem of confusion is very real: as you point out, we may have an inconsistent set of samples, since different people may have set up their areas in different ways. In any case we would need to find a way to standardize productions.