Closed steggema closed 4 months ago
cms-bot internal usage
A new Issue was created by @steggema.
@rappoccio, @makortel, @antoniovilela, @smuzaffar, @sextonkennedy, @Dr15Jones can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
FYI @cms-sw/jetmet-pog-l2 @cms-sw/pf-l2 @stahlleiton @cms-sw/reconstruction-l2
assign reconstruction
New categories assigned: reconstruction
@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks
type jetmet, pf
what are the ParticleType
s for which NaN is obtained? In which years of Run3 is this happening?
and it will be useful to have a recipe where one can reproduce the issue and can debug.
Are the full kinematics nan or just dz? We had some instances where problematic tracks led to this issue last year, but it's extremely difficult to debug to a general class of problems. I would start by excluding them from the deep met.
We had some instances where problematic tracks led to this issue last year,
right, thanks, it could be related or similar, here is the documentation of that issue which Kenneth had debugged https://github.com/cms-sw/cmssw/issues/39110
To answer the questions in one go:
To reproduce: A few events with the issue can be obtained with
edmCopyPickMerge outputFile=pickevents.root eventsToProcess=1:4256897,1:4257026,1:4257054,1:4418151,1:4418165,1:4418588 inputFiles=/store/mc/Run3Summer22MiniAODv4/DYJetsToLL_M-50_TuneCP5_13p6TeV-madgraphMLM-pythia8/MINIAODSIM/ALCARECO_forPOG_130X_mcRun3_2022_realistic_v5_ext1-v2/2530000/3f52a439-eb16-46f8-b9ab-0b1dd30da09a.root
I looked a bit more at the "bestTrack" (when it exists), and they look really bizarre to me (just looking at the handful of events i mentioned above, i can look at more if people are curious):
It does indeed look related to the other issue in the sense that all kinematic values are zero. Would the fix also address this? Are there samples available with the fix applied? (e.g. would we expect to not see this issue in 2023 or 2024 MC)
can you check in 2023 or 2024 MC if the issue is still present or already addressed by Kenneth's fix?
It does indeed look related to the other issue in the sense that all kinematic values are zero. Would the fix also address this? Are there samples available with the fix applied? (e.g. would we expect to not see this issue in 2023 or 2024 MC)
can you check in 2023 or 2024 MC if the issue is still present or already addressed by Kenneth's fix?
The issue still appears to be there in 2023 and 2024 MC. The two samples I checked are /DYto2TautoMuTauh_M-50_TuneCP5_13p6TeV_madgraphMLM-pythia8/Run3Winter24NanoAOD-133X_mcRun3_2024_realistic_v9-v2/NANOAODSIM and /TTto2L2Nu-2Jets_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer23NanoAODv12-130X_mcRun3_2023_realistic_v14-v2/NANOAODSIM - they seemed to be the most recent DY/TT samples I could find but let me know if there are any more recent samples I should check.
I looked a bit more at the "bestTrack" (when it exists), and they look really bizarre to me (just looking at the handful of events i mentioned above, i can look at more if people are curious):
* chi2 is 0, ndof in the 5 to 10 range, and they are high purity (so they go to PF) * pT is around 1 GeV, |eta| around 2.7 * pT error is inf * vx/vy and dxy are finite, vz and dz are inf
Do I understand correctly that these tracks are present in the general-track collections already and being propagated to PF? Let me also tag @cms-sw/tracking-pog-l2 as it can be related to track reconstruction.
I looked a bit more at the "bestTrack" (when it exists), and they look really bizarre to me (just looking at the handful of events i mentioned above, i can look at more if people are curious):
* chi2 is 0, ndof in the 5 to 10 range, and they are high purity (so they go to PF) * pT is around 1 GeV, |eta| around 2.7 * pT error is inf * vx/vy and dxy are finite, vz and dz are inf
Do I understand correctly that these tracks are present in the general-track collections already and being propagated to PF? Let me also tag @cms-sw/tracking-pog-l2 as it can be related to track reconstruction.
To be very clear, I should say that I only looked at MiniAOD. There the "bestTrack" method returns tracks with the features that I mentioned. I would assume that this implies that they should be in the general-tracks collections and be high-purity since they make it into PF, but it's of course conceivable that something else happens. I think the AOD sample that corresponds to the MINIAOD one above is available so it should be possible to get the events and look into the collections there. I'll see if I can easily do it and will report back in the thread.
Turns out that AODSIM isn't available... I actually can't easily find a run 3 sample that has both MiniAODSIM and AODSIM available - if someone can point me to such a sample, I'm happy to check.
Turns out that AODSIM isn't available... I actually can't easily find a run 3 sample that has both MiniAODSIM and AODSIM available - if someone can point me to such a sample, I'm happy to check.
@cms-sw/pdmv-l2 can you help finding a AODSIM for run3 for checking the above mentioned issue for which we need to access general track collection?
And @steggema just out of curiosity, we expect this issue to also show up in Run3 data, right? If any example of data event is found, kindly paste the event pointer here.
Here are the AODSIM for the above two samples /DYto2TautoMuTauh_M-50_TuneCP5_13p6TeV_madgraphMLM-pythia8/Run3Winter24Reco-133X_mcRun3_2024_realistic_v9-v2/AODSIM /TTto2L2Nu-2Jets_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer23DRPremix-130X_mcRun3_2023_realistic_v14-v2/AODSIM
In any case, AODSIM should be available by default.
Thanks, the DY->tau tau-> mu tauh sample indeed has AODSIM available on disk (the other doesn't), so I was able to run a quick check with it.
What I did is:
My main finding is that these tracks have finite dz, vz, and ptError values (whereas they are nan, nan, and inf for the bestTracks). I print some more information below (*) if it helps. I would conclude that it means that something is going wrong in how these tracks are being used within the PF algorithm. Please let me know if there's anything else that I can do to help.
(*) ######## Event 6 pt 1.00 eta -2.84 phi 0.53 dz 0.49 dxy -0.04 found 5 quality 2 True chi2 4.69 ndof 5.0 pt error 0.82 vz 0.22 algo duplicateMerge ######## Event 24 pt 1.09 eta -2.89 phi -2.33 dz 5.62 dxy 0.09 found 5 quality 2 True chi2 4.93 ndof 5.0 pt error 0.99 vz 5.79 algo highPtTripletStep ######## Event 72 pt 0.98 eta -2.66 phi 0.61 dz -1.64 dxy -0.04 found 5 quality 2 True chi2 2.88 ndof 5.0 pt error 0.94 vz -1.84 algo highPtTripletStep ######## Event 101 pt 1.02 eta -2.64 phi 0.90 dz -0.30 dxy 0.02 found 5 quality 2 True chi2 5.39 ndof 5.0 pt error 0.99 vz -0.40 algo highPtTripletStep ######## Event 103 pt 1.01 eta 2.64 phi -2.21 dz -1.46 dxy 0.07 found 5 quality 2 True chi2 7.45 ndof 5.0 pt error 0.85 vz -1.55 algo highPtTripletStep ######## Event 125 pt 1.16 eta 2.84 phi -0.23 dz -1.96 dxy -0.01 found 5 quality 2 True chi2 3.28 ndof 5.0 pt error 1.00 vz -1.55 algo highPtTripletStep ######## Event 136 pt 0.98 eta -2.73 phi -0.58 dz 0.75 dxy -0.01 found 5 quality 2 True chi2 3.43 ndof 5.0 pt error 0.98 vz 0.38 algo highPtTripletStep ######## Event 152 pt 0.97 eta -2.73 phi 0.07 dz 0.51 dxy -0.08 found 5 quality 2 True chi2 9.79 ndof 5.0 pt error 0.91 vz 0.17 algo highPtTripletStep ######## Event 164 pt 1.01 eta -2.71 phi -0.34 dz -1.15 dxy -0.01 found 5 quality 2 True chi2 3.26 ndof 5.0 pt error 0.99 vz -1.51 algo highPtTripletStep ######## Event 181 pt 0.99 eta 2.85 phi 2.04 dz -3.25 dxy 0.01 found 5 quality 2 True chi2 8.43 ndof 5.0 pt error 0.97 vz -3.56 algo duplicateMerge ######## Event 195 pt 0.97 eta 2.71 phi -2.37 dz -4.02 dxy 0.03 found 5 quality 2 True chi2 1.70 ndof 5.0 pt error 0.84 vz -4.18 algo highPtTripletStep
Sorry for the multiple posts, but I realised I could of course also simply look at the PFCandidates in AODSIM. Interestingly, they have finite dzError and finite dz of the bestTrack, but they already have all kinematic values at 0. I'm dumping some more information just below in case it helps for the first 20 or so candidates.
######## Event 0 Found PF candidate with zero pt: pt=0.00 best track pt=0.62 eta=0.00 phi=0.00 dz=3.37 dzError=0.24 dxy=-0.01 dxyError=0.17 pdgId=-211 vz=2.98 algo=duplicateMerge ######## Event 1 Found PF candidate with zero pt: pt=0.00 best track pt=0.57 eta=0.00 phi=0.00 dz=0.70 dzError=0.28 dxy=0.09 dxyError=0.21 pdgId=211 vz=0.99 algo=highPtTripletStep ######## Event 2 Found PF candidate with zero pt: pt=0.00 best track pt=0.59 eta=0.00 phi=0.00 dz=2.28 dzError=0.20 dxy=0.03 dxyError=0.15 pdgId=-211 vz=1.86 algo=highPtTripletStep ######## Event 3 Found PF candidate with zero pt: pt=0.00 best track pt=0.55 eta=0.00 phi=0.00 dz=9.69 dzError=1.05 dxy=0.11 dxyError=0.23 pdgId=-211 vz=9.91 algo=highPtTripletStep ######## Event 4 Found PF candidate with zero pt: pt=0.00 best track pt=0.58 eta=0.00 phi=0.00 dz=-2.69 dzError=0.15 dxy=-0.00 dxyError=0.24 pdgId=-211 vz=-2.35 algo=highPtTripletStep ######## Event 5 Found PF candidate with zero pt: pt=0.00 best track pt=0.85 eta=0.00 phi=0.00 dz=-1.44 dzError=0.10 dxy=0.00 dxyError=0.11 pdgId=-211 vz=-1.47 algo=highPtTripletStep ######## Event 6 Found PF candidate with zero pt: pt=0.00 best track pt=1.00 eta=0.00 phi=0.00 dz=0.49 dzError=2.15 dxy=-0.04 dxyError=0.28 pdgId=211 vz=0.22 algo=duplicateMerge ######## Event 7 Found PF candidate with zero pt: pt=0.00 best track pt=0.91 eta=0.00 phi=0.00 dz=-0.06 dzError=0.17 dxy=-0.03 dxyError=0.10 pdgId=-211 vz=-0.39 algo=highPtTripletStep ######## Event 8 Found PF candidate with zero pt: pt=0.00 best track pt=0.63 eta=0.00 phi=0.00 dz=5.82 dzError=0.15 dxy=-0.01 dxyError=0.17 pdgId=-211 vz=5.47 algo=highPtTripletStep ######## Event 9 Found PF candidate with zero pt: pt=0.00 best track pt=0.74 eta=0.00 phi=0.00 dz=0.71 dzError=1.37 dxy=0.01 dxyError=0.22 pdgId=-211 vz=0.97 algo=highPtTripletStep ######## Event 10 Found PF candidate with zero pt: pt=0.00 best track pt=0.80 eta=0.00 phi=0.00 dz=-0.14 dzError=0.49 dxy=-0.01 dxyError=0.16 pdgId=-211 vz=-0.10 algo=highPtTripletStep ######## Event 11 Found PF candidate with zero pt: pt=0.00 best track pt=0.45 eta=0.00 phi=0.00 dz=7.64 dzError=1.37 dxy=-0.03 dxyError=0.34 pdgId=-211 vz=7.80 algo=duplicateMerge ######## Event 12 Found PF candidate with zero pt: pt=0.00 best track pt=0.70 eta=0.00 phi=0.00 dz=-2.26 dzError=1.74 dxy=0.02 dxyError=0.22 pdgId=-211 vz=-2.53 algo=duplicateMerge ######## Event 13 Found PF candidate with zero pt: pt=0.00 best track pt=0.89 eta=0.00 phi=0.00 dz=-1.81 dzError=1.07 dxy=0.02 dxyError=0.17 pdgId=211 vz=-2.00 algo=highPtTripletStep ######## Event 14 Found PF candidate with zero pt: pt=0.00 best track pt=0.44 eta=0.00 phi=0.00 dz=-1.45 dzError=0.37 dxy=0.08 dxyError=0.45 pdgId=-211 vz=-1.47 algo=highPtTripletStep ######## Event 15 Found PF candidate with zero pt: pt=0.00 best track pt=0.78 eta=0.00 phi=0.00 dz=1.16 dzError=0.34 dxy=-0.05 dxyError=0.11 pdgId=-211 vz=1.55 algo=highPtTripletStep ######## Event 16 Found PF candidate with zero pt: pt=0.00 best track pt=0.71 eta=0.00 phi=0.00 dz=-2.75 dzError=0.15 dxy=0.02 dxyError=0.17 pdgId=211 vz=-2.48 algo=highPtTripletStep ######## Event 17 Found PF candidate with zero pt: pt=0.00 best track pt=0.85 eta=0.00 phi=0.00 dz=-2.15 dzError=1.41 dxy=-0.03 dxyError=0.18 pdgId=-211 vz=-2.35 algo=duplicateMerge ######## Event 18 Found PF candidate with zero pt: pt=0.00 best track pt=0.55 eta=0.00 phi=0.00 dz=-1.17 dzError=0.92 dxy=0.10 dxyError=0.21 pdgId=-211 vz=-0.85 algo=highPtTripletStep ######## Event 19 Found PF candidate with zero pt: pt=0.00 best track pt=0.53 eta=0.00 phi=0.00 dz=3.07 dzError=0.89 dxy=0.01 dxyError=0.23 pdgId=211 vz=3.07 algo=highPtTripletStep ######## Event 20 Found PF candidate with zero pt: pt=0.00 best track pt=0.79 eta=0.00 phi=0.00 dz=-8.82 dzError=0.47 dxy=0.05 dxyError=0.13 pdgId=-211
thanks, interesting.. looks like something went wrong in AOD->miniAOD step
Agreed. For what it's worth, it seems plausible to me that some of the packed candidate creation or methods rely on momenta not being zero, so maybe the right question to ask is what is the purpose of having a PF candidate with zero momentum in the first place...
We had this discussion in the past. In principle zero momentum means that the kinematics aren't reliably estimated to be different than zero within uncertainties. But they can still play a role in candidate counting etc. I wouldn't say they are super useful, but it also wasn't obvious to me that we should kill them. Regardless we should avoid that zeros turn to nans, there's no world in which that is desirable.
FWIW, at least these lines in packVtx()
https://github.com/cms-sw/cmssw/blob/d376eb350a3e9af9f0ce3618fbac6afc36ce56d8/DataFormats/PatCandidates/src/PackedCandidate.cc#L42-L43
will result NaN with zero momentum track.
If I recall correctly, I think the track properties are not stored in the packed PF candidate produced by the PAT packed candidate producer if the pt < 0.95 GeV
This was lowered to 0.0
for Run3 MINIAOD (with low quality covariance matrix).
This was lowered to
0.0
for Run3 MINIAOD (with low quality covariance matrix).
Ah yes, but only if it strictly larger than 0 GeV (so candidates with pT = 0 would then have no track properties)
FWIW, at least these lines in
packVtx()
will result NaN with zero momentum track.
thanks Matti
this[1] change seems to help;
Examples from /eos/cms//store/mc/Run3Winter24Reco/DYto2TautoMuTauh_M-50_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/133X_mcRun3_2024_realistic_v9-v2/2830000/00502d01-cccd-453a-9f6d-160572a6b2ae.root
after running AOD->miniAOD step for 200 evets; and looping over packedPFCandidates
:
Before change [1]
dz, dxy, eta, pt nan -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt nan -0.014941406436264515 0.0 0.0
after change [1]
dz, dxy, eta, pt 5.226568222045898 -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt 5.175044059753418 -0.014941406436264515 0.0 0.0
[1]
--- a/DataFormats/PatCandidates/src/PackedCandidate.cc
+++ b/DataFormats/PatCandidates/src/PackedCandidate.cc
@@ -40,7 +40,9 @@ void pat::PackedCandidate::packVtx(bool unpackAfterwards) {
// float dl = dxPV * c + dyPV * s;
// float xRec = - dxy_ * s + dl * c, yRec = dxy_ * c + dl * s;
float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
- dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+ if (!std::isnan(pzpt)) {
+ dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+ }
--- a/DataFormats/PatCandidates/src/PackedCandidate.cc +++ b/DataFormats/PatCandidates/src/PackedCandidate.cc @@ -40,7 +40,9 @@ void pat::PackedCandidate::packVtx(bool unpackAfterwards) { // float dl = dxPV * c + dyPV * s; // float xRec = - dxy_ * s + dl * c, yRec = dxy_ * c + dl * s; float pzpt = p4_.load()->Pz() / p4_.load()->Pt(); - dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt; + if (!std::isnan(pzpt)) { + dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt; + }
I'd suggest to check e.g. p4_.load()->Pt() != 0.f
(or also Pz() != 0.f
depending if Inf is acceptable or not) directly instead of allowing NaN to be computed. On a quick look I also didn't see any other place that would initialize dz_
, so I'd suggest to add explicit initialization somewhere.
okay, how about this? with this change, the nan issue is solved, some before/after numbers are pasted below,
- float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
- dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+ dz_ = 0;
+ if (p4_.load()->Pt() != 0.f) {
+ float pzpt = p4_.load()->Pz() / p4_.load()->Pt();
+ dz_ = vertex_.load()->Z() - pv.Z() - (dxPV * c + dyPV * s) * pzpt;
+ }
before the change
dz, dxy, eta, pt nan -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt nan -0.014941406436264515 0.0 0.0
dz, dxy, eta, pt nan 0.08945312350988388 0.0 0.0
dz, dxy, eta, pt nan -0.0037353516090661287 0.0 0.0
dz, dxy, eta, pt nan -0.022695312276482582 0.0 0.0
dz, dxy, eta, pt nan 0.0036865235306322575 0.0 0.0
dz, dxy, eta, pt nan 0.009384765289723873 0.0 0.0
dz, dxy, eta, pt nan 0.07921875268220901 0.0 0.0
dz, dxy, eta, pt nan -0.0006768798921257257 0.0 0.0
after the change
dz, dxy, eta, pt 5.226568222045898 -0.06972656399011612 0.0 0.0
dz, dxy, eta, pt 5.175044059753418 -0.014941406436264515 0.0 0.0
dz, dxy, eta, pt 2.9320716857910156 0.08945312350988388 0.0 0.0
dz, dxy, eta, pt 12.334485054016113 -0.0037353516090661287 0.0 0.0
dz, dxy, eta, pt 16.324251174926758 -0.022695312276482582 0.0 0.0
dz, dxy, eta, pt -0.19352594017982483 0.0036865235306322575 0.0 0.0
dz, dxy, eta, pt 3.242164134979248 0.009384765289723873 0.0 0.0
dz, dxy, eta, pt 1.036512017250061 0.07921875268220901 0.0 0.0
dz, dxy, eta, pt 3.9245831966400146 -0.0006768798921257257 0.0 0.0
okay, how about this?
Looks reasonable to me, thanks.
okay, I'm doing some more sanity checks, then I'll open a PR
so its merged in 14_1 https://github.com/cms-sw/cmssw/pull/45009 do we need any backport? I guess a backport to 14_0 can be useful as that's the release for 2024 data-taking.
A backport to 14_0 sounds great to me. This should also only make it necessary to backport the DeepMET input check for possible ReNanoAOD campaigns.
+1 Solved by https://github.com/cms-sw/cmssw/pull/45009
This issue is fully signed and ready to be closed.
Hi @swagata87 , all, something that is a little confusing in this comment https://github.com/cms-sw/cmssw/issues/44976#issuecomment-2119272110 is that you show dz=nan
for particle with eta=0 pt=0
(according to the printout) which in turn should have dz=0
after the fix. What are we reading wrong ?
It looks like there are multiple places where dz
gets updated after it is first calculated in the part of the code that is changed in the PR. Previously, when dz=NaN
was calculated at first, it remained NaN
after the subsequent updates. But now, we are getting dz=0
at first, in subsequent steps it gets changed to some other value.
Checking the DeepMET performance in run 3, we found a significant fraction of events for which DeepMET evaluates to NAN, up to 1% in a four-top sample. We've just traced this down to events where we have a PF candidate where dz() returns NAN, which we don't catch https://github.com/cms-sw/cmssw/blob/master/RecoMET/METPUSubtraction/plugins/DeepMETProducer.cc#L85
The fix will be easy enough (just checking for NAN), but I was wondering if dz() returning NAN for a PF candidate is expected or if it could impact something else.
Tagging co-developers (@yongbinfeng @mseidel42) and whoever i could easily find on github from JME (@alkaloge) PF (@swagata87) Reco (@mandrenguyen)