Closed emanueleusai closed 2 years ago
A new Issue was created by @emanueleusai Emanuele Usai.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign dqm,egamma,tracking
New categories assigned: dqm
@jfernan2,@ahmad3213,@micsucmed,@rvenditti,@emanueleusai,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks
assign reconstruction, tracking-pog, egamma-pog
New categories assigned: tracking-pog,reconstruction,egamma-pog
@jpata,@slava77,@mmusich,@lfinco,@clacaputo,@swagata87,@vmariani you have been requested to review this Pull request/Issue and eventually sign? Thanks
unassign dqm
for the record the exception is triggered here: https://github.com/cms-sw/cmssw/blob/2a2ca2ee216a44c2743301dc2dc32b4c7c93f4ad/TrackingTools/GsfTools/src/BasicMultiTrajectoryState.cc#L19-L21
@emanueleusai @tyjyang (ORM) have you tried to reproduce? Can you provide the first event that fails for ease of debugging?
have you tried to reproduce? Can you provide the first event that fails for ease of debugging?
answering to self:
import FWCore.ParameterSet.Config as cms
from PSet import process
# skip to failing event
process.source.eventsToProcess = cms.untracked.VEventRange('356948:59404076-356948:59404076')
process.options.numberOfThreads = 1
will fail at the first event.
Hi, this has(is) also being discussed here
so, for starters I reconstructed the rather involved chain of calls.
The exception originates here in the constructor of BasicMultiTrajectoryState
The constructor of BasicMultiTrajectoryState
is called here by MultiTrajectoryStateAssembler::combinedState()
which in turn is called by GsfMultiStateUpdator::update(...)
which is called by GsfTrajectoryFitter::fitOne(...)
:
which in turn is called by TrackProducerAlgorithm<reco::GsfTrack>::buildTrack(...)
which is called by TrackProducerAlgorithm<T>::runWithCandidate(...)
which finally is called by GsfTrackProducer::produce(..)
which is where the exception gets caught.
with this patch:
diff --git a/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc b/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
index 4b5010d3414..c3958e7f0c4 100644
--- a/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
+++ b/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
@@ -30,7 +30,7 @@ TrajectoryStateOnSurface GsfMultiStateUpdator::update(const TrajectoryStateOnSur
int i = 0;
for (auto const& tsosI : predictedComponents) {
TrajectoryStateOnSurface updatedTSOS = KFUpdator().update(tsosI, aRecHit);
- if (updatedTSOS.isValid()) {
+ if (updatedTSOS.isValid() && updatedTSOS.localError().valid()) {
result.addState(TrajectoryStateOnSurface(weights[i],
updatedTSOS.localParameters(),
updatedTSOS.localError(),
the event Event run: 356948 lumi: 46 event: 59404076
, does get reconstructed, albeit with quite a few warnings and an ominous error message:
%MSG
%MSG-e PosteriorWeightsCalculator: GsfTrackProducer:lowPtGsfEleGsfTracks 12-Aug-2022 13:45:19 CEST Run: 356948 Event: 59404076
PosteriorWeightsCalculator: sumWeight < DBL_MIN
%MSG
%MSG-e GsfMultiStateUpdator: GsfTrackProducer:lowPtGsfEleGsfTracks 12-Aug-2022 13:45:19 CEST Run: 356948 Event: 59404076
no weights could be retreived. invalid updated state !.
%MSG
%MSG-e InvalidState: GsfTrackProducer:lowPtGsfEleGsfTracks 12-Aug-2022 13:45:19 CEST Run: 356948 Event: 59404076
inside hit
%MSG
I'd appreciate feedback from any electron reconstruction expert (if any still exists).
Thank @mmusich. Unfortunately, I'm not an expert of the GSF tracking code ... @lsoffi @swagata87 @VinInn ?
I have seen those 3 error message appearing together several times while running electron reco at HLT level recently. I suspect (purely observational though) that those 3 messages are related. If for some reason sumWeight < DBL_MIN
, ie, if sumWeight is basically 0, then an empty weight vector is returned:
https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/TrackingTools/GsfTracking/src/PosteriorWeightsCalculator.cc#L112-L115
which leads to the 2nd error message, no weights could be retreived. invalid updated state !
, because weights are empty,
https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc#L22-L25. The 3rd one is not so obviously related to the 1st/2nd, but I've seen it appearing together with them, so I assume they are somehow related:
https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc#L75-L77
Before saying that the errors are harmless, I would like to do some sanity checks. In events where these errors come up, I will check the compatibility of electron supercluster Et/Eta/Phi and electron GSF Pt/Eta/Phi. I will get back to you here. If things look reasonable in the test, we can go ahead with Marco's solution.
I checked only one problematic event, the one that Marco pointed out above (356948:59404076).
Here are my findings:
In this event, there is one reconstructed supercluster
in endcap, with energy= 33.4 GeV, Eta=-2.6, Phi=-1.8, and Et=4.9 GeV.
And there are 2 reconstructed GSF tracks:
GSF1
Pt=0.978 GeV, Eta=-2.52, Phi=-1.86
GSF2
Pt=0.878 GeV, Eta=-2.57, Phi=-2.03
Pt of the supercluster
does not match with any of the GSF, but Eta/Phi sort of matches.
There is no reconstructed electron in the event. So, it's hard to judge just from this one event. The supercluster could be due to a jet, and not a real electron.
Does anyone know if there is any other event with same issue? Maybe that can be checked.
Does anyone know if there is any other event with same issue? Maybe that can be checked.
@swagata87 if you follow the link to the cmsTalk post provided by the OP, you will see that there is a second instance of the same problem in run 356947, tarball at:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2022C/DQMLogicError/vocms013.cern.ch-1700050-3-log.tar.gz
while the e/gamma experts take a second look, I've opened https://github.com/cms-sw/cmssw/pull/39056 in order to start giving it some wider exposure.
I checked only one problematic event, the one that Marco pointed out above (356948:59404076). Here are my findings: In this event, there is one reconstructed
supercluster
in endcap, with energy= 33.4 GeV, Eta=-2.6, Phi=-1.8, and Et=4.9 GeV. And there are 2 reconstructed GSF tracks:GSF1
Pt=0.978 GeV, Eta=-2.52, Phi=-1.86GSF2
Pt=0.878 GeV, Eta=-2.57, Phi=-2.03Pt of the
supercluster
does not match with any of the GSF, but Eta/Phi sort of matches.There is no reconstructed electron in the event. So, it's hard to judge just from this one event. The supercluster could be due to a jet, and not a real electron.
Does anyone know if there is any other event with same issue? Maybe that can be checked.
Hi @swagata87, this is entirely possible with the low-pT algorithm. The chain is tracker-driven and GSF tracks are used to seed a search for a SC in ECAL. The SC algo is "greedy" in the sense that the GSF track is always dR-matched to the first (and nearest) ECAL cluster found, i.e. there is no upper bound in dR, while all subsequent clusters must be compatible with the GSF track trajectory according to requirements in dEta and dPhi. (Fake electrons with "mismatched" tracks/SCs are later dealt with via the ID.)
I'm not sure how this interplays with the exception, as it is thrown at the "GSF tracking" stage, before any match to a SC is attempted ... (Ok, technically, the preceding "seeding" step does a "lightweight" GSF tracking and attempts to match to an ECAL / HCAL cluster...)
the other event (356947:413408861 happening in JetMET dataset) can't be checked because the RAW file is not there in the location anymore.
Input file root://eoscms.cern.ch//eos/cms/tier0/store/data/Run2022C/JetMET/RAW/v1/000/356/947/00000/e7e7fd8c-d2ed-47c0-a46f-978790000957.root?eos.app=cmst0 could not be opened.
in that case, I'd say let's proceed with the fix. From e/gamma side, we will closely check electron reco efficiency in release validation, after this fix is merged, just to make sure that things are fine.
thank you @mmusich and @bainbrid
can't be checked because the RAW file is not there in the location anymore.
indeed, but it's surprising. What's the current policy to keep RAW data on disk at Tier-0 @germanfgv ?
@mmusich all data we produce stays at T0 for a minimum of 7 days. Then, if it has been transferred to the desired locations and is not being used by us anymore, it gets deleted.
As we failed the paused jobs last Friday, these files were deleted from T0 disk.
+1
[If it is not okay to write in an closed issue, then I apologise, and can create a new issue. I am writing here as this is very much related]
It looks like the same issue came back (only once, and at HLT level) in a collision run, despite the fix. It is clear that the crash is extremely rare, but it is surprising that it came back despite the fix was merged.
Run number: 359686
StartTime: Oct 1 2022, 01:11
EndTime: Oct 1 2022, 07:19
Error message:
[2] Calling method for module GsfTrackProducer/'hltEgammaGsfTracks'
Exception Message:
MultiTrajectoryState mixes states with and without errors
HLT Menu: /cdaq/physics/Run2022/2e34/v1.4.0/HLT/V5
Relevant elog: http://cmsonline.cern.ch/cms-elog/1157779
We can follow up once the error stream file is available.
If it is not okay to write in an closed issue, then I apologise, and can create a new issue. I am writing here as this is very much related
I'd suggest to open a new issue. A note made here is useful still.
Hello
We have two instances of PromptReco jobs for dataset ParkingDoubleMuonLowMass5 crashing when running dqm offline:
cmsRun1 Fatal Exception (Exit Code: 8001) An exception of category 'LogicError' occurred while [0] Processing Event run: 356948 lumi: 46 event: 59404076 stream: 5 [1] Running path 'dqmoffline_6_step' [2] Prefetching for module LogMessageMonitor/'TrackFinderLogMessageMonCommon' [3] Prefetching for module LogErrorHarvester/'logErrorHarvester' [4] Prefetching for module ConversionProducer/'gsfTracksOpenConversions' [5] Prefetching for module ConversionTrackProducer/'gsfTracksOpenConversionTrackProducer' [6] Calling method for module GsfTrackProducer/'lowPtGsfEleGsfTracks' Exception Message: MultiTrajectoryState mixes states with and without errors
Logs: /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2022C/DQMLogicError/vocms013.cern.ch-17054
This seems to be related to GsfTrackProducer/'lowPtGsfEleGsfTracks'. Can @cms-sw/egamma-pog-l2 @cms-sw/tracking-pog-l2 @nancymarinelli @sikler please look into it?