Closed Dr15Jones closed 1 month ago
A new Issue was created by @Dr15Jones.
@antoniovilela, @sextonkennedy, @smuzaffar, @makortel, @rappoccio, @Dr15Jones can you please review it and eventually sign/assign? Thanks.
The job can be run by setting up a CMSSW_14_0_7 area, downloading the tarball (which is at /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FileReadError/a406cf00-00a4-498e-b7e2-9ec39b964fac-216-3-logArchive.tar.gz )
Then after untarring go to directory job/WMTaskSpace/cmsRun1 and then do
cmsRun PSet.py
There appear to be lots of extraneous exceptions being thrown (and caught) in this job. The first one encountered is
%MSG-e SiStripMonitorTrack: SiStripMonitorTrack:HLTSiStripMonitorTrack 06-Jun-2024 17:43:09 CEST Run: 381379 Event: 1741662696
ClusterCollection is not valid!!
%MSG
[Switching to Thread 0x7fffa05fe640 (LWP 3001818)]
Thread 7 "cmsRun" hit Catchpoint 1 (exception thrown), 0x00007ffff5ead0f1 in __cxxabiv1::__cxa_throw (obj=0x7ffde5f68b00, tinfo=0x7ffff79a0650 <typeinfo for edm::Exception>,
dest=0x7ffff796a010 <edm::Exception::~Exception()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
81 ../../../../libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory.
(gdb) where
#0 0x00007ffff5ead0f1 in __cxxabiv1::__cxa_throw (obj=0x7ffde5f68b00, tinfo=0x7ffff79a0650 <typeinfo for edm::Exception>, dest=0x7ffff796a010 <edm::Exception::~Exception()>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1 0x00007ffff7b7e0b2 in throwInvalidRefFromNullOrInvalidRef(edm::TypeID const&) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libDataFormatsCommon.so
#2 0x00007ffff7b7ed6f in edm::RefCore::tryToGetProductPtr(std::type_info const&, edm::EDProductGetter const*) const [clone .cold] ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libDataFormatsCommon.so
#3 0x00007fffa557aa1a in reco::Track::recHitsBegin() const ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/pluginRecoTrackerFinalTrackSelectorsPlugins.so
#4 0x00007fffa55bd779 in SingleLongTrackProducer::produce(edm::Event&, edm::EventSetup const&) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/pluginRecoTrackerFinalTrackSelectorsPlugins.so
#5 0x00007ffff7e483c1 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#6 0x00007ffff7e2c04e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#7 0x00007ffff7db9159 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#8 0x00007ffff7db96c4 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreFramework.so
#9 0x00007ffff7f3af28 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el9_amd64_gcc12/libFWCoreConcurrency.so
#10 0x00007ffff6f1091b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7ffeafe74400, waiter=..., this=0x7ffff41c3b00)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7ffff41c3b00)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#12 tbb::detail::r1::arena::process (tls=..., this=<optimized out>)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/arena.cpp:137
#13 tbb::detail::r1::market::process (this=<optimized out>, j=...)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/market.cpp:599
#14 0x00007ffff6f12ace in tbb::detail::r1::rml::private_worker::run (this=0x7ffff2486f00)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#15 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7ffff2486f00)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_3-el9_amd64_gcc12/build/CMSSW_14_0_3-build/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-d33db04d4520c6ff791eab900054e986/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#16 0x00007ffff5a89c02 in start_thread () from /lib64/libc.so.6
#17 0x00007ffff5b0ec40 in clone3 () from /lib64/libc.so.6
Which is caught here: https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L158-L173
This is problematic, as the tracks are the generalTracks, which are being made in this job and SHOULD have accessible hits!
assign tracking
The next group of exceptions come from
#0 0x00007ffff5b9d2f1 in __cxxabiv1::__cxa_throw (obj=0x7ffdca082400, tinfo=0x7ffff79a5628 <typeinfo for cms::Exception>, dest=0x7ffff796ee30 <cms::Exception::~Exception()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1 0x00007fffc37f8a8d in PerigeeConversions::ftsToPerigeeParameters(FreeTrajectoryState const&, Point3DBase<float, GlobalTag> const&, double&) [clone .cold] ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackingToolsTrajectoryState.so
#2 0x00007fffc3806a5a in TrajectoryStateClosestToPoint::TrajectoryStateClosestToPoint(FreeTrajectoryState const&, Point3DBase<float, GlobalTag> const&) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackingToolsTrajectoryState.so
#3 0x00007fffc38725a5 in TSCPBuilderNoMaterial::operator()(TrajectoryStateOnSurface const&, Point3DBase<float, GlobalTag> const&) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackingToolsPatternTools.so
#4 0x00007fffbe679dd2 in PerigeeLinearizedTrackState::computeJacobians() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexVertexTools.so
#5 0x00007fffbe67a456 in PerigeeLinearizedTrackState::isValid() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexVertexTools.so
#6 0x00007fffbc5ac58f in KalmanVertexUpdator<5u>::positionUpdate(VertexState const&, ReferenceCountingPointer<LinearizedTrackState<5u> >, float, int) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#7 0x00007fffbc5ae20d in KalmanVertexUpdator<5u>::update(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >, float, int) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#8 0x00007fffbc5ae89a in KalmanVertexUpdator<5u>::add(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#9 0x00007fffbc5ae90d in KalmanVertexTrackCompatibilityEstimator<5u>::estimateNFittedTrack(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#10 0x00007fffbc5b023f in KalmanVertexTrackCompatibilityEstimator<5u>::estimate(CachingVertex<5u> const&, ReferenceCountingPointer<VertexTrack<5u> >, unsigned int) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#11 0x00007fffbc5aa80e in KalmanVertexTrackCompatibilityEstimator<5u>::estimate(CachingVertex<5u> const&, ReferenceCountingPointer<LinearizedTrackState<5u> >, unsigned int) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexKalmanVertexFit.so
#12 0x00007fffbc5d101c in AdaptiveVertexFitter::reWeightTracks(std::vector<ReferenceCountingPointer<LinearizedTrackState<5u> >, std::allocator<ReferenceCountingPointer<LinearizedTrackState<5u> > > > const&, CachingVertex<5u> const&) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#13 0x00007fffbc5d1e65 in AdaptiveVertexFitter::reWeightTracks(std::vector<ReferenceCountingPointer<VertexTrack<5u> >, std::allocator<ReferenceCountingPointer<VertexTrack<5u> > > > const&, CachingVertex<5u> const&) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#14 0x00007fffbc5d32ed in AdaptiveVertexFitter::fit(std::vector<ReferenceCountingPointer<VertexTrack<5u> >, std::allocator<ReferenceCountingPointer<VertexTrack<5u> > > > const&, VertexState const&, bool) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#15 0x00007fffbc5d46e1 in AdaptiveVertexFitter::vertex(std::vector<reco::TransientTrack, std::allocator<reco::TransientTrack> > const&, Point3DBase<float, GlobalTag> const&) const ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoVertexAdaptiveVertexFit.so
#16 0x00007fff4035710a in TemplatedInclusiveVertexFinder<edm::View<reco::Candidate>, reco::VertexCompositePtrCandidate>::produce(edm::Event&, edm::EventSetup const&) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginRecoVertexAdaptiveVertexFinderPlugins.so
#17 0x00007ffff7ce1e91 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
the exception originates here
and is caught here
assign reconstruction
New categories assigned: reconstruction
@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks
By skipping the first events, I was able to get to the traceback for the exception which ultimately ended the job
#0 0x00007ffff5b9d2f1 in __cxxabiv1::__cxa_throw (obj=0x7ffe9579d1a0, tinfo=0x7ffff5d03190 <typeinfo for std::length_error>, dest=0x7ffff5bb2220 <std::length_error::~length_error()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1 0x00007ffff5b942d9 in std::__throw_length_error(char const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib64/libstdc++.so.6
#2 0x00007fffc38c8346 in ROOT::Detail::TCollectionProxyInfo::Pushback<std::vector<unsigned char, std::allocator<unsigned char> > >::resize(void*, unsigned long) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libDataFormatsStdDictionaries.so
#3 0x00007ffff7193701 in void TGenCollectionStreamer::ReadBufferVectorPrimitives<unsigned char>(TBuffer&, void*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#4 0x00007ffff7110e09 in TBufferFile::ReadFastArray(void*, TClass const*, int, TMemberStreamer*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#5 0x00007ffff735e073 in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#6 0x00007ffff7211e4c in TStreamerInfoActions::VectorLooper::GenericRead(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#7 0x00007ffff710f5fc in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#8 0x00007ffff725f38f in int TStreamerInfoActions::ReadSTL<&TStreamerInfoActions::ReadSTLMemberWiseSameClass, &TStreamerInfoActions::ReadSTLObjectWiseFastArray>(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#9 0x00007ffff7117eae in TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#10 0x00007ffff735cdcc in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#11 0x00007ffff71de94d in TStreamerInfoActions::GenericReadAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#12 0x00007ffff710fbb5 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#13 0x00007ffff7873b87 in TBranchElement::ReadLeavesMember(TBuffer&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#14 0x00007ffff786c429 in TBranch::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#15 0x00007ffff787ed44 in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#16 0x00007ffff787ecfd in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libTree.so
#17 0x00007fff9d66585c in edm::RootTree::getEntry(TBranch*, long long) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolInput.so
#18 0x00007fff9d64639c in edm::RootDelayedReader::getProduct_(edm::BranchID const&, edm::EDProductGetter const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolInput.so
#19 0x00007ffff7bc111f in edm::DelayedReader::getProduct(edm::BranchID const&, edm::EDProductGetter const*, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x00007ffff7c6a35b in edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x00007ffff7c6b7cc in edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}::operator()() const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x00007ffff7c6b918 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::DelayedReaderInputProductResolver::prefetchAsync_(edm::WaitingTaskHolder, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#23 0x00007ffff7e031d0 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#24 0x00007ffff63fe95b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fff08c3ec00, waiter=..., this=0x7ffff3963b00)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#25 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7ffff3963b00)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#26 tbb::detail::r1::arena::process (tls=..., this=<optimized out>)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#27 tbb::detail::r1::market::process (this=<optimized out>, j=...)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/market.cpp:599
#28 0x00007ffff6400b0e in tbb::detail::r1::rml::private_worker::run (this=0x7ffff17e9100)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#29 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7ffff17e9100)
at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#30 0x00007ffff55341ca in start_thread () from /lib64/libpthread.so.0
#31 0x00007ffff518f8d3 in clone () from /lib64/libc.so.6
assign root
@pcanal how can we understand better what happened during the read?
type root
type tracking
Which is caught here
That just looks like poorly written code, where try/catch is used instead of checking for trackExtra to be present. Tracks are apparently not pure generalTracks, see https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L133-L136
A proper copy is made conditionally, while the rest in selTracks is going to be default-constructed reco::Tracks.
@borzari please check https://github.com/cms-sw/cmssw/issues/45162#issuecomment-2153549462 to possibly remove the try/catch pattern related to just access to track.extra in the track.recHitsBegin() call.
It should be a combination of validity checks for extra() and then extra()->recHitsProduct(), by checking isNonnull() && isAvailable() for each, sequentially.
This could even be packed into a new helper method, e.g. bool reco::Track::recHitsOk()
Please clarify if you are available to check this. Thank you.
Hi @slava77
I applied what you suggested in this commit, used the opportunity to remove some duplicated code, and tested it with RelValZMM and RelValTTbar events by comparing the version with try/catch results with the version with the validity check results. Everything worked as intended and no changes to the output were observed, as expected.
Just to clarify two points:
SingleLongTrackProducer module to check the validity of the track. Thinking out loud about what you suggested, I think you meant that the method could be included in https://github.com/cms-sw/cmssw/blob/master/DataFormats/TrackReco/interface/Track.h. If this is what you meant, I can modify the branch to have the recHitsOk method there;
I couldn't check the validity of the recHitsProduct(). There doesn't seem to be something similar to isNonnull() or isAvailable() for it. However, just checking track.extra() seemed enough. Was it supposed to be like this? Am I missing something about the recHitsProduct()?
I misread the TrackExtraBase; edm::RefCore m_hitCollection; is the one that has isNonnull() and isAvailable(), but it is not publicly exposed.
So, I would add this in TrackExtraBase.h:
bool recHitsOk() const {return m_hitCollection.isNonnull() && m_hitCollection.isAvailable();}
And then in Track.h:
bool recHitsOk() const {return extra_.isNonnull() && extra_.isAvailable() && extra_->recHitsOk();}
Even though in the current setup a track without an extra is enough, there can still be cases where SingleLongTrackProducer uses input tracks where hits got dropped.
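As a rough illustration of the two helpers proposed above, here is a minimal standalone sketch. All `Mock*` classes are hypothetical stand-ins invented for this example; they are not the real `edm::RefCore`, `edm::Ref`, `reco::TrackExtraBase` or `reco::Track`, and they model only the `isNonnull()` / `isAvailable()` queries mentioned in the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for edm::RefCore: models only the two queries
// discussed in the thread.
class MockRefCore {
public:
  MockRefCore() = default;
  MockRefCore(bool nonnull, bool available) : nonnull_(nonnull), available_(available) {}
  bool isNonnull() const { return nonnull_; }
  bool isAvailable() const { return nonnull_ && available_; }

private:
  bool nonnull_ = false;
  bool available_ = false;
};

// Hypothetical stand-in for reco::TrackExtraBase, which owns the
// reference to the hit collection.
class MockTrackExtra {
public:
  explicit MockTrackExtra(MockRefCore hits) : m_hitCollection(hits) {}
  bool recHitsOk() const { return m_hitCollection.isNonnull() && m_hitCollection.isAvailable(); }

private:
  MockRefCore m_hitCollection;
};

// Hypothetical stand-in for the reference a track holds to its extra.
class MockTrackExtraRef {
public:
  MockTrackExtraRef() = default;
  explicit MockTrackExtraRef(const MockTrackExtra* extra) : extra_(extra) {}
  bool isNonnull() const { return extra_ != nullptr; }
  bool isAvailable() const { return extra_ != nullptr; }
  const MockTrackExtra* operator->() const { return extra_; }

private:
  const MockTrackExtra* extra_ = nullptr;
};

// Hypothetical stand-in for reco::Track; a default-constructed track has no extra.
class MockTrack {
public:
  MockTrack() = default;
  explicit MockTrack(MockTrackExtraRef extra) : extra_(extra) {}
  // Short-circuit evaluation means extra_ is dereferenced only after it
  // has been found non-null and available.
  bool recHitsOk() const { return extra_.isNonnull() && extra_.isAvailable() && extra_->recHitsOk(); }

private:
  MockTrackExtraRef extra_;
};
```

In this sketch a default-constructed `MockTrack`, and a track whose extra points to a dropped hit collection, both report `recHitsOk() == false` without any exception being thrown, which is the behaviour the try/catch was emulating.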
Tracks are apparently not pure generalTracks, see https://github.com/cms-sw/cmssw/blob/dbbd44f6792e61b79f46b7f9974eec7cf8e3024b/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc#L133-L136 a proper copy is made conditionally, while the rest in selTracks is going to be default-constructed reco::Tracks
Out of curiosity why is that? Can't the selTracks just contain the tracks we can actually refit?
Hi @mmusich
The selTracks collection will only have one track, the one with the smallest chiNdof. I also want to check if the rechits and hits from the hitpattern are valid to say that it is a goodTrack that can be used for the shortened tracks pT resolution, especially because of what @slava77 mentioned here:
Even though in the current setup a track without an extra is enough, there can still be cases where SingleLongTrackProducer uses input tracks where hits got dropped.
The hit checks are to make sure that this track won't have missing layers with measurement. This is not 100% effective, as I already showed during the presentations about this topic, but it also doesn't impact the final result much because it doesn't happen so often. I wouldn't expect changing that part of the code, so that selTracks only has tracks that can be refitted, to have a large impact on what is going on in the SingleLongTrackProducer or after it, unless it is an extra "safety check" that can be included.
Here I added the suggestions from @slava77. Again, I tested with RelValZMM and RelValTTbar events, and things are working as expected. If you don't have other suggestions, I can open a PR with it and we can continue the discussion there
@borzari
also want to check if the rechits and hits from the hitpattern are valid to say that it is a goodTrack that can be used for the shortened tracks pT resolution.
Exactly, can't you do that before filling the vector? Default constructed tracks can't be used for refit.
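A minimal sketch of the suggestion above, i.e. applying the hit-validity requirement while choosing the best track rather than after the output vector is filled. `SimpleTrack`, `hitsOk`, `selectBestTrack` and `maxChiNdof` are hypothetical names for this illustration only; the real module works on `reco::Track` and its actual selection code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified track; the real reco::Track carries far more state.
struct SimpleTrack {
  double chi2 = 0.0;
  double ndof = 0.0;
  bool hitsOk = false;  // stands in for the proposed recHitsOk()
};

// Return the index of the track with the smallest chi2/ndof among those
// that have usable hits and a chi2/ndof below maxChiNdof.
// Returns std::nullopt when no track qualifies, so the caller never has
// to store a default-constructed placeholder.
std::optional<std::size_t> selectBestTrack(const std::vector<SimpleTrack>& tracks, double maxChiNdof) {
  std::optional<std::size_t> best;
  double bestChiNdof = maxChiNdof;
  for (std::size_t i = 0; i < tracks.size(); ++i) {
    const SimpleTrack& t = tracks[i];
    if (!t.hitsOk || t.ndof <= 0.0) {
      continue;  // skip tracks that cannot be refitted
    }
    const double chiNdof = t.chi2 / t.ndof;
    if (chiNdof < bestChiNdof) {
      bestChiNdof = chiNdof;
      best = i;
    }
  }
  return best;
}
```

With this shape, a track with a very good chi2/ndof but dropped hits is never selected, and an event with no usable track simply yields an empty result.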
Alright, so instead of only getting the track with the smallest chiNdof, I also want it to have recHitsOk(), right?
I also want it to have recHitsOk(), right?
Right, this is what I had in mind.
It didn't work. If I move the validity check from the rechits/hitpattern check to where I select tracks (I did if (chiNdof < fitProb && track.recHitsOk())), I get the message as if I were not checking the tracks:
----- Begin Fatal Exception 08-Jun-2024 19:16:37 CEST-----------------------
An exception of category 'InvalidReference' occurred while
[0] Processing Event run: 1 lumi: 76 event: 7503 stream: 6
[1] Running path 'dqmoffline_step'
[2] Calling method for module SingleLongTrackProducer/'SingleLongTrackProducer'
Exception Message:
BadRefCore RefCore: Request to resolve a null or invalid reference to a product of type 'std::vector<reco::TrackExtra>' has been detected.
Please modify the calling code to test validity before dereferencing.
----- End Fatal Exception -------------------------------------------------
I get the message as if I was not checking the tracks:
Isn't track.recHitsOk() checking that the TrackExtra is valid?
Should be. I implemented it like Slava suggested here
Could it be that, although I am adding only tracks with valid TrackExtra to selTracks, the framework still needs me to check if I am looking at a valid track (that has TrackExtra) from it to check if it has valid hits/hitpattern? I am not sure how the "not valid TrackExtra" exception works, that is why I am asking
The check I used was
if (track.extra().isAvailable()) {
Alright @Dr15Jones, but does it happen every time I am using a reco::Track anywhere?
Well, in any case, I would suggest opening a PR with these changes. At least to remove the try/catch pattern.
get the message as if I was not checking the tracks:
maybe I am missing something, but with https://github.com/CMSTrackingPOG/cmssw/commit/53185493eae82d7fe8e807e9b266491ea51d06f8 on top of https://github.com/borzari/cmssw/commit/95ecc4bb4aa7e811f1f65025c8f08a23a72cf272 I can run this test:
(even using the whole input file) without crashes.
@mmusich most probably I was missing something. The main differences I see (besides the better organization of the code in the way you wrote it) are that I included track.recHitsOk() here in the condition to select the best track, and instead of using isNonnull() here, I would use the bestTrack.recHitsOk(). Also, and maybe here was my mistake, I removed this condition, which you didn't. That is why I asked @Dr15Jones if the check for the availability of TrackExtra is done every time a reco::Track is being used
@mmusich I started from your branch and tested what I mentioned above:
Replacing if (bestTrack.extra().isNonnull()) with if (bestTrack.recHitsOk()): didn't have any effect, as expected, and should be "safer"
selTracks; I would also keep the extra check for safety reasons
May I start a PR to include your changes and the recHitsOk() method to CMSSW?
here it is: https://github.com/cms-sw/cmssw/pull/45213. I used the CMSTrackingPOG VO so you should be able to push more commits if necessary.
Great! I don't think there are any other modifications that are needed. Just FYI, I also checked the output DQM histograms of that branch using RelValZMM events and they are the same as before the changes, as expected
@Dr15Jones
most of the recent discussion was about the try/catch: this is now fixed in #45213.
I don't expect, though, that this would address the underlying cause that led to this github issue (the promptReco failure).
If I understand correctly, the fix in tracking is mostly a convenience for debugging using catch throw, to not be distracted.
Then the actual problem is likely more related to root handling the data.
Is my understanding correct?
@slava77 I'm on vacation until Thursday. The try/catch fixes were only there to make it easier to get to the underlying problem in the debugger. It does look like the underlying problem is in ROOT.
Coming back to the problem itself, in https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381379-parkingsinglemuon4/42082/7 the likely cause was mentioned to be a corrupted file. I suppose there were no further similar failures? Under the file corruption hypothesis, maybe we could just close the issue?
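For context on how a corrupted file could plausibly end in the std::length_error seen in the last traceback, here is a small standalone sketch. It assumes, without confirmation from the thread, that the damaged bytes were decoded as an implausibly large element count before the std::vector<unsigned char> was resized. The function name `resizeThrowsLengthError` is made up for this example and ROOT is not involved.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Returns true if resizing an empty std::vector<unsigned char> to `count`
// elements is rejected with std::length_error, which the standard library
// throws when the requested size exceeds max_size().
bool resizeThrowsLengthError(std::size_t count) {
  std::vector<unsigned char> buffer;
  try {
    buffer.resize(count);
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}
```

Calling it with `std::numeric_limits<std::size_t>::max()` mimics a garbage size read from a damaged buffer and triggers the exception, while an ordinary size such as 16 does not.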
+1 Issue seems solved
This issue is fully signed and ready to be closed.
@cmsbuild, please close
From https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381379-parkingsinglemuon4/42082
The tarball can be found here:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FileReadError/job/WMTaskSpace/cmsRun1 From the logs it seems to crash at event 1742503164. The error is reproducible locally.