Closed bsunanda closed 2 weeks ago
A new Issue was created by @bsunanda.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
assign geometry, alca, db
New categories assigned: geometry,alca,db
@atpathak,@bsunanda,@civanch,@consuegs,@Dr15Jones,@francescobrivio,@kpedro88,@makortel,@mdhildreth,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks
assign core @makortel @cms-sw/core-l2, this GitHub issue was opened because @bsunanda cannot run the second cmsRun step mentioned in the description, owing to a high memory issue (as he reported at today's AlCaDB meeting). This is becoming extremely urgent, because the ppRef run is expected to start 10 days from now, and an updated ZDC geometry is needed for it. At the meeting we suggested that Sunanda open this issue and ask the O&C core team for help, to speed up the resolution. Would any of you be able to have a look at it and, given your experience, maybe pinpoint the origin of the problem? That would be of great help: thank you!
New categories assigned: core
@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
Thanks @perrotta for explaining the background and urgency. We'll try to take a look. It's unfortunate, though, that the timeline is so tight.
I tried the instructions, and the first cmsRun job failed with
----- Begin Fatal Exception 08-Oct-2024 07:51:12 CDT-----------------------
An exception of category 'ConfigFileReadError' occurred while
[0] Processing the python configuration file named geometryExtended2024DD4hep_xmlwriter.py
Exception Message:
unknown python problem occurred.
RuntimeError: An exception of category 'FileInPathError' occurred.
Exception Message:
edm::FileInPath unable to find file Geometry/CMSCommonData/data/dd4hep/cmsExtendedGeometry2024FlatPlus10PercentFlatPlus10PercentFlatPlus10Percent.xml anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: [cut ]
and looking at $CMSSW_RELEASE/src/Geometry/CMSCommonData/data/dd4hep/, no such file exists. However, the following one does: cmsExtendedGeometry2024FlatPlus10Percent.xml
Hi Chris
Did you start with a given IB of CMSSW? I wonder why FlatPlus10Percent appears 3 times in the name. I shall see what is there in the repository and make corrections. Best regards
Sunanda
From: Chris Jones. Sent: 08 October 2024 18:24. To: cms-sw/cmssw. Cc: Sunanda Banerjee; Mention. Subject: Re: [cms-sw/cmssw] Creation of Geometry Payloads for DataBase (Issue #46290)
Following @Dr15Jones's suggestion in the private email thread, I limited VSIZE to ~5 GB (ulimit -v 5000000) to trigger a std::bad_alloc exception instead of being killed by the OS, and ran the job in gdb, catching exceptions to see where the std::bad_alloc is thrown. This shows the following stack trace:
(gdb) where
#0 0x00007ffff5b612f1 in __cxxabiv1::__cxa_throw (obj=0x7fffcf972880, tinfo=0x7ffff5cc5e18 <typeinfo for std::bad_alloc>, dest=0x7ffff5b5f6e0 <std::bad_alloc::~bad_alloc()>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1 0x00007ffff5b5811b in std::__throw_bad_alloc() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02858/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib64/libstdc++.so.6
#2 0x00007ffff67bb39b in handleOOM (size=<optimized out>, nothrow=<optimized out>) at src/jemalloc_cpp.cpp:90
#3 0x00007fffcdbf2cef in HcalGeometry::init() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libGeometryHcalTowerAlgo.so
#4 0x00007fffcdbf303d in HcalGeometry::HcalGeometry(HcalTopology const&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libGeometryHcalTowerAlgo.so
#5 0x00007fffcdbf429b in HcalFlexiHardcodeGeometryLoader::load(HcalTopology const&, HcalDDDRecConstants const&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libGeometryHcalTowerAlgo.so
#6 0x00007fffcac60803 in HcalHardcodeGeometryEP::produceAligned(HcalGeometryRecord const&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#7 0x00007fffcac6478e in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, 
edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#8 0x00007fffcac64911 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, 
edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#9 0x00007ffff79d0b35 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#10 0x00007ffff63c53e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#12 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#13 0x00007ffff7bce1ab in edm::FinalWaitingTask::wait() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007ffff7bdbc8f in edm::EventProcessor::processRuns() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007ffff7bdc141 in edm::EventProcessor::runToCompletion() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#17 0x00007ffff63b19ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/arena.cpp:688
#18 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#19 0x0000000000405100 in main ()
Running again also produces an assertion failure:
cmsRun: src/Geometry/CaloGeometry/interface/EZMgrFL.h:21: EZMgrFL<T>::EZMgrFL(size_type, size_type) [with T = Point3DBase<float, GlobalTag>; size_type = long unsigned int]: Assertion `vecSize > 0' failed.
Thread 1 "cmsRun" received signal SIGABRT, Aborted.
0x00007ffff516852f in raise () from /lib64/libc.so.6
(gdb) where
#0 0x00007ffff516852f in raise () from /lib64/libc.so.6
#1 0x00007ffff513be65 in abort () from /lib64/libc.so.6
#2 0x00007ffff513bd39 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007ffff5160e86 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fffcda3ec95 in EZMgrFL<Point3DBase<float, GlobalTag> >::EZMgrFL (subSize=8, vecSize=0, this=<optimized out>) at src/Geometry/CaloGeometry/interface/EZMgrFL.h:21
#5 CaloSubdetectorGeometry::allocateCorners (this=0x7fffcf929b80, n=0) at src/Geometry/CaloGeometry/src/CaloSubdetectorGeometry.cc:108
#6 0x00007fffcdbdcb8b in HcalFlexiHardcodeGeometryLoader::load (this=0x7fffffff0ef0, fTopology=..., hcons=...)
at src/Geometry/HcalTowerAlgo/src/HcalFlexiHardcodeGeometryLoader.cc:28
#7 0x00007fffcaba4803 in HcalHardcodeGeometryEP::produceAligned(HcalGeometryRecord const&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#8 0x00007fffcaba878e in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, 
edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#9 0x00007fffcaba8911 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, 
edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute()
() from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#10 0x00007ffff79d0b35 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#11 0x00007ffff63c53e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#14 0x00007ffff7bce1ab in edm::FinalWaitingTask::wait() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007ffff7bdbc8f in edm::EventProcessor::processRuns() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007ffff7bdc141 in edm::EventProcessor::runToCompletion() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#18 0x00007ffff63b19ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...)
at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/arena.cpp:688
#19 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#20 0x0000000000405100 in main ()
i.e. here
https://github.com/cms-sw/cmssw/blob/d8f3e55e66f6f333ae3d1a1fc10b506447aac3ed/Geometry/HcalTowerAlgo/src/HcalFlexiHardcodeGeometryLoader.cc#L28
the fTopology.ncells() + fTopology.getHFSize() is 0.
This kind of varying behavior hints at memory corruption.
So when I run the job (after fixing the scripts) I see the out of memory error. The debugger showed a very large allocation request. I then turned on one of the debug printouts and see
HcalGeometry_init(): HBSize 892613681 HESize 808794672 HOSize 1111110454 HFSize 1634365029
Are these sizes what is actually expected?
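One quick way to probe numbers like these, as a minimal sketch: view each 32-bit value as its four raw bytes and see whether they look like text. The helper name bytesAsText is made up for illustration and is not part of CMSSW; printable-looking bytes are only a hint that a value may be leftover memory contents, not proof.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical debugging helper (not part of CMSSW): render the four bytes
// of a 32-bit value, lowest byte first, replacing non-printable bytes by '.'.
// Lowest-byte-first matches the in-memory order on little-endian machines
// such as x86_64.
std::string bytesAsText(std::uint32_t value) {
  std::string out;
  for (int i = 0; i < 4; ++i) {
    const unsigned byte = (value >> (8 * i)) & 0xFFu;
    const bool printable = (byte >= 0x20u && byte < 0x7Fu);
    out.push_back(printable ? static_cast<char>(byte) : '.');
  }
  return out;
}
```

For the HBSize value printed above, bytesAsText(892613681u) yields "1445": all four bytes are printable characters, which would be consistent with a member that was never assigned and still holds whatever was in that memory before.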
I got these numbers from the same printout (now that my test case went back to the std::bad_alloc behavior):
HcalGeometry_init(): HBSize 1985088260 HESize 3964404929 HOSize 1208271767 HFSize 79882717
A smoking gun in both the bad_alloc and assertion failure behaviors is that the numbers come from HcalTopology.
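Both failure modes trace back to sizes being used for allocation without any check. As a hedged sketch of how such input could be rejected early with a clear message, assuming hypothetical names (checkedCellCount, kMaxPlausibleCells) and an arbitrary illustrative upper bound, none of which exist in CMSSW:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Arbitrary illustrative bound; the real detector limits are not taken
// from the CMSSW sources.
constexpr std::size_t kMaxPlausibleCells = 1000000;

// Hypothetical helper: return the total number of cells, or throw if any
// per-subdetector size is zero or implausibly large.
std::size_t checkedCellCount(std::size_t hb, std::size_t he,
                             std::size_t ho, std::size_t hf) {
  const std::size_t sizes[4] = {hb, he, ho, hf};
  std::size_t total = 0;
  for (std::size_t s : sizes) {
    if (s == 0 || s > kMaxPlausibleCells) {
      throw std::runtime_error("implausible subdetector size: " +
                               std::to_string(s));
    }
    total += s;
  }
  return total;
}
```

With a check of this kind, a garbage value would surface as one descriptive exception instead of either a std::bad_alloc or a failed `vecSize > 0` assertion, depending on what happened to be in memory.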
I understood that the script createExtended2024DD4hepPayloads.sh cannot be run multiple times in the same area because of sed. However, after running the script once cmsRun could be run multiple times and that is when I saw memory exhausted and got a system "kill". I am attaching a log file which I got for the first time. You could see "ERROR" getting printed multiple times starting from TKRECO_Geometry
HcalTopology is not new code. ZDCTopology is new, and so is the ZDC part of calowriters. Maybe I should remove the alignment part for ZDC and see the impact.
I got the assertion failure behavior again. This time the printout from HcalGeometry_init() is
HcalGeometry_init(): HBSize 0 HESize 0 HOSize 0 HFSize 1
This time the assertion failure stack trace was
#0 0x00007ffff516852f in raise () from /lib64/libc.so.6
#1 0x00007ffff513be65 in abort () from /lib64/libc.so.6
#2 0x00007ffff513bd39 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007ffff5160e86 in __assert_fail () from /lib64/libc.so.6
#4 0x00007fffcd9ded2f in EZMgrFL<float>::EZMgrFL (subSize=5, vecSize=0, this=<optimized out>) at src/Geometry/CaloGeometry/interface/EZMgrFL.h:21
#5 CaloSubdetectorGeometry::allocatePar (this=0x7fffcf8f2b80, n=0, m=5) at src/Geometry/CaloGeometry/src/CaloSubdetectorGeometry.cc:115
#6 0x00007fffcdb9cc68 in HcalFlexiHardcodeGeometryLoader::load (this=0x7fffffff0ef0, fTopology=..., hcons=...) at src/Geometry/HcalTowerAlgo/src/HcalFlexiHardcodeGeometryLoader.cc:33
#7 0x00007fffcab44803 in HcalHardcodeGeometryEP::produceAligned(HcalGeometryRecord const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#8 0x00007fffcab4878e in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, 
edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#9 0x00007fffcab48911 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<HcalHardcodeGeometryEP, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >(HcalHardcodeGeometryEP*, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> > (HcalHardcodeGeometryEP::*)(HcalGeometryRecord const&), edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> const&, edm::es::Label const&)::{lambda(HcalGeometryRecord const&)#1}, std::unique_ptr<CaloSubdetectorGeometry, std::default_delete<CaloSubdetectorGeometry> >, HcalGeometryRecord, edm::eventsetup::CallbackSimpleDecorator<HcalGeometryRecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, 
edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(HcalGeometryRecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute() ()
from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/pluginGeometryHcalEventSetup.so
#10 0x00007ffff79d0b35 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#11 0x00007ffff63c53e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff308be00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#14 0x00007ffff7bce1ab in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007ffff7bdbc8f in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007ffff7bdc141 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_X_2024-10-06-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#18 0x00007ffff63b19ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-e785b749a0b6cb9c66dc1d78066210e0/tbb-v2021.9.0/src/tbb/arena.cpp:688
#19 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#20 0x0000000000405100 in main ()
and the problem being hcalGeometry->numberOfShapes() being 0 in
https://github.com/cms-sw/cmssw/blob/d8f3e55e66f6f333ae3d1a1fc10b506447aac3ed/Geometry/HcalTowerAlgo/src/HcalFlexiHardcodeGeometryLoader.cc#L30
@makortel and I think we have found the problem. The values for HcalTopology::HBSize_ seem to be random. The value is set in the constructor here (this is the constructor being called by the job, as the debugger hit it as a breakpoint for me):
Notice the if block. If neither of the two ifs is true, then HBSize_ is never set. Stepping through with the debugger shows that is the case here. I determined that the value of mode is 4, which corresponds to Run3.
From the git history I see the Run3 enum value was added in https://github.com/cms-sw/cmssw/pull/45511. That PR did modify the HcalTopology constructor taking HcalTopologyMode::Mode to include cases for Run3, but not the constructor taking const HcalDDDRecConstants*.
So after modifying the if block, I see what appears to be better values
%MSG-s HCalGeom: HcalGeometryToDBEP:HcalGeometryToDBEP@callESModule 08-Oct-2024 09:51:58 CDT Run: 1 HcalGeometry_init(): HBSize 9216 HESize 14112 HOSize 2160 HFSize 7488
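For readers following along, the kind of gap described above can be sketched roughly as below. This is a minimal illustration, not the actual HcalTopology code: the type names, the enum values other than Run3 = 4, the branch Run3 is placed in, and the sizes are all made up for the example. The idea is that a member assigned only inside an if / else if chain keeps an indeterminate value when a new enum value matches neither branch; a default member initializer plus an explicit failure branch closes the gap.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a topology mode enum. Only "Run3 = 4" comes from
// the discussion above; the other names and values are invented.
enum class Mode { Legacy = 0, Upgrade = 2, Run3 = 4 };

// Buggy pattern: hbSize is assigned only inside the if / else if chain, so a
// mode that matches neither branch (here Run3) leaves it indeterminate.
struct BuggyTopologySketch {
  int hbSize;  // no initializer
  explicit BuggyTopologySketch(Mode mode) {
    if (mode == Mode::Legacy) {
      hbSize = 100;  // placeholder value
    } else if (mode == Mode::Upgrade) {
      hbSize = 200;  // placeholder value
    }
    // Run3 falls through: hbSize is never set.
  }
};

// Safer pattern: give the member a default, handle the new mode explicitly,
// and fail loudly for anything that is still unhandled.
struct TopologySketch {
  int hbSize = 0;  // default member initializer, never indeterminate
  explicit TopologySketch(Mode mode) {
    if (mode == Mode::Legacy) {
      hbSize = 100;  // placeholder value
    } else if (mode == Mode::Upgrade || mode == Mode::Run3) {
      hbSize = 200;  // placeholder value; grouping Run3 here is an assumption
    } else {
      throw std::invalid_argument("unhandled topology mode");
    }
  }
};
```

With the second form, constructing TopologySketch(Mode::Run3) yields a defined size, and a future enum value that nobody has handled raises an exception at construction instead of propagating garbage sizes into later geometry code.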
Thanks - I shall try to cure this
I think the logic in HcalTopology needs to be modified
See #46305
With the PR I made, the job still fails with
----- Begin Fatal Exception 08-Oct-2024 09:52:03 CDT-----------------------
An exception of category 'NoProductResolverException' occurred while
[0] Processing global begin Run run: 1
[1] Prefetching for module PCaloGeometryBuilder/'CaloGeometryWriter'
[2] Prefetching for EventSetup module ZdcGeometryToDBEP/''
[3] Calling method for EventSetup module ZdcHardcodeGeometryEP/''
Exception Message:
Cannot find EventSetup module to produce data of type "ZdcTopology" in
record "HcalRecNumberingRecord" with product label "".
Please add an ESSource or ESProducer to your job which can deliver this data.
----- End Fatal Exception -------------------------------------------------
Thanks Chris and Matti. There was some other logic in HcalTopology that was also wrong, but the main issue was what you found. Thanks a lot
Sunanda
I was able to get the full script to run by adding
process.load("Geometry.ForwardGeometry.ZdcGeometry_cfi")
to geometryExtended2024DD4hep_writer.py
Thank you @makortel and @Dr15Jones for the big debug effort! This is going to save the possibility to implement ZDC geometry updates for this year's HI data taking!
Just for completeness, running valgrind (without https://github.com/cms-sw/cmssw/pull/46305) did not reveal anything new.
With the proposed corrections to HcalTopology (+ other changes needed to this class), and correcting the scenario description in Configuration/Geometry, the payload creation has been done for 2024. So this issue is resolved.
The standard way of creating the payload is to follow these steps:
cmsrel CMSSW_14_2_X_2024-10-06-2300
cd CMSSW_14_2_X_2024-10-06-2300/src
cmsenv
git cms-addpkg CondTools/Geometry
scram b -j4
cd CondTools/Geometry/test
/bin/cp writehelpers/* .
./createExtended2024DD4hepPayloads.sh 142DD4hepV1
This creates several .db files: some for the XML files used for simulation, and a number of files needed for loading parameters for the reconstruction geometry.
There are several cmsRun steps in createExtended2024DD4hepPayloads.sh. The second one, which runs cmsRun geometryExtended2024DD4hep_writer.py, does not complete and gets killed.
Consequently, several .db files are not created, namely the recommended geometries for HCAL, ZDC, ..., and some parameters for the Tracker.