cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

[ARM] Segfault in TBufferIO constructor #42070

Open makortel opened 1 year ago

makortel commented 1 year ago

Workflow 310.0 step 3 segfaulted in CMSSW_13_2_X_2023-06-22-2300 on el9_aarch64_gcc11 with

Thread 5 (Thread 0x40005aa091a0 (LWP 4100945) "cmsRun"):
#3  0x000040000a067bd8 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000400003a416c0 in TBufferIO::TBufferIO(TBuffer::EMode, int) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libRIO.so
#6  0x0000400003a3ac04 in TBufferFile::TBufferFile(TBuffer::EMode, int) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libRIO.so
#7  0x00004000034aa470 in TBasket::TBasket(char const*, char const*, TBranch*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#8  0x000040000351ec60 in TTree::CreateBasket(TBranch*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#9  0x00004000034b6db8 in TBranch::FillImpl(ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#10 0x00004000034c3594 in TBranchElement::FillImpl(ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#11 0x000040000352f950 in TTree::Fill() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#12 0x00004000ae772374 in tbb::detail::d1::task_arena_function<edm::RootOutputTree::fillTree()::{lambda()#1}, void>::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libIOPoolOutput.so
#13 0x0000400004861654 in operator() (__closure=<optimized out>) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/arena.cpp:757
#14 tbb::detail::d0::try_call_proxy<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> >::on_completion<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> > (on_completion_body=..., this=<optimized out>) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#15 tbb::detail::r1::isolate_within_arena (d=..., isolation=<optimized out>) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/arena.cpp:758
#16 0x00004000ae772f8c in edm::RootOutputTree::fillTree() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libIOPoolOutput.so
#17 0x00004000ae770b64 in edm::RootOutputFile::fillBranches(edm::BranchType const&, edm::OccurrenceForOutput const&, unsigned int, std::vector<edm::StoredProductProvenance, std::allocator<edm::StoredProductProvenance> >*, edm::ProductProvenanceRetriever const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libIOPoolOutput.so
#18 0x00004000ae772188 in edm::RootOutputFile::writeRun(edm::RunForOutput const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libIOPoolOutput.so
#19 0x0000400002d9e170 in edm::core::OutputModuleCore::doWriteRun(edm::RunPrincipal const&, edm::ModuleCallingContext const*, edm::MergeableRunProductMetadata const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#20 0x0000400002d9e2b4 in edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#21 0x0000400002d9e494 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#22 0x00004000033364e0 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreConcurrency.so

Thread 4 (Thread 0x400057cb91a0 (LWP 4100944) "cmsRun"):
#3  0x000040000a063338 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000040000549e5a4 in FSE_normalizeCount () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#6  0x00004000054b171c in ZSTD_NCountCost () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#7  0x00004000054b18f0 in ZSTD_selectEncodingType () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#8  0x00004000054a56fc in ZSTD_buildSequencesStatistics () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#9  0x00004000054a65ec in ZSTD_compressBlock_internal () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#10 0x00004000054a7568 in ZSTD_compress_frameChunk () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#11 0x00004000054aaa30 in ZSTD_compressEnd () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libzstd.so.1
#12 0x000040000400cf9c in R__zipZSTD () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libCore.so
#13 0x00004000034a7718 in TBasket::WriteBuffer() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#14 0x00004000034b617c in TBranch::WriteBasketImpl(TBasket*, int, ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#15 0x00004000034b6fd0 in TBranch::FlushBaskets() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#16 0x00004000034b7018 in TBranch::FlushBaskets() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#17 0x00004000035245bc in TTree::FlushBasketsImpl() const::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#18 0x0000400004f2bc64 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::d1::parallel_for_body_wrapper<std::function<void (unsigned int)>, unsigned int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libImt.so

Thread 3 (Thread 0x4000572a91a0 (LWP 4100943) "cmsRun"):
#3  0x000040000a063338 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000400004b28600 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/lib64/libstdc++.so.6
#6  0x000040000a95bbac in dqm::implementation::IGetter::getAllContents(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libDQMServicesCore.so
#7  0x00004000ae7faff8 in DQMRootOutputModule::writeRun(edm::RunForOutput const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/pluginDQMServicesFwkIOPlugins.so
#8  0x0000400002d9e170 in edm::core::OutputModuleCore::doWriteRun(edm::RunPrincipal const&, edm::ModuleCallingContext const*, edm::MergeableRunProductMetadata const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#9  0x0000400002d9e2b4 in edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#10 0x0000400002d9e494 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#11 0x00004000033364e0 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreConcurrency.so

Thread 1 (Thread 0x400004a1b260 (LWP 4096692) "cmsRun"):
#3  0x000040000a063338 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000400004e4b864 in write () from /lib64/libc.so.6
#6  0x0000400005f470d0 in edm::storage::File::syswrite(void const*, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libUtilitiesStorageFactory.so
#7  0x0000400005f43908 in edm::storage::File::write(void const*, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libUtilitiesStorageFactory.so
#8  0x0000400005f45d34 in edm::storage::StorageAccountProxy::write(void const*, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libUtilitiesStorageFactory.so
#9  0x0000400005f42ac0 in edm::storage::Storage::xwrite(void const*, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libUtilitiesStorageFactory.so
#10 0x000040000a42e934 in TStorageFactoryFile::WriteBuffer(char const*, int) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libIOPoolTFileAdaptor.so
#11 0x0000400003ad7eb4 in TKey::WriteFileKeepBuffer(TFile*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libRIO.so
#12 0x00004000034a75bc in TBasket::WriteBuffer() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#13 0x00004000034b617c in TBranch::WriteBasketImpl(TBasket*, int, ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#14 0x00004000034b6fd0 in TBranch::FlushBaskets() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#15 0x00004000035245bc in TTree::FlushBasketsImpl() const::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#16 0x0000400004f2bc64 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned int>, tbb::detail::d1::parallel_for_body_wrapper<std::function<void (unsigned int)>, unsigned int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libImt.so
#17 0x00004000048686e4 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x40010d4ebc00, this=0x4000057b3300) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#18 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x4000057b3300) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#19 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#20 0x0000400004f2acac in tbb::detail::d1::task_arena_function<ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void (unsigned int)> const&)::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libImt.so
#21 0x0000400004861654 in operator() (__closure=<optimized out>) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/arena.cpp:757
#22 tbb::detail::d0::try_call_proxy<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> >::on_completion<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> > (on_completion_body=..., this=<optimized out>) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#23 tbb::detail::r1::isolate_within_arena (d=..., isolation=<optimized out>) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/arena.cpp:758
#24 0x0000400004f296e4 in tbb::detail::d1::task_arena_function<ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void (unsigned int)> const&)::{lambda()#1}, void>::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libImt.so
#25 0x0000400004861018 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc11/external/tbb/v2021.9.0-c67f3b6114d13192876dcd1ab7a63fa6/tbb-v2021.9.0/src/tbb/arena.cpp:688
#26 0x0000400004f2b064 in ROOT::TThreadExecutor::ParallelFor(unsigned int, unsigned int, unsigned int, std::function<void (unsigned int)> const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libImt.so
#27 0x0000400003525714 in TTree::FlushBasketsImpl() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#28 0x000040000352c918 in TTree::OptimizeBaskets(unsigned long long, float, char const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el9_aarch64_gcc11/lib/libTree.so
#29 0x00004000ae7721b0 in edm::RootOutputFile::writeRun(edm::RunForOutput const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libIOPoolOutput.so
#30 0x0000400002d9e170 in edm::core::OutputModuleCore::doWriteRun(edm::RunPrincipal const&, edm::ModuleCallingContext const*, edm::MergeableRunProductMetadata const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#31 0x0000400002d9e2b4 in edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#32 0x0000400002d9e494 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeRunAsync(edm::WaitingTaskHolder, edm::RunPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*, edm::MergeableRunProductMetadata const*)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreFramework.so
#33 0x00004000033364e0 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el9_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el9_aarch64_gcc11/libFWCoreConcurrency.so

Current Modules:
Module: PoolOutputModule:MINIAODSIMoutput (crashed)
Module: DQMRootOutputModule:DQMoutput
Module: PoolOutputModule:RECOSIMoutput
Module: none

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el9_aarch64_gcc11/CMSSW_13_2_X_2023-06-22-2300/pyRelValMatrixLogs/run/310.0_Pyquen_GammaJet_pt20_2760GeV_2022/step3_Pyquen_GammaJet_pt20_2760GeV_2022.log#/

cmsbuild commented 1 year ago

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 1 year ago

assign core

cmsbuild commented 1 year ago

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 year ago

FYI @pcanal (I'm not expecting any immediate action, but in case the stack traces raise any eyebrows)

pcanal commented 1 year ago

Nothing obvious :(

VinInn commented 1 year ago

Three output modules running concurrently? Smells of thread unsafety.

VinInn commented 1 year ago

The memory model on ARM (and POWER) is weaker than the one on x86_64, which essentially forgives most of the incorrect (unsafe) assumptions made in the code.

One can find many articles on the internet, for instance https://www.arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/

None of this is new to us. Still, we can clearly expect unsafe code to crash far more often on ARM than on x86_64 (where it may well work correctly under all possible conditions).

It would be useful to have an example that stresses those parts of the code that may be unsafe and is eventually able to crash almost every time on ARM. At that point ThreadSanitizer could be used to pinpoint the critical issues.
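To make the memory-model point concrete, here is a minimal sketch of the classic "publish data, then set a flag" pattern. It is a hypothetical illustration, not code from CMSSW or ROOT; `Handoff` and `count_stale_reads` are names invented for this example. With the release/acquire pair shown, the consumer is guaranteed to see the payload on any architecture. If both orderings were weakened to `std::memory_order_relaxed` (or the flag were a plain `bool`), the read of `payload` would be a data race; x86_64 hardware does not reorder the two stores and so tends to hide the bug, whereas ARM may make the flag visible before the payload.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration only; not taken from CMSSW or ROOT.
struct Handoff {
  int payload = 0;                 // plain, non-atomic data being published
  std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Runs `rounds` producer/consumer handoffs and returns how many times the
// consumer observed a stale payload. With release/acquire this must be 0.
int count_stale_reads(int rounds) {
  int stale = 0;
  for (int i = 1; i <= rounds; ++i) {
    Handoff h;
    std::thread producer([&h, i] {
      h.payload = i;  // 1. write the data
      // 2. publish: release keeps the payload write ordered before the flag
      h.ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&h, &stale, i] {
      // acquire pairs with the release store above
      while (!h.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
      }
      if (h.payload != i) {
        ++stale;  // only this thread writes `stale` during the round
      }
    });
    producer.join();
    consumer.join();
  }
  return stale;
}
```

Calling `count_stale_reads(100000)` from a `main` should return 0 everywhere. Building a relaxed variant with `-fsanitize=thread` (GCC or Clang) would be expected to produce a data race report on `payload`, even on x86_64 where the program may never visibly misbehave.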

hahnjo commented 1 year ago

Hi, looking at this from the ROOT side. How reproducible is this? Would it be possible to get a more accurate source location where it is crashing? I looked at the TBufferIO::TBufferIO constructor and don't really see anything that could lead to a crash, it's just default initializing a number of fields...

makortel commented 1 year ago

> Hi, looking at this from the ROOT side. How reproducible is this?

So far we have seen the crash only once on ARM, so based on past experience the likelihood of reproducing it is tiny.