cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

[ARM] Segfault in jemalloc in Wrapper destructor #42072

Open makortel opened 1 year ago

makortel commented 1 year ago

Workflow 2500.0 step 2 segfaulted in CMSSW_13_2_X_2023-06-22-2300 on el8_aarch64_gcc11 with

Begin processing the 8885th record. Run 1, Event 29903574, LumiSection 9585 on stream 3 at 23-Jun-2023 04:19:33.599 CEST

Thread 5 (Thread 0x400057a09260 (LWP 337220) "cmsRun"):
#3  0x000040000d6e7cc8 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  edata_arena_ind_get (edata=0x0) at include/jemalloc/internal/edata.h:258
#6  tcache_bin_flush_match (small=true, cur_binshard=<optimized out>, cur_arena_ind=0, edata=0x0) at src/tcache.c:301
#7  tcache_bin_flush_impl (small=true, nflush=100, ptrs=0x40002c93e754 <std::vector<reco::Vertex, std::allocator<reco::Vertex> >::~vector()+24>, binind=3518251416, cache_bin=0x400057a08680, tcache=0x400006e1f000 <vtable for edm::stream::EDProducerAdaptorBase+192>, tsd=0x400031bf0328) at src/tcache.c:434
#8  tcache_bin_flush_bottom (small=<optimized out>, rem=<optimized out>, binind=<optimized out>, cache_bin=<optimized out>, tcache=<optimized out>, tsd=tsd@entry=0x400031bf0328) at src/tcache.c:519
#9  je_tcache_bin_flush_small (tsd=tsd@entry=0x400057a0ee40, tcache=0x400006e1f000 <vtable for edm::stream::EDProducerAdaptorBase+192>, cache_bin=0x400057a08680, binind=114093548, rem=<optimized out>) at src/tcache.c:529
#10 0x0000400008270bdc in tcache_dalloc_small (slow_path=false, binind=<optimized out>, ptr=0x4000b89939a0, tcache=<optimized out>, tsd=0x400057a0ee40) at include/jemalloc/internal/tcache_inlines.h:157
#11 arena_sdalloc (slow_path=<optimized out>, caller_alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:418
#12 isdalloct (slow_path=<optimized out>, alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:133
#13 isfree (slow_path=false, tcache=<optimized out>, usize=<optimized out>, ptr=0x4000b89939a0, tsd=0x400057a0ee40) at src/jemalloc.c:2982
#14 je_sdallocx_default (ptr=0x4000b89939a0, size=<optimized out>, flags=<optimized out>) at src/jemalloc.c:3924
#15 0x00004000082c1ba0 in sizedDeleteImpl (size=<optimized out>, ptr=<optimized out>) at src/jemalloc_cpp.cpp:195
#16 operator delete (ptr=<optimized out>, size=<optimized out>) at src/jemalloc_cpp.cpp:200
#17 0x000040007bede2b8 in edm::Wrapper<edm::ValueMap<unsigned int> >::~Wrapper() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginRecoEgammaElectronIdentificationPlugins.so
#18 0x0000400006c1c0cc in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#19 0x0000400006cce308 in edm::DataManagingProductResolver::resetProductData_(bool) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#20 0x0000400006cbd7e4 in edm::Principal::clearPrincipal() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#21 0x0000400006c2b6f8 in edm::EventPrincipal::clearEventPrincipal() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#22 0x0000400006c6d454 in edm::FunctorWaitingTask<edm::waiting_task::detail::WaitingTaskChain<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::processEventAsyncImpl(edm::WaitingTaskHolder, unsigned int)::{lambda(auto:1)#4}>, edm::waiting_task::detail::Conditional<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::processEventAsyncImpl(edm::WaitingTaskHolder, unsigned int)::{lambda(auto:1)#3}> >, edm::waiting_task::detail::Conditional<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::processEventAsyncImpl(edm::WaitingTaskHolder, unsigned int)::{lambda(auto:1)#2}> >, edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::processEventAsyncImpl(edm::WaitingTaskHolder, unsigned int)::{lambda(auto:1)#1}> >::runLast(edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*)#1}>::execute() [clone .lto_priv.0] () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so

Thread 4 (Thread 0x4000549b9260 (LWP 337091) "cmsRun"):
#2  0x000040000d6e33f8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000040004ddfa078 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ProcessOutputs(tensorflow::NodeItem const&, tensorflow::OpKernelContext*, tensorflow::Entry*, tensorflow::NodeExecStatsInterface*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#5  0x000040004ddff1d4 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#6  0x0000400035dce4a4 in tensorflow::thread::ThreadPool::Schedule(std::function<void ()>) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_cc.so.2
#7  0x000040003c979bdc in std::_Function_handler<void (std::function<void ()>), tensorflow::DirectSession::RunInternal(long, tensorflow::RunOptions const&, tensorflow::CallFrameInterface*, tensorflow::DirectSession::ExecutorsAndKeys*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&)::{lambda(std::function<void ()>)#6}>::_M_invoke(std::_Any_data const&, std::function<void ()>&&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_cc.so.2
#8  0x000040004ddf285c in void tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::RunTask<std::_Bind<void (tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::*(tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>*, tensorflow::SimplePropagatorState::TaggedNode, long))(tensorflow::SimplePropagatorState::TaggedNode, long)> >(std::_Bind<void (tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::*(tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>*, tensorflow::SimplePropagatorState::TaggedNode, long))(tensorflow::SimplePropagatorState::TaggedNode, long)>&&) [clone .constprop.0] () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#9  0x000040004ddf3024 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ScheduleReady(absl::lts_20210324::InlinedVector<tensorflow::SimplePropagatorState::TaggedNode, 8ul, std::allocator<tensorflow::SimplePropagatorState::TaggedNode> >*, tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#10 0x000040004ddfa5ec in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::NodeDone(tensorflow::Status const&, absl::lts_20210324::InlinedVector<tensorflow::SimplePropagatorState::TaggedNode, 8ul, std::allocator<tensorflow::SimplePropagatorState::TaggedNode> >*, tensorflow::NodeExecStatsInterface*, tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#11 0x000040004ddff09c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#12 0x0000400035dce4a4 in tensorflow::thread::ThreadPool::Schedule(std::function<void ()>) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_cc.so.2
#13 0x000040003c979bdc in std::_Function_handler<void (std::function<void ()>), tensorflow::DirectSession::RunInternal(long, tensorflow::RunOptions const&, tensorflow::CallFrameInterface*, tensorflow::DirectSession::ExecutorsAndKeys*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&)::{lambda(std::function<void ()>)#6}>::_M_invoke(std::_Any_data const&, std::function<void ()>&&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_cc.so.2
#14 0x000040004ddf2e3c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ScheduleReady(absl::lts_20210324::InlinedVector<tensorflow::SimplePropagatorState::TaggedNode, 8ul, std::allocator<tensorflow::SimplePropagatorState::TaggedNode> >*, tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#15 0x000040004ddf790c in tensorflow::(anonymous namespace)::ExecutorImpl::RunAsync(tensorflow::Executor::Args const&, std::function<void (tensorflow::Status const&)>) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_framework.so.2
#16 0x000040003c9890e0 in tensorflow::DirectSession::RunInternal(long, tensorflow::RunOptions const&, tensorflow::CallFrameInterface*, tensorflow::DirectSession::ExecutorsAndKeys*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_cc.so.2
#17 0x000040003c98b170 in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libtensorflow_cc.so.2
#18 0x0000400032f5cde4 in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::thread::ThreadPoolOptions const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libPhysicsToolsTensorFlow.so
#19 0x0000400032f5cfdc in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::thread::ThreadPoolInterface*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libPhysicsToolsTensorFlow.so
#20 0x000040007b800ba8 in void DeepTauId::getPredictionsV2<pat::PackedCandidate, pat::Tau>(reco::BaseTau const&, unsigned long, edm::RefToBase<reco::BaseTau>, std::vector<pat::Electron, std::allocator<pat::Electron> > const*, std::vector<pat::Muon, std::allocator<pat::Muon> > const*, edm::View<reco::Candidate> const&, reco::Vertex const&, double, unsigned long long const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >&, (anonymous namespace)::TauFunc) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginRecoTauTagRecoTauPlugins.so
#21 0x000040007b8026c4 in DeepTauId::getPredictions(edm::Event&, edm::Handle<edm::View<reco::BaseTau> >) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginRecoTauTagRecoTauPlugins.so
#22 0x000040007b8044a8 in DeepTauId::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginRecoTauTagRecoTauPlugins.so
#23 0x0000400006d44ccc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#24 0x0000400006d2aeb0 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so

Thread 3 (Thread 0x400053fa9260 (LWP 337090) "cmsRun"):
#2  0x000040000d6e33f8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000040005f8b4604 in MlasSgemmKernelAdd () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#5  0x000040005f89a4c8 in MlasSgemmOperation(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, unsigned long, unsigned long, unsigned long, float, float const*, unsigned long, float const*, unsigned long, float, float*, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#6  0x000040005f89aa40 in MlasSgemmThreaded(long, long, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, unsigned long, unsigned long, unsigned long, MLAS_SGEMM_DATA_PARAMS const*, long) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#7  0x000040005f89aac4 in std::_Function_handler<void (long), MlasGemmBatch(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, unsigned long, unsigned long, unsigned long, MLAS_SGEMM_DATA_PARAMS const*, unsigned long, onnxruntime::concurrency::ThreadPool*)::{lambda(long)#1}>::_M_invoke(std::_Any_data const&, long&&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#8  0x000040005f8bfcf4 in MlasTrySimpleParallel(onnxruntime::concurrency::ThreadPool*, long, std::function<void (long)> const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#9  0x000040005f89ac24 in MlasGemmBatch(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, unsigned long, unsigned long, unsigned long, MLAS_SGEMM_DATA_PARAMS const*, unsigned long, onnxruntime::concurrency::ThreadPool*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#10 0x000040005f89e2b0 in MlasConv(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, onnxruntime::concurrency::ThreadPool*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#11 0x000040005f3a2ed8 in onnxruntime::Conv<float>::Compute(onnxruntime::OpKernelContext*) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#12 0x000040005f7c8b44 in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#13 0x000040005f7c0cb0 in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#14 0x000040005f7cbca0 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long, bool) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#15 0x000040005f7c7ee4 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#16 0x000040005f7959f4 in onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection*, bool, onnxruntime::Stream*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#17 0x000040005f797f04 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool, bool, onnxruntime::Stream*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#18 0x000040005f7986a8 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, OrtRunOptions const&, onnxruntime::logging::Logger const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#19 0x000040005f106260 in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) [clone .localalias] () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#20 0x000040005f0a1728 in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libonnxruntime.so.1.14.1
#21 0x000040005c584584 in cms::Ort::ONNXRuntime::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > >&, std::vector<std::vector<long, std::allocator<long> >, std::allocator<std::vector<long, std::allocator<long> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, long) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libPhysicsToolsONNXRuntime.so
#22 0x00004000aec34034 in BoostedJetONNXJetTagsProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginRecoBTagONNXRuntimePlugins.so
#23 0x0000400006d44ccc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#24 0x0000400006d2aeb0 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so

Thread 1 (Thread 0x40000892cf30 (LWP 234127) "cmsRun"):
#2  0x000040000d6e33f8 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000400008c26f70 in __log_finite () from /lib64/libm.so.6
#5  0x0000400099369498 in Rivet::Vector3::pseudorapidity() const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginGeneratorInterfaceRivetInterface_plugins.so
#6  0x000040009936c6f0 in Rivet::RivetAnalysis::analyze(Rivet::Event const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginGeneratorInterfaceRivetInterface_plugins.so
#7  0x0000400099932c88 in Rivet::AnalysisHandler::analyze(HepMC::GenEvent const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/external/el8_aarch64_gcc11/lib/libRivet.so
#8  0x000040009936fc34 in ParticleLevelProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/pluginGeneratorInterfaceRivetInterface_plugins.so
#9  0x0000400006d3b724 in edm::one::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#10 0x0000400006d22640 in edm::WorkerT<edm::one::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#11 0x0000400006cb6ee8 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so
#12 0x0000400006cb811c in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02790/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-06-22-2300/lib/el8_aarch64_gcc11/libFWCoreFramework.so

Current Modules:
Module: none (crashed)
Module: BoostedJetONNXJetTagsProducer:pfParticleNetFromMiniAODAK4CHSCentralJetTagsWithDeepInfo
Module: DeepTauId:deepTau2018v2p5ForNano
Module: ParticleLevelProducer:particleLevel

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_aarch64_gcc11/CMSSW_13_2_X_2023-06-22-2300/pyRelValMatrixLogs/run/2500.0_NANOmc106Xul16v2/step2_NANOmc106Xul16v2.log#/
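In the dump above, the thread that received the signal appears to be the one whose stack contains `sig_dostack_then_abort` (Thread 5), while the other threads are parked in `sig_pause_for_stacktrace`. A minimal sketch for picking that thread out of such a log, assuming the `Thread N (...)` header layout shown here; the function name `find_crashing_thread` is hypothetical, and the sample is an abbreviated copy of the trace with library paths shortened:

```python
import re

# Matches thread headers such as: Thread 5 (Thread 0x... (LWP 337220) "cmsRun"):
THREAD_HEADER = re.compile(r"^Thread (\d+) \(")


def find_crashing_thread(log_text):
    """Return the number of the thread whose stack contains
    sig_dostack_then_abort, or None if no such thread is found."""
    current = None
    for line in log_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
        elif current is not None and "sig_dostack_then_abort" in line:
            return current
    return None


# Abbreviated from the trace above (paths shortened for readability).
sample = """\
Thread 5 (Thread 0x400057a09260 (LWP 337220) "cmsRun"):
#3  0x000040000d6e7cc8 in sig_dostack_then_abort () from pluginFWCoreServicesPlugins.so
#5  edata_arena_ind_get (edata=0x0) at include/jemalloc/internal/edata.h:258

Thread 4 (Thread 0x4000549b9260 (LWP 337091) "cmsRun"):
#2  0x000040000d6e33f8 in sig_pause_for_stacktrace () from pluginFWCoreServicesPlugins.so
"""

print(find_crashing_thread(sample))
```

This matches the `Module: none (crashed)` entry: the crashing thread was running framework cleanup (`clearEventPrincipal`), not a module.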

makortel commented 1 year ago

assign core

cmsbuild commented 1 year ago

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cmsbuild commented 1 year ago

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 year ago

Maybe a symptom of memory mismanagement somewhere? Another recent occurrence of a crash in ~Wrapper() was in https://github.com/cms-sw/cmssw/issues/41477
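One way a bug elsewhere could surface in `~Wrapper()`: frames #5 to #10 show the crash inside a thread-cache flush (`tcache_bin_flush_impl`, `nflush=100`) with `edata=0x0`, so the `operator delete` issued by the destructor may only have been the call that triggered the flush. The following is a toy Python model of that idea, not jemalloc's actual implementation. The class `ToyThreadCache`, its `metadata` table, the `flush_threshold` parameter, and the caller names are all hypothetical.

```python
class ToyThreadCache:
    """Toy allocator with a per-thread cache of freed pointers.

    `metadata` stands in for per-allocation bookkeeping and
    `flush_threshold` for the capacity of a cache bin (both hypothetical).
    """

    def __init__(self, flush_threshold=3):
        self.metadata = {}   # pointer -> bookkeeping record for live allocations
        self.cached = []     # (pointer, caller) pairs freed but not yet flushed
        self.flush_threshold = flush_threshold
        self.next_ptr = 0x1000

    def malloc(self):
        ptr = self.next_ptr
        self.next_ptr += 0x10
        self.metadata[ptr] = {"arena": 0}
        return ptr

    def free(self, ptr, caller):
        # Fast path: the pointer is only remembered, not validated.
        self.cached.append((ptr, caller))
        if len(self.cached) >= self.flush_threshold:
            self.flush(triggered_by=caller)

    def flush(self, triggered_by):
        # Slow path: bookkeeping is looked up only now, so a bad pointer
        # cached earlier is discovered during someone else's free.
        for ptr, freed_by in self.cached:
            if self.metadata.pop(ptr, None) is None:
                raise RuntimeError(
                    f"bad pointer {ptr:#x} freed by {freed_by}, "
                    f"detected during free from {triggered_by}"
                )
        self.cached.clear()


cache = ToyThreadCache()
p = cache.malloc()
q = cache.malloc()
cache.free(p, caller="buggy_module")
cache.free(p, caller="buggy_module")  # double free, unnoticed at this point

message = None
try:
    cache.free(q, caller="innocent_destructor")  # third free triggers the flush
except RuntimeError as err:
    message = str(err)
print(message)
```

In this model the error is reported while the innocent caller is freeing a valid pointer, which is why the frame at the top of the crashing stack need not be where the mismanagement happened.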