DUNE-DAQ / fdreadoutlibs

Sporadic segmentation fault #98

Closed by adam-abed-abud 1 year ago

adam-abed-abud commented 1 year ago

From time to time we get a segmentation fault when issuing the start command. We have not yet found a way to reproduce the issue. Here is the message from the log file:

2023-Apr-11 10:01:59,973 LOG [typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, void>::type dunedaq::iomanager::NetworkReceiverModel<Datatype>::add_callback_impl(std::function<void(MessageType&)>) [with MessageType = dunedaq::dfmessages::DataRequest; Datatype = dunedaq::dfmessages::DataRequest; typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, void>::type = void] at /cvmfs/dunedaq-development.opensciencegrid.org/nightly/N23-03-29/spack-0.18.1-gcc-12.1.0/spack-0.18.1/opt/spack/gcc-12.1.0/iomanager-N23-03-29-i2vg6vymsngctxw7sxgqie76wwiiekgu/include/iomanager/network/NetworkReceiverModel.hpp:183] Registering callback.
2023-Apr-11 10:01:59,973 ERROR [static void ers::ErrorHandler::SignalHandler::action(int, siginfo_t*, void*) at /tmp/root/spack-stage/spack-stage-ers-N23-03-29-q5qlc5zjwzjkbqzsniq4b43g5guwolyl/spack-src/src/ErrorHandler.cpp:90] Got signal 11 Segmentation fault (invalid memory reference)
        Parameters = 'name=Segmentation fault (invalid memory reference)' 'signum=11'
        2023-Apr-11 10:01:59,973 LOG [Qualifiers = 'unknown'
        void dunedaq::readoutlibs::ReadoutModel<ReadoutType, RequestHandlerType, LatencyBufferType, RawDataProcessorType>::run_timesync() [with ReadoutType = dunedaq::fdreadoutlibs::types::TriggerPrimitiveTypeAdapter; RequestHandlerType = dunedaq::readoutlibs::DefaultSkipListRequestHandler<dunedaq::fdreadoutlibs::types::TriggerPrimitiveTypeAdapter>; LatencyBufferType = dunedaq::readoutlibs::SkipListLatencyBufferModel<dunedaq::fdreadoutlibs::types::TriggerPrimitiveTypeAdapter>; RawDataProcessorType = dunedaq::fdreadoutlibs::SWWIBTriggerPrimitiveProcessor] at /nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/sourcecode/readoutlibs/include/readoutlibs/models/detail/ReadoutModel.hxx:307] Timesync with DAQ time 0 won't be sent out as it's an invalid sync.
host = np04-srv-0192023-Apr-11 10:01:59,973
        LOGuser = aabedabu [ (104771void dunedaq::readoutmodules::DataLinkHandler::do_start(const dunedaq::appfwk::DAQModule::data_t&) at /nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/sourcecode/readoutmodules/plugins/DataLinkHandler.cpp:85] tp_datahandler_5 successfully started for run number 135
)
        process id = 2905582
        thread id = 2905700
        process wd = /nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/configurations/testing_issue_from_kurt2023-Apr-11 10:01:59,973 LOG [void dunedaq::restcmd::RestEndpoint::handleResponseCommand(const dunedaq::restcmd::cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply&) at /tmp/root/spack-stage/spack-stage-restcmd-N23-03-29-lttipzeexhddhpex56ytehphnuze4anj/spack-src/src/RestEndpoint.cpp:101] Sending POST request to 10.73.136.38:56789/response

        stack trace of the crashing thread:
          #0  /cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v1.0/spack-0.18.1-gcc-12.1.0/spack-0.18.1/opt/spack/gcc-12.1.0/gcc-12.1.0-lqfco4qbx43tdrdhjc3olbeejucbvqwd/lib64/libstdc++.so.6(std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)+0x102) [0x7f1a83c110f2]
          #1  /lib64/libpthread.so.0(+0x12cf0) [0x7f1a83e61cf0]
          #2  /cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v1.0/spack-0.18.1-gcc-12.1.0/spack-0.18.1/opt/spack/gcc-12.1.0/gcc-12.1.0-lqfco4qbx43tdrdhjc3olbeejucbvqwd/lib64/libstdc++.so.6(std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)+0x102) [0x7f1a83c110f2]
          #3  /nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/install/readoutmodules/lib64/libreadoutmodules_DataLinkHandler_duneDAQModule.so(dunedaq::fdreadoutlibs::WIB2FrameProcessor::find_hits(dunedaq::fdreadoutlibs::types::DUNEWIBSuperChunkTypeAdapter const*, dunedaq::fdreadoutlibs::WIB2FrameHandler*)+0xb29) [0x7f1a5a1df4d9]
          #4  /nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/install/readoutmodules/lib64/libreadoutmodules_DataLinkHandler_duneDAQModule.so(dunedaq::readoutlibs::TaskRawDataProcessorModel<dunedaq::fdreadoutlibs::types::DUNEWIBSuperChunkTypeAdapter>::run_post_processing_thread(std::function<void (dunedaq::fdreadoutlibs::types::DUNEWIBSuperChunkTypeAdapter const*)>&, folly::ProducerConsumerQueue<dunedaq::fdreadoutlibs::types::DUNEWIBSuperChunkTypeAdapter const*>&)+0x8f) [0x7f1a5a0e39df]
          #5  /nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/install/readoutmodules/lib64/libreadoutmodules_DataLinkHandler_duneDAQModule.so(dunedaq::readoutlibs::ReusableThread::thread_worker()+0x9e) [0x7f1a5a13f8ae]
          #6  /cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v1.0/spack-0.18.1-gcc-12.1.0/spack-0.18.1/opt/spack/gcc-12.1.0/gcc-12.1.0-lqfco4qbx43tdrdhjc3olbeejucbvqwd/lib64/libstdc++.so.6(+0xdd5e3) [0x7f1a83c255e3]
          #7  /lib64/libpthread.so.0(+0x81ca) [0x7f1a83e571ca]
          #8  /lib64/libc.so.6(clone+0x43) [0x7f1a8379be73]
bash: line 1: 2905582 Aborted                 (core dumped) daq_application --name rulocalhost0 -c rest://localhost:3336 -i file://info_rulocalhost0_3336.json --configurationService file:///nfs/sw/work_dirs/aabedabu/dunedaq-v4.0.0.candidate/dev/configurations/testing_issue_from_kurt/nanorc-flatconf-52_4731b
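
For readers of the trace: the top frame, `std::_Rb_tree_insert_and_rebalance`, is the libstdc++ routine behind insertions into `std::map` and `std::set`, and frame #3 shows it being reached from `WIB2FrameProcessor::find_hits` on a post-processing thread. One common way to crash inside that routine is unsynchronized insertion into the same container from more than one thread. The log does not confirm that this is what happened here. The sketch below only illustrates that general pattern and its usual guard. `GuardedHitMap` and `fill_concurrently` are hypothetical names and are not part of fdreadoutlibs or readoutlibs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a container filled by processing threads.
// It is NOT the fdreadoutlibs data structure, only an illustration.
class GuardedHitMap {
public:
  // Insert or update under the lock, so the red-black tree is never
  // rebalanced by two threads at the same time.
  void record(std::uint32_t channel, std::uint64_t timestamp) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_hits[channel] = timestamp;
  }

  // Reads take the same lock, since a concurrent insert may move nodes.
  std::size_t size() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_hits.size();
  }

private:
  mutable std::mutex m_mutex;
  std::map<std::uint32_t, std::uint64_t> m_hits;
};

// Starts n_threads writers, each inserting keys_per_thread distinct keys,
// waits for all of them, and returns the final number of entries.
std::size_t fill_concurrently(GuardedHitMap& store,
                              unsigned n_threads,
                              std::uint32_t keys_per_thread) {
  assert(n_threads > 0);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&store, t, keys_per_thread] {
      for (std::uint32_t k = 0; k < keys_per_thread; ++k) {
        store.record(t * keys_per_thread + k, k);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return store.size();
}
```

Without the `std::lock_guard` in `record`, the same test is a data race with undefined behaviour, and an intermittent crash during tree rebalancing is one possible outcome. A thread sanitizer build (`-fsanitize=thread` in GCC or Clang) is one way to check whether a given container is actually being accessed concurrently.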

mroda88 commented 1 year ago

No longer valid: this issue relates to the pre-refactoring code.