art-daq / artdaq

Summary of EventBuilder crashes at protoDUNE July 23-26 #80

Closed eflumerf closed 2 years ago

eflumerf commented 2 years ago

This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/20458 (FNAL account required). Originally created by @bieryAtFnal on 2018-07-26 16:27:43.

eflumerf commented 2 years ago

Comment by @bieryAtFnal on 2018-07-26 16:28:43


EventBuilder2 on srv-001:

%MSG-i EventBuilder2_TCPSocketTransfer: Initializing 24-Jul-2018 06:10:06 CEST Booted TCPSocket_transfer.cc:790
transfer_between_0_and_10_RECV: Starting Listener Thread for port 11010 (rank=10,partition=1)
%MSG
%MSG-e EventBuilder2_TCP_listen_fd: Initializing 24-Jul-2018 06:10:06 CEST Booted TCP_listen_fd.cc:55
Could not bind socket for port 11010! Exiting with code 3!
%MSG
bind error: File exists

** Break ** segmentation violation
/nfs/home/np04daq/boot.sh: line 10: 143150 Segmentation fault (core dumped) $2 -c "id: $3 commanderPluginType: xmlrpc rank: $4 application_name: $5 partition_number: $6"

eflumerf commented 2 years ago

Comment by @bieryAtFnal on 2018-07-26 16:31:28


EventBuilder6 on srv-002 at 23:24:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `eventbuilder -c id: 6239 commanderPluginType: xmlrpc rank: 77 application_name:'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  artdaq::SharedMemoryManager::GetBuffersOwnedByManager (this=this@entry=0x7f3c0800b420, locked=locked@entry=false)
    at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq_core/artdaq-core/Core/SharedMemoryManager.cc:499
499         for (auto ii = 0; ii < shm_ptr_->buffer_count; ++ii)
[Current thread is 1 (Thread 0x7f3c1f38b4c0 (LWP 80625))]
(gdb) where
#0  artdaq::SharedMemoryManager::GetBuffersOwnedByManager (this=this@entry=0x7f3c0800b420, locked=locked@entry=false)
    at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq_core/artdaq-core/Core/SharedMemoryManager.cc:499
#1  0x00007f3c1d31e9f5 in artdaq::SharedMemoryManager::Detach (this=this@entry=0x7f3c0800b420, throwException=throwException@entry=false, category=..., 
    message=..., force=force@entry=true) at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq_core/artdaq-core/Core/SharedMemoryManager.cc:945
#2  0x00007f3c1d339f3e in signal_handler (signum=<optimized out>)
    at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq_core/artdaq-core/Core/SharedMemoryManager.cc:29
#3  <signal handler called>
#4  0x00007f3c1a3aaa3d in poll () from /lib64/libc.so.6
#5  0x00007f3c11cc6c3a in waitForConnection (listenSocketP=0x14590c0, listenSocketP=0x14590c0, errorP=0x7ffc1d64dba0, interruptedP=<optimized out>)
    at socket_unix.c:694
#6  chanSwitchAccept (chanSwitchP=<optimized out>, channelPP=0x7ffc1d64dba8, channelInfoPP=0x7ffc1d64dbb0, errorP=0x7ffc1d64dba0) at socket_unix.c:804
#7  0x00007f3c11cbda7f in ChanSwitchAccept (chanSwitchP=0x14590e0, channelPP=0x7ffc1d64dba8, channelInfoPP=0x7ffc1d64dbb0, errorP=<optimized out>)
    at chanswitch.c:159
#8  0x00007f3c11cc54ef in acceptAndProcessNextConnection (errorP=0x7ffc1d64db98, outstandingConnListP=0x1458fb0, serverP=0x1459090) at server.c:1191
#9  serverRun2 (errorP=0x7ffc1d64db98, serverP=0x1459090) at server.c:1242
#10 ServerRun (serverP=serverP@entry=0x1459090) at server.c:1280
#11 0x00007f3c127063f8 in xmlrpc_c::setupSignalsAndRunAbyss (abyssServerP=0x1459090) at server_abyss.cpp:760
#12 0x00007f3c12707219 in xmlrpc_c::serverAbyss_impl::run (this=<optimized out>) at server_abyss.cpp:771
#13 0x00007f3c127076bd in xmlrpc_c::serverAbyss::run (this=<optimized out>) at server_abyss.cpp:873
#14 0x00007f3c1292e0b3 in artdaq::xmlrpc_commander::run_server (this=0x1455d60)
    at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq/artdaq/ExternalComms/xmlrpc_commander.cc:1114
#15 0x000000000042a53c in artdaq::artdaqapp::runArtdaqApp (task=task@entry=artdaq::detail::EventBuilderTask, config_ps=...)
    at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq/proto/artdaqapp.hh:113
#16 0x00000000004167cd in main (argc=3, argv=0x7ffc1d65b4d8) at /nfs/sw/work_dirs/dune-artdaq-20180722/srcs/artdaq/proto/eventbuilder.cc:9
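
For readers following the trace: frame #0 faults on the loop at SharedMemoryManager.cc:499, which reads shm_ptr_->buffer_count, and it is reached from a signal handler through Detach. The trace does not show why that read faulted. One possibility (an assumption, not something confirmed in this issue) is that shm_ptr_ was no longer pointing at an attached segment. A minimal C++ sketch of guarding such a loop follows; ShmHeader, ManagerSketch, and BufferIndices are hypothetical names used only for illustration and are not the artdaq-core code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a shared-memory header; NOT the artdaq-core type.
struct ShmHeader {
    int buffer_count;
};

// Hypothetical manager that owns a raw pointer to the header.
class ManagerSketch {
public:
    explicit ManagerSketch(ShmHeader* header) : shm_ptr_(header) {}

    // Returns every buffer index, or an empty list when nothing is attached.
    std::vector<int> BufferIndices() const {
        std::vector<int> indices;
        if (shm_ptr_ == nullptr) {
            return indices;  // no segment attached: skip the loop, do not dereference
        }
        assert(shm_ptr_->buffer_count >= 0);  // sanity check on the header contents
        for (int ii = 0; ii < shm_ptr_->buffer_count; ++ii) {
            indices.push_back(ii);
        }
        return indices;
    }

    // Drops the pointer so later calls take the guarded path.
    void Detach() { shm_ptr_ = nullptr; }

private:
    ShmHeader* shm_ptr_;
};
```

A null check like this only helps if the pointer is reset when the segment is released; it does not protect against a stale non-null pointer, and it says nothing about whether the work is safe to do from inside a signal handler.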
eflumerf commented 2 years ago

Comment by @bieryAtFnal on 2018-07-26 16:38:36


EventBuilder10 on srv-004 at 11:02 on 23-Jul:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `eventbuilder -c id: 6244 commanderPluginType: xmlrpc rank: 16 application_name:'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007feacafe81f7 in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7feabd911400 (LWP 123631))]
(gdb) where
#0  0x00007feacafe81f7 in raise () from /lib64/libc.so.6
#1  0x00007feacafe9a28 in abort () from /lib64/libc.so.6
#2  0x00007feacb027f47 in __libc_message () from /lib64/libc.so.6
#3  0x00007feacb02f619 in _int_free () from /lib64/libc.so.6
#4  0x00007fe942f457f8 in __gnu_cxx::new_allocator, std::allocator > > > >::deallocate (this=, __p=0x7fe9380018c0) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/ext/new_allocator.h:110
#5  std::allocator_traits, std::allocator > > > > >::deallocate (
    __a=..., __n=1, __p=0x7fe9380018c0) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/alloc_traits.h:462
#6  std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::_M_put_node (
    __p=0x7fe9380018c0, this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:509
#7  std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::_M_drop_node (
    __p=0x7fe9380018c0, this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:576
#8  std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::_M_erase_aux (
    __position=..., this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:2275
#9  std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::erase[abi:cxx11](std::_Rb_tree_const_iterator, std::allocator > > >) (__position=..., 
    this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:1036
#10 std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::_M_erase_aux (
    __last=..., __first=..., this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:2289
#11 std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::erase[abi:cxx11](std::_Rb_tree_const_iterator, std::allocator > > >, std::_Rb_tree_const_iterator, std::allocator > > >) (__last=..., __first=..., this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:1069
#12 std::_Rb_tree, std::allocator > >, std::_Select1st, std::allocator > > >, std::less, std::allocator, std::allocator > > > >::erase (
    this=this@entry=0x7fe943152920 , __x=@0x7feabd8ee1d0: 12)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:2300
#13 0x00007fe942f36f46 in std::map, std::allocator >, std::less, std::allocator, std::allocator > > > >::erase (__x=@0x7feabd8ee1d0: 12, this=0x7fe943152920 )
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_map.h:981
#14 artdaq::TCPSocketTransfer::~TCPSocketTransfer (this=0x7feab8039fb0, __in_chrg=)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/TransferPlugins/TCPSocket_transfer.cc:104

#15 0x00007fe942f37061 in artdaq::TCPSocketTransfer::~TCPSocketTransfer (this=0x7feab8039fb0, __in_chrg=)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/TransferPlugins/TCPSocket_transfer.cc:116
#16 0x00007feabc4288e9 in std::default_delete::operator() (this=, __ptr=)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unique_ptr.h:76
#17 std::unique_ptr >::~unique_ptr (this=0x7feab803f020, __in_chrg=)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unique_ptr.h:239

#18 artdaq::AutodetectTransfer::~AutodetectTransfer (this=0x7feab803efd0, __in_chrg=)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/TransferPlugins/Autodetect_transfer.cc:29
#19 artdaq::AutodetectTransfer::~AutodetectTransfer (this=0x7feab803efd0, __in_chrg=)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/TransferPlugins/Autodetect_transfer.cc:29
#20 0x00007feace824394 in std::default_delete::operator() (this=, __ptr=)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unique_ptr.h:76
#21 std::unique_ptr >::~unique_ptr (this=, __in_chrg=)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unique_ptr.h:239
#22 std::pair > >::~pair (this=, 
    __in_chrg=) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_pair.h:194
#23 __gnu_cxx::new_allocator > > > >::destroy > > > (this=, 
    __p=) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/ext/new_allocator.h:124
#24 std::allocator_traits > > > > >::destroy > > > (__a=..., 
    __p=) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/alloc_traits.h:487
#25 std::_Rb_tree > >, std::_Select1st > > >, std::less, std::allocator > > > >::_M_destroy_node (this=, 
    __p=0x7feab8034ec0) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:567
#26 std::_Rb_tree > >, std::_Select1st > > >, std::less, std::allocator > > > >::_M_drop_node (this=0x7feab8018b08, __p=0x7feab8034ec0)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:575
#27 std::_Rb_tree > >, std::_Select1st > > >, std::less, std::allocator > > > >::_M_erase (__x=0x7feab8034ec0, this=0x7feab8018b08)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:1640
#28 std::_Rb_tree > >, std::_Select1st > > >, std::less, std::allocator > > > >::_M_erase (this=0x7feab8018b08, __x=0x7feab803e150)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:1638
#29 0x00007feace814042 in std::_Rb_tree > >, std::_Select1st > > >, std::less, std::allocator > > > >::~_Rb_tree (this=0x7feab8018b08, 
    __in_chrg=) at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_tree.h:873
#30 std::map >, std::less, std::allocator > > > >::~map (this=0x7feab8018b08, __in_chrg=)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/stl_map.h:96
#31 artdaq::DataReceiverManager::~DataReceiverManager (this=0x7feab8018ac0, __in_chrg=)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/DAQrate/DataReceiverManager.cc:114

#32 0x00007feace8142a1 in artdaq::DataReceiverManager::~DataReceiverManager (this=0x7feab8018ac0, __in_chrg=)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/DAQrate/DataReceiverManager.cc:120
#33 0x00007feacec768c7 in std::default_delete::operator() (this=, __ptr=)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unique_ptr.h:76
#34 std::unique_ptr >::reset (__p=, this=0x7feab801e628)
    at /nfs/sw/artdaq/products/gcc/v6_4_0/Linux64bit+3.10-2.17/include/c++/6.4.0/bits/unique_ptr.h:347
#35 artdaq::DataReceiverCore::shutdown (this=0x7feab801e620)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/Application/DataReceiverCore.cc:235
#36 0x00007feacec62d58 in artdaq::EventBuilderApp::do_shutdown (this=0x247f4e0)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/Application/EventBuilderApp.cc:96
#37 0x00007feacebf7ba7 in artdaq::Main_Initialized::shutdown (this=0x7feaceede3f0 , context=..., timeout=45)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/build_slf7.x86_64/artdaq/artdaq/Application/Commandable_sm.cpp:306
#38 0x00007feacebf2c5b in artdaq::CommandableContext::shutdown (timeout=45, this=0x247f4e8)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/build_slf7.x86_64/artdaq/artdaq/Application/Commandable_sm.h:296
#39 artdaq::InitializedMap_Ready::shutdown (this=, context=..., timeout=45)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/build_slf7.x86_64/artdaq/artdaq/Application/Commandable_sm.cpp:494
#40 0x00007feacec03c9b in artdaq::CommandableContext::shutdown (timeout=45, this=0x247f4e8)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/build_slf7.x86_64/artdaq/artdaq/Application/Commandable_sm.h:296
#41 artdaq::Commandable::shutdown (this=0x247f4e0, timeout=45)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/Application/Commandable.cc:176
#42 0x00007feac36307e6 in artdaq::shutdown_::execute_ (this=0x249dba0, paramList=...)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/ExternalComms/xmlrpc_commander.cc:611
#43 0x00007feac361d1f8 in artdaq::cmd_::execute (this=0x249dba0, paramList=..., retvalP=0x7feabd9027f8)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/ExternalComms/xmlrpc_commander.cc:411
#44 0x00007feac361b7be in c_executeMethod (envP=0x7feabd902910, paramArrayP=0x7fe8e80028e0, methodPtr=0x249dba0, callInfoPtr=0x7feabd902a30)
    at /nfs/sw/work_dirs/dune-artdaq_artdaq_v3_02_00_testing/srcs/artdaq/artdaq/ExternalComms/xmlrpc_commander.cc:81
#45 0x00007feac2bc6e3c in callNamedMethod (resultPP=0x7feabd902940, callInfoP=0x7feabd902a30, paramArrayP=0x7fe8e80028e0, methodP=, 
    envP=0x7feabd902910) at registry.c:307
#46 xmlrpc_dispatchCall (envP=envP@entry=0x7feabd902910, registryP=registryP@entry=0x249a7d0, methodName=0x7fe8e8001070 "daq.shutdown", 
    paramArrayP=0x7fe8e80028e0, callInfoP=callInfoP@entry=0x7feabd902a30, resultPP=resultPP@entry=0x7feabd902940) at registry.c:337
#47 0x00007feac2bc6f83 in xmlrpc_registry_process_call2 (envP=envP@entry=0x7feabd9029a0, registryP=0x249a7d0, 
    callXml=0x7fe8e8001460 "\r\ndaq.shutdown", callXmlLen=124, callInfo=callInfo@entry=0x7feabd902a30, responseXmlPP=responseXmlPP@entry=0x7feabd902990) at registry.c:426
#48 0x00007feac2fecf19 in xmlrpc_c::registry::processCall (this=0x7ffc3faa79d0, callXml=..., callInfoP=callInfoP@entry=0x7feabd902a30, 
    responseXmlP=responseXmlP@entry=0x7feabd902ac0) at registry.cpp:524
#49 0x00007feac33fd25b in xmlrpc_c::serverAbyss_impl::processCall (this=this@entry=0x249e100, call=..., abyssSessionP=abyssSessionP@entry=0x7feabd902c30, 
    responseP=responseP@entry=0x7feabd902ac0) at server_abyss.cpp:789
#50 0x00007feac33fd35b in xmlrpc_c::processXmlrpcCall (envP=0x7feabd902b80, arg=0x249e100, 
    callXml=0x7fe8e8001650 "\r\ndaq.shutdown", callXmlLen=124, abyssSessionP=0x7feabd902c30, responseXmlPP=0x7feabd902b78) at server_abyss.cpp:381
#51 0x00007feac31f55e7 in processCall (trace=, accessControl=..., wantChunk=false, xmlProcessorArg=0x249e100, 
    xmlProcessor=0x7feac33fd2b0 , 
    contentSize=, abyssSessionP=0x7feabd902c30) at abyss_handler.c:392
#52 handleXmlRpcCallReq (requestInfoP=, accessControl=..., wantChunk=, xmlProcessorArg=, 
    xmlProcessor=0x7feac33fd2b0 , 
    abyssSessionP=0x7feabd902c30) at abyss_handler.c:509
#53 xmlrpc_handleIfXmlrpcReq (handlerArg=, abyssSessionP=0x7feabd902c30, handledP=) at abyss_handler.c:573
#54 0x00007feac29ba39f in runUserHandler (srvP=0x249e1a0, sessionP=0x7feabd902c30) at server.c:631
#55 processRequestFromClient (connectionP=connectionP@entry=0x249e580, lastReqOnConn=lastReqOnConn@entry=false, timeout=, 
    keepAliveP=keepAliveP@entry=0x7feabd902d84) at server.c:724
#56 0x00007feac29ba5d0 in serverFunc (userHandle=0x249e580) at server.c:796
#57 0x00007feac29b444f in connJob (userHandle=0x249e580) at conn.c:39
#58 0x00007feac29bded2 in execute (arg=0x249e000) at thread_pthread.c:59
#59 0x00007feacd87ce25 in start_thread () from /lib64/libpthread.so.0
#60 0x00007feacb0ab34d in clone () from /lib64/libc.so.6
eflumerf commented 2 years ago

Comment by @bieryAtFnal on 2018-07-26 16:40:27


There may have been fewer crashes during this time frame because the system was run less often while the timing system was being upgraded. Still, it's encouraging that there were only 3 EB core files. There were many more art core files. I examined only a few of those, and they appear to be due to SIGABRT, so my guess is that they were forcible shutdowns when end-run took too long.
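
Since only a few of the art core files were examined, a quick tally of the terminating signal across all of them could confirm or refute the SIGABRT guess. The C++ sketch below counts the "Program terminated with signal ..." header lines that gdb prints, as seen in the traces above. It assumes those headers have already been collected into one block of text; TallySignals is a hypothetical helper name, not part of artdaq.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Counts gdb "Program terminated with signal X, ..." lines by signal name.
// Hypothetical helper for illustration only.
std::map<std::string, int> TallySignals(const std::string& gdb_text) {
    static const std::regex pattern(R"(Program terminated with signal (SIG[A-Z0-9]+),)");
    std::map<std::string, int> counts;
    std::istringstream lines(gdb_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch match;
        if (std::regex_search(line, match, pattern)) {
            assert(match.size() == 2);  // whole match plus one capture group
            ++counts[match[1].str()];   // key is the signal name, e.g. "SIGABRT"
        }
    }
    return counts;
}
```

If nearly every art core file lands under SIGABRT, that would be consistent with the forced-shutdown explanation, though it would not by itself prove that end-run timeouts were the trigger.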