epics-modules / Dante

EPICS module for support of Dante digital pulse processors

List mode occasionally crashes #14

Closed MarkRivers closed 3 years ago

MarkRivers commented 3 years ago

List mode acquisition seems to work much of the time. However, it sometimes crashes with the following stack trace:

(gdb) bt
#0  0x00007f8b41ac0468 in std::__exception_ptr::exception_ptr::_M_release() () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#1  0x00007f8b418b642b in std::_Rb_tree<unsigned int, std::pair<unsigned int const, std::promise<std::vector<unsigned int, std::allocator<unsigned int> > > >, std::_Select1st<std::pair<unsigned int const, std::promise<std::vector<unsigned int, std::allocator<unsigned int> > > > >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::promise<std::vector<unsigned int, std::allocator<unsigned int> > > > > >::erase(unsigned int const&) () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#2  0x00007f8b418abc9a in xgl_ctrl_brd_dpp_chain::wait_data(unsigned short, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> >&) ()
   from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#3  0x00007f8b418a5e01 in xgl_ctrl_brd_dpp_chain::read(unsigned short, unsigned short, unsigned short, bool) () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#4  0x00007f8b41895e4c in xgl_ctrl_brd_dpp::is_running() () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#5  0x00007f8b4189f380 in xgl_ctrl_brd_dpp_chain::is_running(unsigned short) () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#6  0x00007f8b4185dc68 in std::_Function_handler<std::pair<unsigned short, std::vector<unsigned int, std::allocator<unsigned int> > > (), std::_Bind<std::pair<unsigned short, std::vector<unsigned int, std::allocator<unsigned int> > > (boards::*(std::shared_ptr<boards>, unsigned short))(unsigned short)> >::_M_invoke(std::_Any_data const&) ()
   from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#7  0x00007f8b41827c8a in AsyncTask::operator()() const () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#8  0x00007f8b4185de19 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<AsyncTask> >, void> >::_M_invoke(std::_Any_data const&) ()
   from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#9  0x00007f8b4180e88b in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#10 0x00007f8b4117fe70 in pthread_once () from /lib64/libpthread.so.0
#11 0x00007f8b4185f20a in _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZNSt13__future_base17_Async_state_implINS1_IS2_IJ9AsyncTaskEEEEvEC4EOS7_EUlvE_EEEEE6_M_runEv ()
   from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#12 0x00007f8b41b2e0ef in execute_native_thread_routine () from /home/epics/devel/dante-1-0/lib/linux-x86_64/libXGL_DPP.so.1
#13 0x00007f8b4117ae25 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f8b40033bad in clone () from /lib64/libc.so.6

These calls are all in a thread in the libXGL_DPP.so library, so the crash does not seem to be caused by the EPICS driver.

MarkRivers commented 3 years ago

After one of these crashes, the IOC cannot be run again without power-cycling the Dante. This is very inconvenient.

lucagrittiniXGLAB commented 3 years ago

Hi Mark, we have seen that one possible cause of the crash during List-Mode acquisition is using the DLL function "isRunning_system()" to detect whether the acquisition has terminated. This function has been replaced (except in waveform acquisition mode) by "isLastDataReceived()". The difference is that the previous function (isRunning_system()) was asynchronous and so limited the throughput, sometimes critically (in List Mode the throughput is critical because of the bursts of data coming from the DPP). The new function is synchronous and so does not need to communicate with the DPP. You can see how to use this function in the DANTE Library API manual and in "DPP_Test.cpp".

MarkRivers commented 3 years ago

List mode seems to be working with the new DLL and with the EPICS driver changed to call isLastDataReceived() rather than isRunning_system().
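
For readers following the thread, the change amounts to polling a different completion check while a list-mode acquisition runs. The following is only a rough sketch of such a polling loop, not the actual EPICS driver code. The helper `waitForAcquisitionDone` and its parameters are hypothetical, and the vendor call is passed in as a callback because the real signature of isLastDataReceived() is documented in the DANTE Library API manual and is not reproduced here.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, NOT part of the DANTE library or the EPICS driver.
// It polls a caller-supplied predicate until the predicate reports that the
// acquisition is complete, or until the timeout expires.
// Assumption: in a real driver the predicate would wrap the vendor call
// isLastDataReceived() instead of isRunning_system().
bool waitForAcquisitionDone(const std::function<bool()>& lastDataReceived,
                            std::chrono::milliseconds pollInterval,
                            std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (lastDataReceived()) {
            return true;   // predicate reported that the last data arrived
        }
        // Sleep between checks so the polling thread does not spin.
        std::this_thread::sleep_for(pollInterval);
    }
    return false;          // deadline passed without completion
}
```

The callback keeps the sketch independent of the vendor headers; the poll interval and timeout are illustrative values that would need tuning for a real detector.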