axiondarkmatterexperiment / fast_daq

software for real-time data collection and processing.

crash with DmaBusy on AsyncAbort call #1

Open laroque opened 5 years ago

laroque commented 5 years ago

Possibly a digitizer library issue? I seem to be getting an ApiDmaInProgress error when calling AlazarAbortAsyncRead. This doesn't make sense: the abort call is supposed to end the "in progress" state of the buffers, and the documentation indicates that this error comes from trying to READ a buffer that is not the correct element in the buffer FIFO.

ats_1              | 2019-03-02 20:40:07 [ INFO] (tid 140603508070144) l/daq_control.cc(437): Run was stopped manually
ats_1              | 2019-03-02 20:40:07 [DEBUG] (tid 140603508070144) l/daq_control.cc(442): Finishing egg files
ats_1              | 2019-03-02 20:40:07 [DEBUG] (tid 140603508070144) ty/cancelable.cc(28): cancelable::do_cancellation
ats_1              | 2019-03-02 20:40:07 [ INFO] (tid 140603491284736) monarch3_wrap.cc(101): Monarch-on-deck manager is stopping
ats_1              | 2019-03-02 20:40:07 [ERROR] 462_digitizer.cc(173): Error: AlazarAbortAsyncRead failed -- ApiDmaInProgress
ats_1              | 2019-03-02 20:40:07 [ERROR] ity/terminate.cc(54): Caught unknown (non-std::exception) & unhandled exception.
ats_1              | 2019-03-02 20:40:07 [ERROR] ity/terminate.cc(61): Backtrace from terminate() returned 6 frames
ats_1              |
ats_1              | 2019-03-02 20:40:07 [ERROR] ity/terminate.cc(70): Backtrace:
ats_1              | [bt]: (0) /usr/local/lib/libScarab_Midge_Psyllid_FastDaq.so(_ZN6scarab9terminateEv+0x50) [0x7fe10e1b4cb4]
ats_1              | [bt]: (1) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8f066) [0x7fe10badd066]
ats_1              | [bt]: (2) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8f0b1) [0x7fe10badd0b1]
ats_1              | [bt]: (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9e9e) [0x7fe10bb07e9e]
ats_1              | [bt]: (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x7fe10f8164a4]
ats_1              | [bt]: (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fe10b27cd0f]
ats_1              |
ats_1              | (tid 140603566819072) (tid 140603566819072) (tid 140603566819072) (tid 140603566819072) atsdigitizer_ats_1 exited with code 139

I should contact Alazar about this; there may be a flaw in their library, or their documentation of this call and these errors may be incomplete.

In parallel, I should clean up the interface to calls to their library. The current method ats9462_digitizer::check_return_code takes the result of a call to the API and a string which is supposed to match the function name. It would be much better if:

  1. The name were derived by template or some other meta-programming pattern
  2. The check_return_code call were able to give the line number from which it has been called
  3. There were some sort of blocking/locking logic ensuring that each API call completes and returns its code before the next call is made. I don't think a race should be possible, because the node runs entirely in one thread, but the error occurred while aborting a run, and I'm not sure whether midge or something else owns a thread that could make another call and cause a race.
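
A minimal sketch of how the three points above could fit together, not the actual fast_daq code. Everything in it is a stand-in so that it compiles without the vendor SDK: `return_code_t`, `k_success`, `fake_abort_read`, `fake_failing_call`, `checked_call`, and `CHECKED_API_CALL` are all hypothetical names, and the numeric codes are placeholders. A macro stringizes the function name and records the call site, and a single mutex serializes every wrapped call.

```cpp
#include <mutex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-ins for the vendor types (assumptions, not the real SDK definitions).
using return_code_t = unsigned int;
constexpr return_code_t k_success = 512;  // placeholder "success" value

// Hypothetical vendor-style functions, used only to exercise the wrapper.
inline return_code_t fake_abort_read(int /*handle*/) { return k_success; }
inline return_code_t fake_failing_call(int /*handle*/) { return 513; }  // arbitrary failure code

// One process-wide mutex so that wrapped API calls never overlap (point 3).
inline std::mutex& api_mutex()
{
    static std::mutex m;
    return m;
}

// Makes the call while holding the lock, then checks the return code.
// Throws a std::exception-derived error so handlers can print the message.
template <typename Func, typename... Args>
void checked_call(const char* name, const char* file, int line,
                  Func&& func, Args&&... args)
{
    return_code_t code;
    {
        std::lock_guard<std::mutex> lock(api_mutex());
        code = std::forward<Func>(func)(std::forward<Args>(args)...);
    }
    if (code != k_success)
    {
        std::ostringstream msg;
        msg << name << " failed with code " << code
            << " at " << file << ":" << line;
        throw std::runtime_error(msg.str());
    }
}

// The macro supplies the function name (point 1) and call site (point 2).
// It requires at least one argument after the function name.
#define CHECKED_API_CALL(func, ...) \
    checked_call(#func, __FILE__, __LINE__, func, __VA_ARGS__)
```

With this in place a call site would read `CHECKED_API_CALL(fake_abort_read, handle);`, so the name string can no longer drift from the function actually called. Throwing `std::runtime_error` is a design choice of the sketch; the log above reports a non-`std::exception` being caught in `terminate()`, which is why the message there carries no detail.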

laroque commented 5 years ago

For reference, the method I'd like to refactor is here: https://github.com/axiondarkmatterexperiment/fast_daq/blob/6cd0a1e4097bbe919865e5d7d76f89f50a8bc7c9/source/daq/ATS9462_digitizer.cc#L165

And an example of use is here: https://github.com/axiondarkmatterexperiment/fast_daq/blob/6cd0a1e4097bbe919865e5d7d76f89f50a8bc7c9/source/daq/ATS9462_digitizer.cc#L182