D-TACQ / AFHBA404

Linux host device driver for AFHBA404 and MTCA PCIe
GNU General Public License v2.0

Rebooting for recovery fails on stuck stream #124

Closed zack-vii closed 1 month ago

zack-vii commented 1 month ago

Sometimes the stream gets stuck or crashes on an acq400 carrier. The best remedy seems to be to reboot the system, e.g. ssh in and reboot if that is still possible, or use the REBOOT knob on site 0. However, I now find myself with a stuck system that is trapped in an infinite loop in acq400_drv.c, in _acq420_continuous_dma_stop(). On the USB terminal I see the message "WAITING for work task" printed periodically. Since the login level has already shut down, there is no way of intervening that I know of, and the device is non-operational until the next physical access (due to space limitations we could not fit remote-controlled power strips everywhere). Is it possible to add a timeout or something similar that breaks out of the loop on reboot?

zack-vii commented 1 month ago

One solution could be to detect a stuck DMA, e.g. by adding a counter to adev that is incremented on each cycle of the for loop in axi64_dual_data_loop(). _acq420_continuous_dma_stop() could check whether the counter has changed after each 1 s timeout. If it has not, it can assume the stream is stuck or dead, flag an abort and continue. The stream in axi64_dual_data_loop() may not like that, but it gets the chance to do a safe abort when it detects that the abort flag is set. The abort flag could be cleared on entry to axi64_dual_data_loop().
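To make the idea concrete, here is a rough user-space sketch of the progress-counter check. It is not driver code: `struct stream_state`, `stream_enter()`, `stream_cycle()`, `stop_poll()` and the `stop_status` values are hypothetical names invented for illustration, standing in for the fields that would be added to adev, the data loop, and the check that would run after each 1 s timeout in the stop path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the proposal would add to adev. */
struct stream_state {
    unsigned long loop_count; /* incremented once per data-loop cycle */
    bool abort;               /* set by the stop path when no progress is seen */
    bool task_active;         /* true while the work task is still running */
};

/* Outcome of one check in the stop path (names are illustrative only). */
enum stop_status { STOP_DONE, STOP_WAITING, STOP_ABORT_FLAGGED };

/* Entry to the data loop: clear any stale abort request. */
void stream_enter(struct stream_state *s)
{
    assert(s != NULL);
    s->abort = false;
    s->task_active = true;
}

/* One cycle of the data loop. Returns false when the loop should exit. */
bool stream_cycle(struct stream_state *s)
{
    assert(s != NULL);
    if (s->abort) {
        /* Abort requested: a real loop would clean up safely here. */
        s->task_active = false;
        return false;
    }
    s->loop_count++; /* visible sign of progress for the stop path */
    return true;
}

/*
 * One check in the stop path, meant to run after each timeout expires.
 * *last_seen holds the counter value observed at the previous check.
 */
enum stop_status stop_poll(struct stream_state *s, unsigned long *last_seen)
{
    assert(s != NULL && last_seen != NULL);
    if (!s->task_active)
        return STOP_DONE;           /* work task finished on its own */
    if (s->loop_count != *last_seen) {
        *last_seen = s->loop_count; /* still progressing, keep waiting */
        return STOP_WAITING;
    }
    s->abort = true;                /* no progress for a whole period */
    return STOP_ABORT_FLAGGED;      /* caller stops waiting and continues */
}
```

In this sketch the stop path gives up after a single period without progress, so a reboot is never blocked for longer than one timeout once the stream has stalled. In the kernel the counter and flag are shared between contexts, so plain accesses like these would presumably need atomics or READ_ONCE()/WRITE_ONCE(); that part is left out here.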

zack-vii commented 1 month ago

wrong project