berthubert / galmon

galileo open source monitoring
GNU General Public License v3.0

watchdog timeout #123

Open edmonds opened 4 years ago

edmonds commented 4 years ago

Hi,

I have a receiver (https://galmon.eu/observer.html?observer=74) that will occasionally hang and stop generating serial data. This typically occurs a few times a week. When it happens, kernel messages like these are generated:

[Sat Jun 27 17:11:16 2020] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[Mon Jun 29 22:10:58 2020] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
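
For anyone wanting to count how often this happens, here is a minimal sketch that picks these lines out of kernel log text. It assumes the human-readable timestamp format shown above, and the function name `parse_urb_stopped` is made up for illustration. The status is a negated errno value; on Linux errno 32 is EPIPE, which the USB stack generally uses to report a stalled endpoint.

```python
import errno
import re

# Matches lines such as:
# [Sat Jun 27 17:11:16 2020] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
URB_STOPPED = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+ftdi_sio\s+(?P<port>[^\s:]+):\s+"
    r".*urb stopped:\s+(?P<status>-?\d+)\s*$"
)


def parse_urb_stopped(line):
    """Return (timestamp, port, status, errno_name), or None if the line does not match."""
    m = URB_STOPPED.match(line)
    if m is None:
        return None
    status = int(m.group("status"))
    # The kernel reports negated errno values, so look up the positive number.
    name = errno.errorcode.get(-status, "unknown")
    return m.group("when"), m.group("port"), status, name
```

Feeding it each line of the kernel log and keeping the non-None results gives a list of hang timestamps per port.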

This is on a Raspberry Pi 4 and I don't see the same issue on my desktop, so it's probably a hardware issue.

Closing and re-opening the serial port appears to clear the problem. My current workaround is to restart the ubxtool process every hour or so (e.g. RuntimeMaxSec=3600 in the systemd service unit), but I'm wondering if it might make more sense to have a watchdog option in the ubxtool process itself that exits the main loop if a message hasn't been received in a reasonable amount of time (e.g., 5–10 minutes?).
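
To make the watchdog idea concrete, here is a rough sketch in Python rather than ubxtool's own C++; it is not taken from the galmon code. `MessageWatchdog`, `run`, and `read_message` are hypothetical names, and the sketch assumes the read call has its own short timeout (returning None) so the loop gets a chance to check the clock even when the port is hung.

```python
import time


class MessageWatchdog:
    """Tracks the time since the last received message (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # monotonic, so wall-clock adjustments do not matter
        self._last = clock()

    def feed(self):
        """Record that a message just arrived."""
        self._last = self._clock()

    def expired(self):
        """True once more than timeout_s seconds have passed without a message."""
        return self._clock() - self._last > self._timeout_s


def run(read_message, watchdog):
    """Main loop: leave once the watchdog expires; returns the message count.

    read_message is assumed to return a message, or None on a read timeout.
    """
    count = 0
    while not watchdog.expired():
        msg = read_message()
        if msg is not None:
            watchdog.feed()
            count += 1
    return count
```

With something like this, the process could exit with a non-zero status when `run` returns and let the service manager restart it, instead of restarting unconditionally every hour.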

Thanks for considering!

edmonds commented 3 years ago

FWIW, it looks like my issue is caused by https://github.com/raspberrypi/linux/issues/2406, which appears to be either a bug in the kernel's FTDI driver (ftdi_sio) on ARM platforms, or a bug in the USB controller on Raspberry Pi systems. I swapped out my Raspberry Pi 4 for a 3B+ and the FTDI hangs started occurring every few minutes. Adding dwc_otg.speed=1 to the kernel command line appears to fix the issue but limits the USB controller to USB 1.1 speeds.
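
A small sketch for checking whether the parameter is actually active on a running system. It reads the booted command line from /proc/cmdline; `kernel_param` and `read_cmdline` are hypothetical helper names, and the parsing is simplified in that it does not handle quoted values containing spaces.

```python
def kernel_param(cmdline, name):
    """Return the value of the first `name=value` token in cmdline, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == name:
            return value
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running Linux kernel was booted with."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    speed = kernel_param(read_cmdline(), "dwc_otg.speed")
    if speed == "1":
        print("dwc_otg.speed=1 is set")
    else:
        print("dwc_otg.speed=1 is not set")
```

On Raspberry Pi OS the command line is usually edited in a cmdline.txt file on the boot partition, and a reboot is needed before /proc/cmdline reflects the change.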