SignalK / signalk-server

An implementation of a Signal K central server for boats.
http://signalk.org
Apache License 2.0

NGT-1/Canboatjs stops receiving data after a few minutes #1561

Open mgrouch opened 1 year ago

mgrouch commented 1 year ago

Canboatjs seems to stop reading from NMEA 2000 after a few minutes.

Restarting just the SignalK server restores data flow from NMEA 2000 to SignalK. After a few minutes the data stops refreshing again.

Reproducible with the Actisense NGT-1 source type for the NMEA 2000 connection.

tkurki commented 1 year ago

What hardware is this? Is this related to flaky USB connections?

mgrouch commented 1 year ago

Yes. We had a long troubleshooting session on Slack yesterday with Scott Bender, and a number of things were uncovered.

mgrouch commented 1 year ago

The findings:

SignalK needs keepalive mechanisms on all connections it receives data from, whether TCP or serial.

Both are file descriptors that can go stale without being closed. In both cases this leads to SignalK degradation.

In the case of TCP, it happens when a TCP client is powered off without sending TCP close packets, so on the SignalK side the connection stays open. It’s possible to bring SignalK to a complete stall with 100% CPU with a client behaving like that.
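One common way to notice a peer that vanished without closing is OS-level TCP keepalive probing. The sketch below is written in Python for brevity and is not taken from SignalK (which runs on Node.js); the function name `enable_tcp_keepalive` and the timing values are illustrative assumptions. The `TCP_KEEP*` options are platform-dependent, so they are applied only where the `socket` module exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=10, interval_s=5, probes=3):
    """Ask the kernel to probe an idle TCP connection and drop it if the peer is gone.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timing options are platform-specific (available on Linux).
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With these assumed values on Linux, a silent peer would be declared dead after roughly `idle_s + interval_s * probes` seconds, after which reads on the socket fail instead of waiting forever.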

In the case of serial, unplugging and replugging some USB devices makes Linux rebuild the device files under /dev. Even if the name is kept, SignalK ends up holding a stale file descriptor for the device after the kernel has rebuilt the device file. SignalK doesn’t recognize that it needs to close and reopen that device.
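One heuristic for spotting this condition is to compare the file that an open descriptor refers to with whatever the path currently points at. This is a minimal sketch in Python, not SignalK or canboatjs code; `fd_is_stale` is a hypothetical helper, and it assumes a recreated device node gets a different inode than the one still held open.

```python
import os


def fd_is_stale(fd, path):
    """Return True if `path` no longer refers to the file that `fd` has open."""
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True  # the device node is gone (e.g. unplugged)
    opened = os.fstat(fd)
    # Same device number and inode means the path still points at the open file.
    return (opened.st_dev, opened.st_ino) != (current.st_dev, current.st_ino)
```

A reader loop could call `fd_is_stale(fd, "/dev/ttyUSB0")` periodically (the device path here is only an example) and close and reopen the port when it returns `True`. It will not catch every failure mode, which is why an idle timeout is still useful.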

There are many scenarios in which this seems to happen: bad USB hardware, bad electrical contacts on USB, a voltage surge when starting the engine, or users plugging and unplugging additional USB devices at runtime.

The issue exists for all USB serial connections, NMEA 2000 or NMEA 0183.

Solution: all these connections feed live data at small (subsecond) intervals, so a keepalive mechanism can work like this:

On each connection, measure the time between the last received record and the current time. If it is longer than, say, 15 seconds, the connection is stale (for whatever reason; it doesn’t matter). Close the underlying file descriptor for that connection and open a new one. This applies to any TCP connection and any USB serial connection.
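The idle-timeout idea described above can be sketched as follows. This is an illustration in Python, not the SignalK or canboatjs implementation; the class name `StaleConnectionWatchdog`, its methods, and the injectable `clock` are assumptions made for the example, and the `reconnect` callable stands in for whatever closes and reopens the TCP socket or serial port.

```python
import time


class StaleConnectionWatchdog:
    """Trigger a reconnect when no data has arrived for `timeout_s` seconds."""

    def __init__(self, reconnect, timeout_s=15.0, clock=time.monotonic):
        self._reconnect = reconnect  # callable that closes and reopens the source
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_rx = clock()

    def on_data(self):
        """Call whenever a record is received from the connection."""
        self._last_rx = self._clock()

    def check(self):
        """Call periodically; returns True if a reconnect was triggered."""
        if self._clock() - self._last_rx > self._timeout_s:
            self._reconnect()
            self._last_rx = self._clock()  # restart the timer after reopening
            return True
        return False
```

In use, the data handler would call `on_data()` for every received record and a timer would call `check()` every few seconds. The 15-second default mirrors the figure suggested in the comment; a source that is legitimately quiet for longer would need a larger value.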

This will solve the issues users experience underway, which require manual restarts of SignalK in the current implementation.

thanks

tkurki commented 1 year ago

No, the issue does not exist for all serial USB connections.

Please check if the canboatjs release https://github.com/canboat/canboatjs/releases/tag/v1.27.2 fixes this issue.