dariks opened this issue 1 year ago
Link to a screen capture of the issue: https://rippen-cloud.de/index.php/s/XK3RztbWmMWHQDN
[Screenshot of this issue]
[Screenshot of Grafana monitoring when this issue occurs]
Thanks for the report.
Is tunneldigger retrying more frequently than every 5 seconds, as printed in the logs, or does it cause such high load even when retrying only every 5 seconds? What is the load average when you stop tunneldigger (/etc/init.d/tunneldigger stop)?
Thank you for taking the time to create such a useful tool.
It doesn't seem to retry more often than the 5-second interval. Rather, the cycle of trying to connect, erroring out ("Failed to connect to remote endpoint - check WAN connectivity!"), reinitializing the tunnel context and trying the next gateway seems to run far too quickly and cause the load (see the linked screen capture for a video of the issue). There should probably be some kind of rate limit on Tunneldigger's connection attempts.
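One way to express the rate limit suggested above is to enforce a minimum interval between the starts of consecutive connection attempts and to sleep, rather than loop, in between. The following is a minimal Python sketch under that assumption; AttemptLimiter is a hypothetical helper, not part of Tunneldigger, and the clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time


class AttemptLimiter:
    """Enforce a minimum interval between connection attempts.

    Hypothetical helper for illustration; not Tunneldigger code.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between attempt starts
        self.clock = clock                # injectable for testing
        self.sleep = sleep                # injectable for testing
        self.last_attempt = None          # start time of the previous attempt

    def remaining(self):
        """Seconds to wait before the next attempt is allowed (0 if none)."""
        if self.last_attempt is None:
            return 0.0
        elapsed = self.clock() - self.last_attempt
        return max(0.0, self.min_interval - elapsed)

    def wait(self):
        """Sleep until an attempt is allowed, then record its start time."""
        delay = self.remaining()
        if delay > 0:
            self.sleep(delay)
        self.last_attempt = self.clock()
```

In a reconnect loop, calling wait() before each attempt means that even when every attempt fails immediately (as with an unplugged WAN cable), the process sleeps between attempts instead of spinning through the connect, fail, reinitialize cycle.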
I can confirm that running /etc/init.d/tunneldigger stop mitigates the high load average for a short period of time (Tunneldigger is restarted automatically, probably by the Tunneldigger-Watchdog).
The 1-minute load average dropped to 0.02 after 3 minutes with Tunneldigger stopped.
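The load averages compared in this thread (0.02 with Tunneldigger stopped versus 1.5-2.0 while it retries) are the first three fields of /proc/loadavg on Linux; on the node itself, cat /proc/loadavg shows them directly. A small Python sketch for extracting them, for example on a monitoring host; parse_loadavg is a hypothetical name used only for illustration.

```python
def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text.

    The first three whitespace-separated fields are those averages; the
    remaining fields (runnable/total scheduling entities, last PID) are ignored.
    """
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen
```

On a Linux machine, parse_loadavg(open("/proc/loadavg").read())[0] gives the 1-minute value quoted above.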
Bug report
What is the problem? With Tunneldigger enabled but no WAN connection (e.g. the WAN cable is disconnected), the node shows a high load average (1.5-2.0) because Tunneldigger keeps restarting and trying to reconnect. This also produces an extremely high rate of log messages. So far we have reproduced this issue on the TP-Link Archer C7 v2, Archer C7 v5 and Xiaomi Mi 4a Gigabit, so it seems to be device-independent.
Log output (this absolutely floods logread):
Devices affected by this issue: https://map.freifunk-nordhessen.de/#!/de/map/60e32727c0fe https://map.freifunk-nordhessen.de/#!/de/map/8cdef9a91277
What is the expected behaviour? Tunneldigger should keep retrying, but it could make sense to reduce the retry frequency after a certain number of failed attempts (e.g. retry only every 5 minutes after 10 failures).
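The expected behaviour above amounts to a two-tier retry policy. A minimal Python sketch, assuming the numbers from the example (5-second retries until 10 consecutive failures, then one attempt every 5 minutes); the function and constant names are hypothetical and are not existing Tunneldigger configuration options.

```python
FAST_INTERVAL = 5       # seconds; the retry interval reported in the logs
SLOW_INTERVAL = 5 * 60  # seconds; proposed interval after repeated failures
FAST_RETRIES = 10       # consecutive failures tolerated before slowing down


def retry_delay(consecutive_failures: int) -> int:
    """Seconds to wait before the next connection attempt.

    Retry every FAST_INTERVAL seconds until FAST_RETRIES attempts in a row
    have failed, then only every SLOW_INTERVAL seconds. The caller is
    expected to reset the counter to 0 after a successful connection.
    """
    if consecutive_failures < FAST_RETRIES:
        return FAST_INTERVAL
    return SLOW_INTERVAL
```

A common alternative is capped exponential backoff (doubling the delay after each failure up to a maximum), which reaches the slow rate gradually rather than in a single step.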
Gluon version: v2023.1.1 (the commit ID appears to be 7feebe96a6b93d567bd342ffd01abcb3022383c5)
Site Configuration: https://gitlab.com/freifunk-nordhessen/ffnh-site
Custom patches: none applied to Tunneldigger or dnsmasq-wan.