Ysurac / openmptcprouter

OpenMPTCProuter is an open source solution to aggregate multiple internet connections using Multipath TCP (MPTCP) on OpenWrt
https://www.openmptcprouter.com/
GNU General Public License v3.0

omr-service isn't restarting the correct dsvpn service after disconnection. #3010

Closed ioogithub closed 9 months ago

ioogithub commented 1 year ago

Expected Behavior

omr-service will restart the correct dsvpn service.

Current Behavior

omr-service is not able to restart the correct dsvpn service.

Possible Solution

omr-service should be changed.

From the log:

Oct 24 10:27:09 vpstest OMR-Service[2862358]: No answer from VPN client end, restart DSVPN
Oct 24 10:27:09 vpstest omr-service[2862359]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 10:28:25 vpstest OMR-Service[2864633]: No answer from VPN client end, restart DSVPN
Oct 24 10:28:25 vpstest omr-service[2864634]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 10:28:56 vpstest OMR-Service[2865896]: No answer from VPN client end, restart DSVPN
Oct 24 10:28:56 vpstest omr-service[2865897]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 10:29:12 vpstest OMR-Service[2866330]: No answer from VPN client end, restart DSVPN
Oct 24 10:29:12 vpstest omr-service[2866331]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 10:29:28 vpstest OMR-Service[2866787]: No answer from VPN client end, restart DSVPN
Oct 24 10:29:28 vpstest omr-service[2866788]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 10:29:28 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 10:29:30 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 10:29:32 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 10:29:36 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 10:29:40 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 10:29:44 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 10:29:44 vpstest OMR-Service[2867264]: No answer from VPN client end, restart DSVPN
Oct 24 10:29:44 vpstest omr-service[2867265]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 10:29:48 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 11:45:30 vpstest OMR-Service[3006423]: No answer from VPN client end, restart DSVPN
Oct 24 11:45:30 vpstest omr-service[3006424]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 11:45:46 vpstest OMR-Service[3007055]: No answer from VPN client end, restart DSVPN
Oct 24 11:45:46 vpstest omr-service[3007056]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 11:46:02 vpstest OMR-Service[3007639]: No answer from VPN client end, restart DSVPN
Oct 24 11:46:02 vpstest omr-service[3007640]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 11:46:33 vpstest OMR-Service[3008923]: No answer from VPN client end, restart DSVPN
Oct 24 11:46:33 vpstest omr-service[3008924]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.

There is another message; I'm not sure if it's related:

Configuration file /lib/systemd/system/dsvpn-server@.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.

The service name is wrong: dsvpn@dsvpn0.service should be changed to dsvpn-server@dsvpn0.service.
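One way to avoid hard-coding a unit name that may not exist on the VPS is to look up which dsvpn template unit is actually installed before restarting it. The following is a minimal sketch, not the actual omr-service code: the function name pick_dsvpn_unit is hypothetical, and it assumes the only candidates are the two unit names that appear in this report (dsvpn-server@.service and dsvpn@.service).

```shell
#!/bin/sh
# Hypothetical helper, not taken from omr-service.
# Reads installed unit-file names on stdin (one per line) and prints the
# instance unit to restart, preferring dsvpn-server@ over dsvpn@.
pick_dsvpn_unit() {
    instance="$1"
    units="$(cat)"
    for template in dsvpn-server@.service dsvpn@.service; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if printf '%s\n' "$units" | grep -qxF "$template"; then
            # dsvpn-server@.service + dsvpn0 -> dsvpn-server@dsvpn0.service
            printf '%s\n' "${template%@.service}@${instance}.service"
            return 0
        fi
    done
    # Neither template is installed: print nothing and report failure.
    return 1
}
```

On a systemd host it could be fed with something like `systemctl list-unit-files --no-legend 'dsvpn*' | awk '{print $1}' | pick_dsvpn_unit dsvpn0`, with the printed name then passed to `systemctl restart`.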

It looks like dsvpn-run struggles to establish a connection as well:

Oct 24 12:18:58 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:19:14 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:19:29 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:19:41 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:19:56 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:20:11 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:20:27 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:20:42 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:20:57 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:21:12 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:21:27 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:21:42 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:21:58 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 12:21:58 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy

Context (Environment)

This problem creates stability issues.

I switched from glorytun to dsvpn because glorytun doesn't recover from connection problems. Sometimes it doesn't recover at all, and the server remains unstable for 12 hours until I manually reboot the router. It looks like dsvpn also has recovery issues.

Ysurac, which VPN do you recommend for stability? Is there any VPN that is fault tolerant and can recover from connection issues without rebooting OMR?

I am using v2ray for the proxy and it is stable. The VPN doesn't do very much work, but it constantly takes down OMR. I need to find an OMR VPN configuration that can recover from connection issues. If there are two WANs, OMR should be able to switch to the other WAN without crashing the whole system. The problem is that after a connection issue the scripts can't recover OMR: I always have to reboot, and after that it works. I need a solution that doesn't involve manually rebooting the router. Please recommend one.

Are there any VPNs available that can maintain a connection over multiple WANs for redundancy, so that if one WAN has a problem it doesn't take down the whole router?

Specifications

ioogithub commented 1 year ago

Here is another instance: the VPN tunnel was down for 1 hour and OMR could not restore the connection:

Oct 24 13:40:52 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:40:54 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:40:56 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:40:59 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:03 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:07 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:12 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:16 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:20 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:24 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:28 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:32 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:36 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 13:41:40 vpstest dsvpn-run[687]: Accepting a new client failed: Device or resource busy
Oct 24 14:33:45 vpstest OMR-Service[3571189]: No answer from VPN client end, restart DSVPN
Oct 24 14:33:45 vpstest omr-service[3571190]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:33:45 vpstest dsvpn-run[687]: Interface: [dsvpn0]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Listening to 0.0.0.0:65401
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [unknown ip (attacker?)]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.42]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.55]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.57]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.67]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.64]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.70]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.43]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.60]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.44]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.52]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.63]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.50]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [87.236.176.50]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan1]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:33:45 vpstest dsvpn-run[687]: Connection attempt from [wan2]
Oct 24 14:33:45 vpstest dsvpn-run[687]: Client disconnected
Oct 24 14:41:15 vpstest OMR-Service[3595736]: No answer from VPN client end, restart DSVPN
Oct 24 14:41:15 vpstest omr-service[3595737]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:44:58 vpstest kernel: TCP: request_sock_MPTCP: Possible SYN flooding on port 65401. Sending cookies.  Check SNMP counters.
Oct 24 14:47:01 vpstest OMR-Service[3608149]: No answer from VPN client end, restart DSVPN
Oct 24 14:47:01 vpstest omr-service[3608150]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:47:17 vpstest OMR-Service[3608833]: No answer from VPN client end, restart DSVPN
Oct 24 14:47:17 vpstest omr-service[3608834]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:47:33 vpstest OMR-Service[3609766]: No answer from VPN client end, restart DSVPN
Oct 24 14:47:33 vpstest omr-service[3609767]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:48:04 vpstest OMR-Service[3611122]: No answer from VPN client end, restart DSVPN
Oct 24 14:48:04 vpstest omr-service[3611123]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:48:20 vpstest OMR-Service[3612261]: No answer from VPN client end, restart DSVPN
Oct 24 14:48:20 vpstest omr-service[3612263]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:49:37 vpstest OMR-Service[3615326]: No answer from VPN client end, restart DSVPN
Oct 24 14:49:37 vpstest omr-service[3615327]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:49:53 vpstest OMR-Service[3615811]: No answer from VPN client end, restart DSVPN
Oct 24 14:49:53 vpstest omr-service[3615812]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:50:08 vpstest OMR-Service[3616250]: No answer from VPN client end, restart DSVPN
Oct 24 14:50:08 vpstest omr-service[3616251]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:50:24 vpstest OMR-Service[3616704]: No answer from VPN client end, restart DSVPN
Oct 24 14:50:24 vpstest omr-service[3616705]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:50:40 vpstest OMR-Service[3617203]: No answer from VPN client end, restart DSVPN
Oct 24 14:50:40 vpstest omr-service[3617204]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:50:56 vpstest OMR-Service[3618318]: No answer from VPN client end, restart DSVPN
Oct 24 14:50:56 vpstest omr-service[3618319]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:51:12 vpstest OMR-Service[3619179]: No answer from VPN client end, restart DSVPN
Oct 24 14:51:12 vpstest omr-service[3619181]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:51:28 vpstest OMR-Service[3619637]: No answer from VPN client end, restart DSVPN
Oct 24 14:51:28 vpstest omr-service[3619639]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:51:44 vpstest OMR-Service[3620153]: No answer from VPN client end, restart DSVPN
Oct 24 14:51:44 vpstest omr-service[3620154]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:52:00 vpstest OMR-Service[3620680]: No answer from VPN client end, restart DSVPN
Oct 24 14:52:00 vpstest omr-service[3620681]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:52:16 vpstest OMR-Service[3621191]: No answer from VPN client end, restart DSVPN
Oct 24 14:52:16 vpstest omr-service[3621192]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:52:32 vpstest OMR-Service[3621663]: No answer from VPN client end, restart DSVPN
Oct 24 14:52:32 vpstest omr-service[3621664]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:52:47 vpstest OMR-Service[3622182]: No answer from VPN client end, restart DSVPN
Oct 24 14:52:47 vpstest omr-service[3622183]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:53:03 vpstest OMR-Service[3622678]: No answer from VPN client end, restart DSVPN
Oct 24 14:53:03 vpstest omr-service[3622679]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:53:19 vpstest OMR-Service[3623194]: No answer from VPN client end, restart DSVPN
Oct 24 14:53:19 vpstest omr-service[3623195]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:53:35 vpstest OMR-Service[3623702]: No answer from VPN client end, restart DSVPN
Oct 24 14:53:35 vpstest omr-service[3623703]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:53:51 vpstest OMR-Service[3624222]: No answer from VPN client end, restart DSVPN
Oct 24 14:53:51 vpstest omr-service[3624223]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:54:07 vpstest OMR-Service[3624687]: No answer from VPN client end, restart DSVPN
Oct 24 14:54:07 vpstest omr-service[3624688]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:54:23 vpstest OMR-Service[3625391]: No answer from VPN client end, restart DSVPN
Oct 24 14:54:23 vpstest omr-service[3625392]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:54:39 vpstest OMR-Service[3626771]: No answer from VPN client end, restart DSVPN
Oct 24 14:54:39 vpstest omr-service[3626772]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:54:55 vpstest OMR-Service[3627672]: No answer from VPN client end, restart DSVPN
Oct 24 14:54:55 vpstest omr-service[3627673]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:55:11 vpstest OMR-Service[3628479]: No answer from VPN client end, restart DSVPN
Oct 24 14:55:11 vpstest omr-service[3628480]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:55:27 vpstest OMR-Service[3629356]: No answer from VPN client end, restart DSVPN
Oct 24 14:55:27 vpstest omr-service[3629357]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:55:43 vpstest OMR-Service[3630238]: No answer from VPN client end, restart DSVPN
Oct 24 14:55:43 vpstest omr-service[3630239]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:55:59 vpstest OMR-Service[3631135]: No answer from VPN client end, restart DSVPN
Oct 24 14:55:59 vpstest omr-service[3631136]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:56:15 vpstest OMR-Service[3632012]: No answer from VPN client end, restart DSVPN
Oct 24 14:56:15 vpstest omr-service[3632013]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:56:31 vpstest OMR-Service[3632817]: No answer from VPN client end, restart DSVPN
Oct 24 14:56:31 vpstest omr-service[3632818]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:56:47 vpstest OMR-Service[3633707]: No answer from VPN client end, restart DSVPN
Oct 24 14:56:47 vpstest omr-service[3633708]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:57:02 vpstest OMR-Service[3634601]: No answer from VPN client end, restart DSVPN
Oct 24 14:57:02 vpstest omr-service[3634602]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:57:18 vpstest OMR-Service[3635484]: No answer from VPN client end, restart DSVPN
Oct 24 14:57:18 vpstest omr-service[3635485]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:57:30 vpstest glorytun-tcp-run[15898]: ::ffff:194.165.16.37.65236: connected
Oct 24 14:57:34 vpstest OMR-Service[3636324]: No answer from VPN client end, restart DSVPN
Oct 24 14:57:34 vpstest omr-service[3636326]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:57:39 vpstest glorytun-tcp-run[15898]: read: Connection reset by peer
Oct 24 14:57:39 vpstest glorytun-tcp-run[15898]: ::ffff:194.165.16.37.65236: key exchange failed
Oct 24 14:57:50 vpstest OMR-Service[3637207]: No answer from VPN client end, restart DSVPN
Oct 24 14:57:50 vpstest omr-service[3637208]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:58:06 vpstest OMR-Service[3638072]: No answer from VPN client end, restart DSVPN
Oct 24 14:58:06 vpstest omr-service[3638073]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:58:22 vpstest OMR-Service[3638950]: No answer from VPN client end, restart DSVPN
Oct 24 14:58:22 vpstest omr-service[3638951]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 14:58:38 vpstest OMR-Service[3639827]: No answer from VPN client end, restart DSVPN
Oct 24 14:58:38 vpstest omr-service[3639828]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:03:56 vpstest OMR-Service[3657557]: No answer from VPN client end, restart DSVPN
Oct 24 15:03:56 vpstest omr-service[3657558]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:04:12 vpstest OMR-Service[3658431]: No answer from VPN client end, restart DSVPN
Oct 24 15:04:12 vpstest omr-service[3658432]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:04:28 vpstest OMR-Service[3659305]: No answer from VPN client end, restart DSVPN
Oct 24 15:04:28 vpstest omr-service[3659306]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:04:45 vpstest OMR-Service[3660231]: No answer from VPN client end, restart DSVPN
Oct 24 15:04:45 vpstest omr-service[3660232]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:05:01 vpstest OMR-Service[3661081]: No answer from VPN client end, restart DSVPN
Oct 24 15:05:01 vpstest omr-service[3661082]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:05:17 vpstest OMR-Service[3661932]: No answer from VPN client end, restart DSVPN
Oct 24 15:05:17 vpstest omr-service[3661934]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:05:32 vpstest OMR-Service[3662822]: No answer from VPN client end, restart DSVPN
Oct 24 15:05:32 vpstest omr-service[3662823]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:05:49 vpstest OMR-Service[3663744]: No answer from VPN client end, restart DSVPN
Oct 24 15:05:49 vpstest omr-service[3663745]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:06:04 vpstest OMR-Service[3664613]: No answer from VPN client end, restart DSVPN
Oct 24 15:06:04 vpstest omr-service[3664614]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:06:20 vpstest OMR-Service[3665461]: No answer from VPN client end, restart DSVPN
Oct 24 15:06:20 vpstest omr-service[3665462]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:06:36 vpstest OMR-Service[3666294]: No answer from VPN client end, restart DSVPN
Oct 24 15:06:36 vpstest omr-service[3666295]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:06:52 vpstest OMR-Service[3667207]: No answer from VPN client end, restart DSVPN
Oct 24 15:06:52 vpstest omr-service[3667208]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:07:08 vpstest OMR-Service[3668076]: No answer from VPN client end, restart DSVPN
Oct 24 15:07:08 vpstest omr-service[3668077]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:07:24 vpstest OMR-Service[3668950]: No answer from VPN client end, restart DSVPN
Oct 24 15:07:24 vpstest omr-service[3668951]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:07:40 vpstest OMR-Service[3669744]: No answer from VPN client end, restart DSVPN
Oct 24 15:07:40 vpstest omr-service[3669745]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:07:56 vpstest OMR-Service[3670656]: No answer from VPN client end, restart DSVPN
Oct 24 15:07:56 vpstest omr-service[3670657]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:08:12 vpstest OMR-Service[3671573]: No answer from VPN client end, restart DSVPN
Oct 24 15:08:12 vpstest omr-service[3671574]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:08:28 vpstest OMR-Service[3672448]: No answer from VPN client end, restart DSVPN
Oct 24 15:08:28 vpstest omr-service[3672449]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:08:44 vpstest OMR-Service[3673338]: No answer from VPN client end, restart DSVPN
Oct 24 15:08:44 vpstest omr-service[3673339]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:09:00 vpstest OMR-Service[3674159]: No answer from VPN client end, restart DSVPN
Oct 24 15:09:00 vpstest omr-service[3674161]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:09:16 vpstest OMR-Service[3675044]: No answer from VPN client end, restart DSVPN
Oct 24 15:09:16 vpstest omr-service[3675045]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:09:32 vpstest OMR-Service[3676381]: No answer from VPN client end, restart DSVPN
Oct 24 15:09:32 vpstest omr-service[3676382]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:09:48 vpstest OMR-Service[3677317]: No answer from VPN client end, restart DSVPN
Oct 24 15:09:48 vpstest omr-service[3677318]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:10:03 vpstest OMR-Service[3678162]: No answer from VPN client end, restart DSVPN
Oct 24 15:10:03 vpstest omr-service[3678163]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:10:20 vpstest OMR-Service[3679039]: No answer from VPN client end, restart DSVPN
Oct 24 15:10:20 vpstest omr-service[3679040]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:10:36 vpstest OMR-Service[3679932]: No answer from VPN client end, restart DSVPN
Oct 24 15:10:36 vpstest omr-service[3679933]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:10:52 vpstest OMR-Service[3680845]: No answer from VPN client end, restart DSVPN
Oct 24 15:10:52 vpstest omr-service[3680846]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:11:08 vpstest OMR-Service[3681693]: No answer from VPN client end, restart DSVPN
Oct 24 15:11:08 vpstest omr-service[3681694]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:11:24 vpstest OMR-Service[3682536]: No answer from VPN client end, restart DSVPN
Oct 24 15:11:24 vpstest omr-service[3682537]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:11:40 vpstest OMR-Service[3683377]: No answer from VPN client end, restart DSVPN
Oct 24 15:11:40 vpstest omr-service[3683378]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:11:56 vpstest OMR-Service[3684282]: No answer from VPN client end, restart DSVPN
Oct 24 15:11:56 vpstest omr-service[3684283]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:12:12 vpstest OMR-Service[3685165]: No answer from VPN client end, restart DSVPN
Oct 24 15:12:12 vpstest omr-service[3685166]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:12:28 vpstest OMR-Service[3686038]: No answer from VPN client end, restart DSVPN
Oct 24 15:12:28 vpstest omr-service[3686039]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
Oct 24 15:12:43 vpstest OMR-Service[3686907]: No answer from VPN client end, restart DSVPN
Oct 24 15:12:43 vpstest omr-service[3686908]: Failed to restart dsvpn@dsvpn0.service: Unit dsvpn@dsvpn0.service not found.
...

It just keeps trying, but the service it is trying to restart does not exist, so it is never successful. The only thing that works is rebooting OMR. Is this a known issue?
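To see at a glance how often the restart fails and for which unit, the journal output can be summarized. This is a small sketch that assumes the log lines have the same shape as the ones pasted above; count_missing_units is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize "Unit ... not found" errors from journal
# text on stdin. Prints one line per unit: "<unit name> <number of failures>".
count_missing_units() {
    grep -o 'Unit [^ ]* not found' |   # keep only the "Unit X not found" part
        sort |
        uniq -c |                      # prefix each distinct message with its count
        awk '{ print $3, $1 }'         # $1 = count, $3 = unit name
}
```

It could be run as, for example, `journalctl -u omr.service | count_missing_units`. A large count for a single unit name points at a wrong name in the restart command rather than at an intermittent failure.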

ioogithub commented 1 year ago

I am looking at omr.service on the VPS. I replaced dsvpn@dsvpn0.service with dsvpn-server@dsvpn0.service in omr-service and restarted the service with systemctl restart omr.service. Now I get these errors. Are they normal, or do they indicate further problems with this service?

omr.service - OMR
Loaded: loaded (/lib/systemd/system/omr.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-10-24 16:01:37 EDT; 57s ago
Main PID: 750 (omr-service)
Tasks: 2 (limit: 1153)
Memory: 7.7M
CPU: 3.903s
CGroup: /system.slice/omr.service
├─ 750 /bin/bash /usr/local/bin/omr-service
└─8894 sleep 10

Oct 24 16:01:37 vpstest systemd[1]: Started OMR. Oct 24 16:01:37 vpstest omr-service[752]: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory Oct 24 16:01:37 vpstest omr-service[752]: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established: No such file or directory Oct 24 16:01:37 vpstest omr-service[752]: sysctl: cannot stat /proc/sys/net/mptcp/checksum_enabled: No such file or directory Oct 24 16:01:37 vpstest omr-service[753]: modprobe: ERROR: could not insert 'bonding': Operation not permitted
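
The restart failures in the log come from omr-service asking systemd for dsvpn@dsvpn0.service, which is reported as not found, while dsvpn-server@dsvpn0.service is the name tried above. A minimal sketch of a guard that picks whichever candidate actually has a unit file could look like the following. It is written in Python purely for illustration (omr-service itself is a bash script), and `template_of`, `pick_unit`, the candidate order and the sample `unit_files` list are hypothetical, not part of OMR. It relies only on the systemd convention that an instance such as name@instance.service is backed by a template file name@.service.

```python
def template_of(unit):
    """Map an instance name (foo@bar.service) to its template (foo@.service)."""
    prefix, at, rest = unit.partition("@")
    if not at:
        return unit  # not an instance of a template unit
    _instance, dot, suffix = rest.rpartition(".")
    return f"{prefix}@.{suffix}" if dot else unit


def pick_unit(candidates, unit_files):
    """Return the first candidate whose unit file (or template) is installed."""
    installed = set(unit_files)
    for unit in candidates:
        if unit in installed or template_of(unit) in installed:
            return unit
    return None


# Hypothetical example: only the dsvpn-server template is installed.
unit_files = ["dsvpn-server@.service", "omr.service"]
candidates = ["dsvpn@dsvpn0.service", "dsvpn-server@dsvpn0.service"]
print(pick_unit(candidates, unit_files))  # prints dsvpn-server@dsvpn0.service
```

On a real VPS the `unit_files` list could be taken from the first column of `systemctl list-unit-files --type=service`. Returning `None` when nothing matches lets the caller log the problem once instead of retrying a unit that cannot exist.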

ioogithub commented 1 year ago

Okay, so I don't think that the problem is with my WAN connection. I think the problem is with the omr-tracker script.

Here is the start of a problem from logread:

Tue Oct 24 16:35:21 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (8.8.8.8,80.67.169.12,8.8.4.4)
Tue Oct 24 16:35:21 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via wan1 dev eth1
Tue Oct 24 16:35:26 2023 user.notice post-tracking-post-tracking: Restart wan1

I ran 3 pings on this wan1 interface. This should be the same thing the tracker is doing right?

ping -D -I wan1ip 80.67.169.12 | tee -a log1.log
ping -D -I wan1ip 4.2.2.1 | tee -a log2.log
ping -D -I wan1ip 9.9.9.9 | tee -a log3.log

Here is the result:

[4:34:55.907 PM] 64 bytes from 80.67.169.12: icmp_seq=2394 ttl=52 time=120 ms
[1698179696.903292] 64 bytes from 80.67.169.12: icmp_seq=2395 ttl=52 time=116 ms
[1698179697.909611] 64 bytes from 80.67.169.12: icmp_seq=2396 ttl=52 time=123 ms
[1698179698.904454] 64 bytes from 80.67.169.12: icmp_seq=2397 ttl=52 time=118 ms
[1698179699.903385] 64 bytes from 80.67.169.12: icmp_seq=2398 ttl=52 time=116 ms
[1698179700.903509] 64 bytes from 80.67.169.12: icmp_seq=2399 ttl=52 time=117 ms
[1698179701.905665] 64 bytes from 80.67.169.12: icmp_seq=2400 ttl=52 time=119 ms
[1698179702.906843] 64 bytes from 80.67.169.12: icmp_seq=2401 ttl=52 time=120 ms
[1698179703.903073] 64 bytes from 80.67.169.12: icmp_seq=2402 ttl=52 time=116 ms
[1698179704.910532] 64 bytes from 80.67.169.12: icmp_seq=2403 ttl=52 time=124 ms
[1698179745.111394] 64 bytes from 80.67.169.12: icmp_seq=2441 ttl=48 time=875 ms
[1698179745.451261] 64 bytes from 80.67.169.12: icmp_seq=2442 ttl=48 time=214 ms
[1698179746.407972] 64 bytes from 80.67.169.12: icmp_seq=2443 ttl=48 time=170 ms
[1698179747.364099] 64 bytes from 80.67.169.12: icmp_seq=2444 ttl=51 time=124 ms
[1698179748.363557] 64 bytes from 80.67.169.12: icmp_seq=2445 ttl=51 time=122 ms
[1698179749.359364] 64 bytes from 80.67.169.12: icmp_seq=2446 ttl=51 time=117 ms
[1698179750.367315] 64 bytes from 80.67.169.12: icmp_seq=2447 ttl=51 time=124 ms
[1698179751.363187] 64 bytes from 80.67.169.12: icmp_seq=2448 ttl=51 time=118 ms
[1698179786.149096] 64 bytes from 80.67.169.12: icmp_seq=2481 ttl=52 time=751 ms
[1698179786.526544] 64 bytes from 80.67.169.12: icmp_seq=2482 ttl=52 time=128 ms
[1698179787.517176] 64 bytes from 80.67.169.12: icmp_seq=2483 ttl=52 time=118 ms
[1698179788.529526] 64 bytes from 80.67.169.12: icmp_seq=2484 ttl=52 time=128 ms
[1698179789.534988] 64 bytes from 80.67.169.12: icmp_seq=2485 ttl=52 time=132 ms
[1698179790.527605] 64 bytes from 80.67.169.12: icmp_seq=2486 ttl=52 time=124 ms
[4:36:31.539 PM] 64 bytes from 80.67.169.12: icmp_seq=2487 ttl=52 time=134 ms

After converting the first and last timestamps from epoch time, I can see that my pings never missed a reply on this WAN interface at the time omr-tracker said it couldn't ping:

Tue Oct 24 16:35:21 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (8.8.8.8,80.67.169.12,8.8.4.4).

I got the same results with the pings to 4.2.2.1 and 9.9.9.9. I didn't drop any pings at 16:35:21.

So why is omr-tracker saying there are missing pings, restarting the WAN interface and killing the router when there is no ping issue? I never saw a single dropped ping.
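
For anyone repeating this comparison, the epoch timestamps printed by `ping -D` can be converted to wall-clock time with a few lines of Python. This is only a sketch: `to_local` is a hypothetical helper, and the fixed UTC-4 offset is an assumption based on the EDT timestamps shown in the systemd output earlier in the thread.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the logs use US Eastern Daylight Time (UTC-4).
EDT = timezone(timedelta(hours=-4), "EDT")


def to_local(epoch_seconds, tz=EDT):
    """Convert a `ping -D` epoch timestamp to an aware datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


# First and last epoch-stamped lines from the ping log above.
first = to_local(1698179696.903292)
last = to_local(1698179790.527605)
print(first.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2023-10-24 16:34:56 EDT
print(last.strftime("%Y-%m-%d %H:%M:%S %Z"))   # 2023-10-24 16:36:30 EDT
```

The tracker event at 16:35:21 falls inside this window, so the ping log and the logread entry can be compared line by line, provided both clocks agree.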

My settings for omr-tracker default are:

config defaults 'defaults'
        option enabled '1'
        list hosts '4.2.2.1'
        list hosts '8.8.8.8'
        list hosts '80.67.169.12'
        list hosts '8.8.4.4'
        list hosts '9.9.9.9'
        list hosts '1.0.0.1'
        list hosts '114.114.115.115'
        list hosts '1.2.4.8'
        list hosts '80.67.169.40'
        list hosts '114.114.114.114'
        list hosts '1.1.1.1'
        list hosts6 '2606:4700:4700::1111'
        list hosts6 '2606:4700:4700::1001'
        list hosts6 '2620:fe::fe'
        list hosts6 '2620:fe::9'
        list hosts6 '2001:4860:4860::8888'
        list hosts6 '2001:4860:4860::8844'
        option timeout '2'
        option tries '3'
        option interval '2'
        option interval_tries '1'
        option type 'ping'
        option wait_test '0'
        option server_http_test '0'
        option mail_alert '1'
        option restart_down '1'

So the timeout is 2 seconds, with 3 tries and a 2 second interval, but my pings to the same 80.67.169.12 address didn't drop a single reply during the time that omr-tracker said it could not ping. What is happening here?

It keeps doing this 2 or 3 times every hour and it never recovers.

ioogithub commented 1 year ago

Same thing:

Tue Oct 24 17:28:38 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (1.1.1.1,4.2.2.1,8.8.8.8)
Tue Oct 24 17:28:38 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via wan1gatewayip dev eth1
Tue Oct 24 17:28:42 2023 user.notice OMR-VPS: Can't get vps token, try later (can ping server vps on vpsip, server API answer on vpsip)
Tue Oct 24 17:28:44 2023 user.notice post-tracking-post-tracking: Restart wan1

I also added a new ping to the wan1gatewayip because of this statement:

Always ping gateway, then test connection by ping, httping or dns. None mode only ping gateway.

64 bytes from wan1gatewayip: icmp_seq=1079 ttl=64 time=1.77 ms
64 bytes from wan1gatewayip: icmp_seq=1080 ttl=64 time=2.05 ms
64 bytes from wan1gatewayip: icmp_seq=1081 ttl=64 time=1.79 ms
64 bytes from wan1gatewayip: icmp_seq=1082 ttl=64 time=1.59 ms
64 bytes from wan1gatewayip: icmp_seq=1083 ttl=64 time=1.99 ms
64 bytes from wan1gatewayip: icmp_seq=1084 ttl=64 time=1.65 ms
64 bytes from wan1gatewayip: icmp_seq=1085 ttl=64 time=1.98 ms
64 bytes from wan1gatewayip: icmp_seq=1086 ttl=64 time=1.90 ms
64 bytes from wan1gatewayip: icmp_seq=1087 ttl=64 time=1.50 ms
64 bytes from wan1gatewayip: icmp_seq=1088 ttl=64 time=1.85 ms
64 bytes from wan1gatewayip: icmp_seq=1089 ttl=64 time=1.40 ms
64 bytes from wan1gatewayip: icmp_seq=1090 ttl=64 time=1.90 ms

I never miss a ping when post-tracker is saying it can't ping, yet it shuts off the interface, which causes all sorts of problems because it doesn't always reconnect properly.

How is omr-tracker detecting it is dropping pings and killing the interface when I never drop pings?

Ysurac commented 1 year ago

The ping used is ping -B -I eth1 1.1.1.1. In your first log, there are 2 big drops:

[1698179704.910532] 64 bytes from 80.67.169.12: icmp_seq=2403 ttl=52 time=124 ms
[1698179745.111394] 64 bytes from 80.67.169.12: icmp_seq=2441 ttl=48 time=875 ms

and

[1698179751.363187] 64 bytes from 80.67.169.12: icmp_seq=2448 ttl=51 time=118 ms
[1698179786.149096] 64 bytes from 80.67.169.12: icmp_seq=2481 ttl=52 time=751 ms

ioogithub commented 1 year ago

The ping used is ping -B -I eth1 1.1.1.1. In your first log, there are 2 big drops:

[1698179704.910532] 64 bytes from 80.67.169.12: icmp_seq=2403 ttl=52 time=124 ms
[1698179745.111394] 64 bytes from 80.67.169.12: icmp_seq=2441 ttl=48 time=875 ms

and

[1698179751.363187] 64 bytes from 80.67.169.12: icmp_seq=2448 ttl=51 time=118 ms
[1698179786.149096] 64 bytes from 80.67.169.12: icmp_seq=2481 ttl=52 time=751 ms

There are dips, but there are no dropped or missed pings. After the dip, the next two pings are back to normal:

[1698179745.451261] 64 bytes from 80.67.169.12: icmp_seq=2442 ttl=48 time=214 ms
[1698179746.407972] 64 bytes from 80.67.169.12: icmp_seq=2443 ttl=48 time=170 ms
[1698179747.364099] 64 bytes from 80.67.169.12: icmp_seq=2444 ttl=51 time=124 ms

Isn't the omr-tracker code looking for dropped pings or are you taking an average of the ping replies somehow?

I don't think the interface actually goes offline, it is omr-tracker that brings it offline. A slow ping doesn't mean the interface is down. How can I prevent this? Are the settings here: https://192.168.200.1/cgi-bin/luci/admin/services/omr-tracker

Which settings should I try to prevent omr-tracker from restarting the interface? I don't understand what you are looking for to determine if the interface is offline. A ping going from 100 ms to 700 ms isn't that slow; it isn't a 5000 ms ping or a dropped ping. I don't think a single 700 ms ping indicates that the interface has gone offline and needs to be restarted.

Ysurac commented 1 year ago

OMR-Tracker uses the settings defined in omr-tracker, so when there is no answer to a ping within the defined time, the interface is seen as down. As you can see in "icmp_seq", there are many missing ping answers (you should not have any missing numbers in the sequence here). If you think the interface is still working and that pings are only being ignored somewhere (this can happen, ping is low priority on a network), you can try another check method in the omr-tracker settings.
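
The missing sequence numbers can be listed mechanically instead of by eye. The sketch below parses `ping -D` output, reports every jump in `icmp_seq` and measures how much time passed across it. `find_gaps` is a hypothetical helper, the sample lines are copied from the log posted earlier in this thread, and sequence wrap-around is ignored for simplicity.

```python
import re

# Matches lines such as "[1698179704.910532] 64 bytes from ...: icmp_seq=2403 ..."
LINE = re.compile(r"^\[(\d+\.\d+)\].*icmp_seq=(\d+)")


def find_gaps(lines):
    """Return (prev_seq, next_seq, missing_replies, seconds) for each jump."""
    gaps = []
    prev = None
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip lines without an epoch timestamp
        ts, seq = float(match.group(1)), int(match.group(2))
        if prev is not None and seq - prev[1] > 1:
            gaps.append((prev[1], seq, seq - prev[1] - 1, ts - prev[0]))
        prev = (ts, seq)
    return gaps


SAMPLE = """\
[1698179704.910532] 64 bytes from 80.67.169.12: icmp_seq=2403 ttl=52 time=124 ms
[1698179745.111394] 64 bytes from 80.67.169.12: icmp_seq=2441 ttl=48 time=875 ms
[1698179745.451261] 64 bytes from 80.67.169.12: icmp_seq=2442 ttl=48 time=214 ms
[1698179746.407972] 64 bytes from 80.67.169.12: icmp_seq=2443 ttl=48 time=170 ms
[1698179747.364099] 64 bytes from 80.67.169.12: icmp_seq=2444 ttl=51 time=124 ms
[1698179748.363557] 64 bytes from 80.67.169.12: icmp_seq=2445 ttl=51 time=122 ms
[1698179749.359364] 64 bytes from 80.67.169.12: icmp_seq=2446 ttl=51 time=117 ms
[1698179750.367315] 64 bytes from 80.67.169.12: icmp_seq=2447 ttl=51 time=124 ms
[1698179751.363187] 64 bytes from 80.67.169.12: icmp_seq=2448 ttl=51 time=118 ms
[1698179786.149096] 64 bytes from 80.67.169.12: icmp_seq=2481 ttl=52 time=751 ms
"""

for first, last, missing, seconds in find_gaps(SAMPLE.splitlines()):
    print(f"icmp_seq {first} -> {last}: {missing} replies missing over {seconds:.1f} s")
# icmp_seq 2403 -> 2441: 37 replies missing over 40.2 s
# icmp_seq 2448 -> 2481: 32 replies missing over 34.8 s
```

Plain `ping` only prints replies that arrive, so a lost reply shows up as a jump in `icmp_seq` and in the timestamps rather than as an error line.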

ioogithub commented 1 year ago

OMR-Tracker uses the settings defined in omr-tracker, so when there is no answer to a ping within the defined time, the interface is seen as down. As you can see in "icmp_seq", there are many missing ping answers (you should not have any missing numbers in the sequence here).

I do see the missing sequence numbers however I still don't understand. If I look at the timestamps:

    [2023-10-24 20:35:04.910532 GMT] 64 bytes from 80.67.169.12: icmp_seq=2403 ttl=52 time=124 ms
    [2023-10-24 20:39:05.111394 GMT] 64 bytes from 80.67.169.12: icmp_seq=2441 ttl=48 time=875 ms
    [2023-10-24 20:39:05.451261 GMT] 64 bytes from 80.67.169.12: icmp_seq=2442 ttl=48 time=214 ms
    [2023-10-24 20:39:06.407972 GMT] 64 bytes from 80.67.169.12: icmp_seq=2443 ttl=48 time=170 ms
    [2023-10-24 20:39:07.364099 GMT] 64 bytes from 80.67.169.12: icmp_seq=2444 ttl=51 time=124 ms
    [2023-10-24 20:39:08.363557 GMT] 64 bytes from 80.67.169.12: icmp_seq=2445 ttl=51 time=122 ms
    [2023-10-24 20:39:09.359364 GMT] 64 bytes from 80.67.169.12: icmp_seq=2446 ttl=51 time=117 ms
    [2023-10-24 20:39:10.367315 GMT] 64 bytes from 80.67.169.12: icmp_seq=2447 ttl=51 time=124 ms

The big gap you identified, seq 2403 - 2441:

    [2023-10-24 20:35:04.910532 GMT] 64 bytes from 80.67.169.12: icmp_seq=2403 ttl=52 time=124 ms
    [2023-10-24 20:39:05.111394 GMT] 64 bytes from 80.67.169.12: icmp_seq=2441 ttl=48 time=875 ms
    [2023-10-24 20:39:05.451261 GMT] 64 bytes from 80.67.169.12: icmp_seq=2442 ttl=48 time=214 ms

but the timestamps of the pings are less than 1 second apart between those numbers. I have an omr-tracker setting (https://omrip/cgi-bin/luci/admin/services/omr-tracker):

Timeout(s): 2 seconds, and there is never a 2 second gap. Is omr-tracker looking at this value or is it looking at some other value?

After this I increased it to 4 seconds and it still restarts the interface. I never see a 4 second ping gap in the timestamps when they are generated with ping -D.

Ysurac commented 1 year ago

OMR-Tracker uses ping, so a ping is sent and it waits for a pong as the answer. If the correct pong does not arrive in time, the test fails. The timeout is how long it waits for the pong corresponding to the ping. You can try increasing the retry interval or the wait after a failing test in the omr-tracker configuration, or use another method such as httping.
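
The ping/pong check described here can be pictured with a short sketch. This is not omr-tracker's code: `ping_once` and `check_link` are hypothetical names, and the retry policy (probe up to `tries` hosts in order and stop at the first answer) is an assumption made for illustration. Only the `ping` options `-c` (count), `-W` (reply timeout in seconds) and `-I` (interface) are relied on.

```python
import subprocess


def ping_once(host, interface, timeout_s):
    """Send one echo request and report whether a reply arrived in time."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def check_link(hosts, tries, probe):
    """Probe up to `tries` hosts in order; stop at the first that answers.

    Returns (is_up, failed_hosts). The policy is an assumption, not
    omr-tracker's actual logic.
    """
    failed = []
    for host in hosts[:tries]:
        if probe(host):
            return True, failed
        failed.append(host)
    return False, failed


# Stubbed probe so the example runs without touching the network.
answers = {"8.8.8.8": False, "80.67.169.12": False, "8.8.4.4": True}
print(check_link(list(answers), tries=3, probe=answers.get))
# (True, ['8.8.8.8', '80.67.169.12'])
```

On a router the stub could be replaced with `lambda host: ping_once(host, "eth1", 2)`. Under this model a single probe is a separate one-shot test with its own deadline, so it can fail even while a long-running ping on the same interface mostly succeeds.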

ioogithub commented 1 year ago

Okay, I noticed that this event on the router:

Fri Oct 27 06:16:22 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from eth1 error (8.8.8.8,80.67.169.12,8.8.4.4)

comes after this event on the vps:

Oct 27 06:16:07 vpstest OMR-Service[2394918]: No answer from VPN client end, restart Glorytun-TCP
Oct 27 06:16:07 vpstest systemd[1]: Stopping Glorytun TCP on tun0...
Oct 27 06:16:07 vpstest glorytun-tcp-run[2052661]: INITIALIZED gt-tun0
Oct 27 06:16:07 vpstest glorytun-tcp-run[2052661]: STARTED gt-tun0
Oct 27 06:16:07 vpstest glorytun-tcp-run[2052661]: STOPPED gt-tun0
Oct 27 06:16:07 vpstest systemd-networkd[514]: gt-tun0: Link DOWN
Oct 27 06:16:07 vpstest systemd-networkd[514]: gt-tun0: Lost carrier

and then glorytun often struggles to reconnect. So maybe this is causing the loss of ping. How can I adjust the glorytun timeout? This is not omr-tracker but omr-service on the VPS. Is there anything I can do to fix this, since there is no interface for this service?

I see a similar issue when I switch to dsvpn and that also has a hard time reconnecting.

vempire-ghost commented 1 year ago

Try changing the OMR-Tracker default settings to None. I think it solves the problem because it then only pings the gateway, which is always up if the cable is connected.

ioogithub commented 1 year ago

Try changing the OMR-Tracker default settings to None. I think it solves the problem because it then only pings the gateway, which is always up if the cable is connected.

Thanks, I just doubled the retry interval or the wait after failing values as Ysurac suggested but this is actually a good idea to test if there is actually a problem here or not.

When you say set default to None do you mean set it to disable by removing the check mark here:

Defaults Settings
OMR-Tracker create needed routes and detect when a connection is down or up
Enable
When tracker is disabled, connection failover is also disabled 

Have you had this problem?

I set up another ping with the exact same command that omr-tracker is using, ping -B -I eth1 1.1.1.1, and I added the -O option, which will show any delayed or missing pings. omr-tracker said it couldn't ping and cut off the interface 8 times over the past few hours, but I never missed a ping and there were no missing sequence numbers either. I had already increased the timeout setting to 5 seconds. I set it to ping continuously and at no time was I unable to ping 1.1.1.1 for 5 seconds, but omr-tracker killed the interface 8 times, so I can't recreate what it is doing or what it is detecting.

If I do disable it, then I lose the ability to automatically detect the connection going down and reactivate it, right? This might not be so bad because this connection failover doesn't work that well. After omr-tracker kills the interface it really struggles to get it back online. Sometimes it will go up and down every few minutes for 12 hours and it will not recover on its own without me manually rebooting the router, so maybe I am not missing too much.

vempire-ghost commented 1 year ago

This config: (image)

I have seen something like this in my log:

Oct 28 16:00:54 OpenMPTCProuter user.notice post-tracking-001-post-tracking: wan2 (eth2) switched off because check error and ping from 192.168.0.51 error (114.114.115.115,1.2.4.8,80.67.169.40)
Oct 28 16:00:54 OpenMPTCProuter user.notice post-tracking-001-post-tracking: Delete default route to xxxxxxx dev eth2
Oct 28 16:02:16 OpenMPTCProuter user.notice post-tracking-001-post-tracking: wan2 (eth2) switched up
Oct 28 16:02:16 OpenMPTCProuter user.notice post-tracking-001-post-tracking: Interface route not yet set, set route ip r add default via 192.168.0.1 dev eth2 metric 9

But my connection does not drop. I do not use the VPN, though: I set the VPN to disabled and only use the Xray proxy for TCP and UDP.

ioogithub commented 1 year ago

Oh right, set the ping value to none so omr-tracker will ping the gateway but it won't then ping the website (1.1.1.1) through the wan interface. Yes, those are the logs that plague me.

Interesting to hear about how you turned off the VPN. Why did you decide to not use the VPN? How does omr handle traffic that is not tcp or udp? Are you on the beta1 release with xray?

I am really struggling to get a stable VPN. I am using the v2ray proxy, so it handles tcp and udp and it is stable. The VPN is only used for small things like legacy dns lookups or icmp, I guess. It contributes so little to the setup but it constantly destabilizes the whole system. I have tried all the different VPNs but I can't find a stable one. Some are broken out of the box in the stable release, like openvpn, which constantly throws a next hop error. glorytun-tcp and dsvpn really struggle to reconnect, with key errors, after omr-tracker has killed the connection. I am currently using glorytun-udp because it doesn't maintain a tunnel connection, so when omr-tracker kills it, it isn't affected as much.

For omr-tracker, the way I understand it, it first pings the gateway, and then if there is a problem it pings one of those websites through the wan interface? But as you said, there is never a problem pinging the gateway unless the cable is disconnected, so how does it decide there is a problem and then ping the website? It seems like a good idea but it really doesn't seem to work well and it creates more problems than it aims to solve.

Thanks for taking the time to reply. You gave me a great idea I will turn off the ping setting for a period as a test, great suggestion!

vempire-ghost commented 1 year ago

I also tested all the VPNs and couldn't find any that I found stable and good. Most of them make the connection extremely slow or increase the ping. Honestly, I don't know why. That's why I decided to disable the VPN and use only the proxy for TCP/UDP.

From what I understood, OMR always pings the interface, and if it's configured to PING, it will also ping the addresses configured below. But if set to "None," it will only ping the gateway and nothing else. See if this resolves your issue.

Currently, I'm using the latest snapshot with Xray, but I had the same issues with VPN in the release version. Disabling the VPN and using only the proxy has been solving my problems. However, I'm not sure how it handles ICMP or other protocols that should go through the VPN. It might forward them directly.

ioogithub commented 1 year ago

Most of them make the connection extremely slow or increase the ping.

The default glorytun-tcp has been deprecated for a long while now. When I fail over to glorytun I get speeds of less than 100Kb, which is barely enough to ssh into the server and try to fix it, but it can't hold a connection for more than 20 seconds either, so it doesn't really work anyway. It's strange, a lot of people speak very highly of glorytun...

That's why I decided to disable the VPN and use only the proxy for TCP/UDP.

I still don't see what you are doing about non tcp or udp traffic. If I do an iftop -i gt-udp-tun0 -P I still see traffic on this tunnel, mostly dns and icmp I guess, but without a VPN maybe it does go out the master interface. If so, that wouldn't be so bad I guess, especially if you get stability.

Currently, I'm using the latest snapshot

I tried beta1 twice for several hours but it was unstable. The VPN and wan1 crashed almost immediately both times and could not reconnect at all because of key exchange errors. Even after a reboot I could not recover, so beta1 is not ready yet.

Thanks for suggesting this, I have been so deep into it I never took a step back to consider this option.

vempire-ghost commented 1 year ago

I really don't know what OMR does when the VPN is disabled with ICMP and DNS traffic. It would be interesting if Ysurac could provide us with this information.

As for the beta or snapshots, I haven't had any serious issues. I use them whenever a new one is released.

ioogithub commented 1 year ago

As for the beta or snapshots, I haven't had any serious issues. I use them whenever a new one is released.

So this makes sense, because the issues I had with the snapshots were related to the VPN tunnel creating instability.

Last night I increased the values of omr-tracker as Ysurac has mentioned. I increased retry interval or the wait after failing x4. It did not work.

I get this

Sun Oct 29 02:34:02 2023 user.notice OMR-VPS: Can't get vps token, try later (can't ping server vps on vpsip, no server API answer on vpsip)
...
Sun Oct 29 02:34:11 2023 daemon.notice netifd: Network device 'tun0' link is down
...
Sun Oct 29 02:34:41 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and ping from wan1ip error (4.2.2.1,8.8.8.8,80.67.169.12)

After this the post-tracking and MPTCP scripts bring the interface up and down and add and delete routes. The router was on and offline until 10am, when I manually restarted it. The scripts brought the interface on and offline 18 times.

I also had a ping with -O and -D running and never missed a ping on the wan1 interface and never missed a sequence number, so I really don't know what the script is detecting as down. The problem is that once omr-tracker restarts the interface, the scripts can't return the router to a stable state.

I have just disabled the VPN, and I am looking for where that ICMP and DNS lookup traffic went. I will also do some fault testing to see if running without the VPN loses any fault testing or redundancy capabilities.

ioogithub commented 1 year ago

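When I disable the VPN on the router I can observe the following change:

* With the VPN tunnel active, I can see UDP traffic coming through the tunnel on the VPS with this command: `sudo iftop -i gt-tun0 -P` or `sudo iftop -i gt-tun0`
  * ntp on port 123
  * dns on port 53

I do not see any of this UDP traffic exiting eth1 or eth2 on the router.

* When the VPN is disabled, I see the same traffic exiting eth1 and eth2: `iftop -i eth1 -P -f 'dst port 53'` and `iftop -i eth2 -P -f 'dst port 53'`

So this traffic does shift from exiting on the VPS to exiting on the router interfaces directly.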

This result doesn't make a lot of sense to me because I am using v2ray so both TCP and UDP traffic should not be using the tunnel according to this statement on the wizard page (https://routerip/cgi-bin/luci/admin/system/openmptcprouter):

Set the default Proxy used for TCP when ShadowSocks is enabled, for TCP and UDP when V2Ray is enabled. Only ShadowSocks is supported with server multiple IPs for now.

Ysurac why is the UDP traffic still using the VPN tunnel when I am using v2ray as the proxy?

ioogithub commented 1 year ago

The other thing that doesn't make sense to me is this statement:

All VPN available here can do aggregation over MPTCP or using own internal method. OpenVPN can't be used in multi VPS configuration.

So when I do this: sudo ss -tulna -p | grep -e glorytun

I get a single connection to wan1. How can this VPN do MPTCP if it only keeps 1 connection to a single wan?

tcp ESTAB 0 0 [::ffff:vpsip]:65001 [::ffff:extwan1ip]:36523 users:(("glorytun-tcp",pid=3182108,fd=5))

vempire-ghost commented 1 year ago

When I disable the VPN on the router I can observe the following change:

* With the VPN tunnel active, I can see UDP traffic coming through the tunnel on the VPS with this command: `sudo iftop -i gt-tun0 -P` or `sudo iftop -i gt-tun0`
  * ntp on port 123
  * dns on port 53

I do not see any of this UDP traffic exiting eth1 or eth2 on the router.

* When the VPN is disabled, I see the same traffic exiting eth1 and eth2: `iftop -i eth1 -P -f 'dst port 53'` and `iftop -i eth2 -P -f 'dst port 53'`

So this traffic does shift from exiting on the VPS to exiting on the router interfaces directly.

This result doesn't make a lot of sense to me because I am using v2ray so both TCP and UDP traffic should not be using the tunnel according to this statement on the wizard page (https://routerip/cgi-bin/luci/admin/system/openmptcprouter):

Set the default Proxy used for TCP when ShadowSocks is enabled, for TCP and UDP when V2Ray is enabled. Only ShadowSocks is supported with server multiple IPs for now.

Ysurac why is the UDP traffic still using the VPN tunnel when I am using v2ray as the proxy?

Do you have this setting enabled in advanced settings? (image)

ioogithub commented 1 year ago

Do you have this setting enabled in advanced settings? (image)

In the 0.59.1 stable version, on the advanced tab there is a check for this:

When proxy V2Ray is used, use it for UDP

yes it is checked.

vempire-ghost commented 1 year ago

In the snapshot, I confirmed that UDP is being directed to the proxy. However, I'm not sure how to check DNS and ICMP traffic.

Ysurac commented 1 year ago

With V2Ray/XRay, in any case, local UDP traffic uses the VPN. Same for ICMP. So when the VPN is disabled, UDP/ICMP use the local WANs (by default in load balancing mode with a higher weight for the master interface). And as I said for omr-tracker, if it doesn't give you the result you want, try another mode like httping.

ioogithub commented 1 year ago

With V2Ray/XRay, in any case, local UDP traffic uses the VPN. Same for ICMP. So when the VPN is disabled, UDP/ICMP use the local WANs (by default in load balancing mode with a higher weight for the master interface). And as I said for omr-tracker, if it doesn't give you the result you want, try another mode like httping.

Could you explain this a bit? What do you mean by local?

Do you mean traffic originating from the VPN itself? But the traffic is already on the VPN so all it needs is to exit to public IP, it shouldn't use the tunnel at all because this is a connection between OMR and VPN.

So all UDP traffic from clients connected to OMR should use the proxy and not the tun if v2ray is used? The DNS requests I am seeing look like they are coming from clients on the network but I am not sure about this.

So when the VPN is disabled, UDP/ICMP use the local WANs (by default in load balancing mode with a higher weight for the master interface).

Yes, this part I did see in the test I reported above. This was as vempire-ghost and I thought.

And as I said for omr-tracker, if it doesn't give you the result you want, try another mode like httping.

I can try httping. Does it get a higher priority than ping? Also, if we disable the VPN altogether, what benefits do we lose? Does it affect redundancy or failover capabilities? Is the VPN doing anything else that is important for omr to function?

Ysurac commented 1 year ago

I mean traffic originating from the router, so NTP used by the router and DNS (when the router resolves a domain itself and when the router is used as the DNS server by clients).

When the VPN is disabled, all traffic that is not TCP and UDP from clients (when V2Ray/Xray are used) uses the WANs directly. All redirections from the VPS to the router will not work when V2Ray/Xray is not used or when the checkbox on firewall redirection is not checked. Some parts of IPv6 will also not work, same for GRE tunnels. This doesn't affect redundancy, failover or anything else.
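
As a reading aid, the UDP and ICMP behaviour described in this and the previous answer can be written down as a small decision function. This is only a summary sketch, not OMR code: `egress_path` and its string values are hypothetical, and it assumes V2Ray/Xray is the proxy with the "use it for UDP" option checked. TCP and the other cases mentioned above are deliberately left out.

```python
def egress_path(origin, protocol, vpn_enabled):
    """Return 'proxy', 'vpn' or 'wan' for UDP/ICMP traffic.

    origin:   'client' (a LAN device) or 'router' (NTP/DNS from the router)
    protocol: 'udp' or 'icmp'
    """
    if origin not in ("client", "router"):
        raise ValueError(f"unknown origin: {origin!r}")
    if protocol not in ("udp", "icmp"):
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if origin == "client" and protocol == "udp":
        return "proxy"  # client UDP is handled by V2Ray/Xray
    # Router-originated UDP and all ICMP use the VPN, or the WANs without it.
    return "vpn" if vpn_enabled else "wan"


for case in [("client", "udp", True), ("router", "udp", True),
             ("router", "udp", False), ("client", "icmp", False)]:
    print(case, "->", egress_path(*case))
```

This matches the iftop observation earlier in the thread: DNS and NTP sent by the router itself appear on the tunnel while the VPN is enabled and on eth1/eth2 once it is disabled.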

vempire-ghost commented 1 year ago

I mean traffic originating from the router, so NTP used by the router and DNS (when the router resolves a domain itself and when the router is used as the DNS server by clients).

When the VPN is disabled, all traffic that is not TCP and UDP from clients (when V2Ray/Xray are used) uses the WANs directly. All redirections from the VPS to the router will not work when V2Ray/Xray is not used or when the checkbox on firewall redirection is not checked. Some parts of IPv6 will also not work, same for GRE tunnels. This doesn't affect redundancy, failover or anything else.

Is the choice to direct this traffic through the WANs directly your decision, or is it a limitation of the proxy that can't handle it?

ioogithub commented 1 year ago

I mean traffic originating from the router, so NTP used by the router and DNS (when the router resolves a domain itself and when the router is used as the DNS server by clients).

When the VPN is disabled, all traffic that is not TCP and UDP from clients (when V2Ray/Xray are used) uses the WANs directly. All redirections from the VPS to the router will not work when V2Ray/Xray is not used or when the checkbox on firewall redirection is not checked. Some parts of IPv6 will also not work, same for GRE tunnels. This doesn't affect redundancy, failover or anything else.

Okay so I am using omr-bypass, clients are on a mesh network their dns is the mesh router and that router forwards dns requests to omr. This means that all of these dns requests from the network will still use the tun even with v2ray because it appears as local traffic originating from the router. That explains what I am seeing.

All redirections from the VPS to the router will not work when V2Ray/Xray is not used or when the checkbox on firewall redirection is not checked.

Does this mean this works:

ACCEPT net $FW tcp 123456 # OMR openmptcprouter open router 123456 port tcp --- V2Ray to ip:123456

but this will not:

ACCEPT net vpn:$OMR_ADDR tcp 123456

Ysurac commented 1 year ago

@vempire-ghost UDP is not applied to local traffic because I didn't add rules for that, I think. Maybe there is another reason that I can't remember too... I will check.

@ioogithub Exactly.

ioogithub commented 1 year ago

And as I said for omr-tracker, if it doesn't give you the result you want, try another mode like httping.

I just tried using httping, on https://routerip/cgi-bin/luci/admin/services/omr-tracker under default settings I changed ping to httping and clicked Save and Apply.

It did not work, tun0, wan1, wan2 and v2ray all down. Here are some logs:

Sun Oct 29 14:26:07 2023 daemon.info omr-tracker-v2ray: V2Ray is down (can't contact via http 77.88.55.77, 1.1.1.1, 74.82.42.42, 198.41.212.162)
Sun Oct 29 14:30:46 2023 daemon.err omr-tracker-v2ray[24772]: ping: connect: Network unreachable
Sun Oct 29 14:30:46 2023 daemon.info omr-tracker-v2ray: Server (vpsip) seems down, no answer to ping

Sun Oct 29 14:30:53 2023 daemon.info omr-tracker-v2ray: V2Ray is down (can't contact via http 1.0.0.1, 212.27.48.10, 198.27.92.1, 151.101.129.164, 77.88.55.77, 1.1.1.1, 74.82.42.42, 198.41.212.162, 1.0.0.1, 212.27.48.10, 198.27.92.1, 151.101.129.164)
...
Sun Oct 29 14:32:23 2023 daemon.err glorytun[4108]: read: Operation timed out
Sun Oct 29 14:32:23 2023 daemon.info glorytun[4108]: STOPPED tun0

I let it go for 15 minutes, it brings the interfaces up and down but it keeps killing them. The scripts weren't able to restore the connection. It didn't failover to direct output either. I switched it back to ping and after around 5 minutes it has restored the connections.

Is there something else required to get httping working? I can try again if more config changes are needed to get it working.

ioogithub commented 1 year ago

Ysurac I tried httping again and it does not work. I changed to httping, clicked apply. This result:

All orange X every interface, vps, router:

If I manually restart the wan1 and wan2 interface and try to run your httping command when the routes are in place I get this:

root@OpenMPTCProuter:~# httping 80.67.169.12 -y wan2 -t 4 -c 1
PING 80.67.169.12:80 (/):
short read during receiving reply-headers from host
--- http://80.67.169.12/ ping statistics ---
1 connects, 0 ok, 100.00% failed, time 3625ms

I rebooted the router, waited 30 minutes and it had not brought up any connections. This is the problem. omr-tracker can't get any httping reply and it kills both wans and the tunnel.

Mon Oct 30 17:20:14 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and httping from eth1ip error (4.2.2.1,8.8.8.8,80.67.169.12)
Mon Oct 30 17:20:14 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via eth1 dev eth1
Mon Oct 30 17:20:16 2023 user.notice post-tracking-post-tracking: Restart wan1

Are you sure httping has been tested and worked in the 0.59 stable version? I tried to curl all of the addresses you have in omr-tracker under defaults. I don't think any of those are webservers, so how can httping work? Won't it always fail?
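
One way to check which of the configured hosts answer HTTP at all, before switching the tracker type, is a small probe like the one below. It is a sketch using only the Python standard library: `answers_http` is a hypothetical helper, it sends a plain HEAD request to port 80, and unlike `httping -y` it does not bind to a specific WAN interface.

```python
import http.client


def answers_http(host, port=80, timeout=4.0):
    """Return True if the host sends back any HTTP response to HEAD /."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()  # any status code counts as "answers HTTP"
        return True
    except (OSError, http.client.HTTPException):
        # Refused, unreachable, timed out, or not speaking HTTP.
        return False
    finally:
        conn.close()
```

Calling it over the tracker host list, for example `[h for h in hosts if answers_http(h)]`, leaves only the addresses that are usable with an HTTP-based check. Any status code counts as success here because the goal is reachability, not page content.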

ioogithub commented 1 year ago

Okay, I found the problem. The servers on https://192.168.200.1/cgi-bin/luci/admin/services/omr-tracker under the default section do not work with httping. If you just run the httping command against them, they do not return a successful result, so out of the box httping will not work.

One of the servers does work, the Cloudflare one: 1.1.1.1

I deleted all of the servers except this one and switched to httping. I got 30 minutes up time before:

Mon Oct 30 18:40:38 2023 user.notice post-tracking-post-tracking: wan1 (eth1) switched off because check error and httping from wan1 error (1.1.1.1,1.0.0.1,1.1.1.1)
Mon Oct 30 18:40:38 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via wan1 dev eth1
Mon Oct 30 18:40:39 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and httping from wan2 error (1.1.1.1,1.0.0.1,1.1.1.1)
Mon Oct 30 18:40:39 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via wan2 dev eth2

So now with httping both interfaces go offline at the same time instead of just wan1. I ran some manual httping tests from a client on a different network, and httping failed to return a successful result 3 times out of maybe 12 tests, so I don't think it is a valid test of whether an interface is up or down. It seems even less reliable than ping.

I think the ultimate solution might be to disable the ping altogether. I will try this next and see how long I can maintain a connection.

Ysurac commented 1 year ago

True, my mistake, I forgot to say that you must also configure some HTTP server IPs if you use httping. But httping should give you a successful result on a working IP. The DNS test should work with the default IPs. I think you really have an issue with your connections...

ioogithub commented 1 year ago

True, my mistake, I forgot to say that you must also configure some HTTP server IPs if you use httping

Perhaps you can put a note under the type field here https://192.168.200.1/cgi-bin/luci/admin/services/omr-tracker to alert users that they need to find webservers to use with httping. Or you can change the servers in the ping list to servers that respond to httping successfully.

But httping should give you a successful result on a working IP.

I tried it on a different device on a different network and saw several failures, maybe a 20% failure rate, just from running httping against 1.1.1.1 a few times in a row. If omr-tracker had been monitoring this device, it would have brought all interfaces down. Ping seems more reliable.
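To put a rough number on that concern, here is a minimal sketch of how often a healthy link would be declared down purely by chance. It assumes each probe fails independently with a fixed probability, and that the tracker only marks an interface down after a given number of consecutive failed probes. The function name and the `tries` parameter are hypothetical illustrations, not omr-tracker's actual settings.

```python
def false_down_probability(p_fail: float, tries: int) -> float:
    """Chance that `tries` consecutive probes all fail on a healthy link.

    Assumes independent probes with the same failure probability, which
    real networks only approximate (failures tend to cluster).
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if tries < 1:
        raise ValueError("tries must be at least 1")
    return p_fail ** tries


if __name__ == "__main__":
    # 0.20 is the rough failure rate reported above for manual httping runs.
    for tries in (1, 2, 3, 5):
        chance = false_down_probability(0.20, tries)
        print(f"{tries} consecutive failure(s) required: {chance:.4%} per check")
```

Under these assumptions, a probe type with a high per-request failure rate needs several consecutive failures before it is a trustworthy "interface down" signal, and many checks per day still add up to occasional false positives.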

I think you really have an issue with your connections...

Yes, that is possible, but the actual interface doesn't seem to go offline until omr-tracker resets it, so how is omr-tracker determining that the interface is offline? I took vempire-ghost's advice and disabled the ping under the default settings. After 18 hours I have not had any interface restarts from omr-tracker. I also have some troubleshooting scripts running:

I ran these:

nohup /usr/bin/ping -D -O -I wan1ip 1.1.1.1 > noping.log &
nohup /usr/bin/ping -D -O -I wan2ip 1.1.1.1 > noping.log &

When I analyze the logs for wan1, I see 800 DUPs out of 52,000 pings:

[1698715032.154684] 64 bytes from 1.1.1.1: icmp_seq=525 ttl=56 time=121 ms
[1698715032.155160] 64 bytes from 1.1.1.1: icmp_seq=525 ttl=56 time=122 ms (DUP!)
[1698715033.068169] 64 bytes from 1.1.1.1: icmp_seq=526 ttl=56 time=34.8 ms

and

[1698715538.475903] 64 bytes from 1.1.1.1: icmp_seq=1031 ttl=56 time=31.9 ms
[1698715539.515271] 64 bytes from 1.1.1.1: icmp_seq=1032 ttl=56 time=70.1 ms
[1698715539.515943] 64 bytes from 1.1.1.1: icmp_seq=1032 ttl=56 time=70.8 ms (DUP!)
[1698715540.482123] 64 bytes from 1.1.1.1: icmp_seq=1033 ttl=56 time=35.1 ms

-O will tell me if there are any missing pings. I saw 21 lost pings out of 52,000:

[1698752981.524662] 64 bytes from 1.1.1.1: icmp_seq=38430 ttl=56 time=35.0 ms
[1698752983.563247] no answer yet for icmp_seq=38431
[1698752983.640377] 64 bytes from 1.1.1.1: icmp_seq=38432 ttl=56 time=77.1 ms

For wan2 (Starlink) I see 0 DUPs but a few more lost pings, maybe 25 out of 52,000.

[1698734534.045701] 64 bytes from 1.1.1.1: icmp_seq=19695 ttl=58 time=35.1 ms
[1698734535.052274] 64 bytes from 1.1.1.1: icmp_seq=19696 ttl=58 time=40.4 ms
[1698734537.083340] no answer yet for icmp_seq=19697
[1698734537.112274] 64 bytes from 1.1.1.1: icmp_seq=19698 ttl=58 time=28.9 ms
[1698734538.125624] 64 bytes from 1.1.1.1: icmp_seq=19699 ttl=58 time=40.2 ms
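Counts like these can be pulled from the log with a short script rather than by eye. The following is a minimal sketch that assumes the log has exactly the line shapes shown above (`ping -D -O` output): replies contain "bytes from", duplicates end in "(DUP!)", and unanswered probes are reported as "no answer yet". The function name `summarize_ping_log` is hypothetical.

```python
import re

# Line shapes taken from the ping -D -O samples quoted in this thread.
REPLY = re.compile(r"bytes from .* icmp_seq=\d+")
NO_ANSWER = re.compile(r"no answer yet for icmp_seq=\d+")


def summarize_ping_log(lines):
    """Count normal replies, duplicate replies, and unanswered probes."""
    counts = {"replies": 0, "dups": 0, "no_answer": 0}
    for raw in lines:
        line = raw.strip()
        if NO_ANSWER.search(line):
            counts["no_answer"] += 1
        elif REPLY.search(line):
            if line.endswith("(DUP!)"):
                counts["dups"] += 1
            else:
                counts["replies"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_ping.py noping.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(summarize_ping_log(handle))
```

Note that "no answer yet" only means the reply had not arrived when the next probe was sent, so a late reply can still show up afterwards. The `no_answer` count is therefore an upper bound on truly lost pings.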

I don't see any signs of hardware issues:

eth1      Link encap:Ethernet  HWaddr x.x.x.x
          inet addr:x.x.x.x  Bcast:x.x.x.x  Mask:x.x.x.x
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:18973623 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16706745 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:10121206740 (9.4 GiB)  TX bytes:1920700431 (1.7 GiB)

Ysurac commented 1 year ago

DUP means duplicate, so you get 2 answers when you send 1 request. This means you sent the request via 2 paths or something sent the answer 2 times. Always use the "-B" option in your ping. OMR-Tracker detects whether the interface answers Internet requests or not. The interface may fail to answer for a lot of reasons, but in any case it's a problem. If you are happy with gateway-only ping, use it. Ping may be blocked or may sometimes go unanswered because of the ISP.

ioogithub commented 1 year ago

OMR-Tracker detects whether the interface answers Internet requests or not. The interface may fail to answer for a lot of reasons, but in any case it's a problem.

Does the logic work like this:

  1. omr-tracker pings the gateway IP through the interface IP.
  2. If it does not answer, then confirm by pinging a website (1.1.1.1) through the gateway IP.
  3. If it does not answer, then restart the interface.

If you are happy with gateway-only ping, use it. Ping may be blocked or may sometimes go unanswered because of the ISP.

Do I lose anything in terms of redundancy or fail-over if I just ping the gateway and not the websites (1.1.1.1 etc.)? I have had it off for 2 days now and I have not had a single interface restart. I was getting 20 or more restarts a day when it was on.

I would like to understand why it isn't working but I can leave it this way for now as it seems more stable.

Ysurac commented 1 year ago

No, it pings the gateway IP and in any case it pings a website, because a gateway that answers ping doesn't mean we have Internet access (it can be the IP of a modem or ISP router without Internet access behind it). If the Internet ping doesn't answer, then it restarts the interface if modem manager is used; otherwise it only removes the interface from the routes and disables multipath on it. With gateway-only ping you lose the case where the Internet is not accessible but the gateway IP still is. As I said, ping (ICMP) is low priority on a network, so if a router has something else to do it can ignore ping. But this doesn't explain the httping errors...
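The behaviour described in this comment can be sketched as a small decision function. This is only an illustration of the comment, not omr-tracker's real code: the names are hypothetical, and it assumes that when the Internet check is enabled its result alone decides whether the interface counts as up.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep using the interface"
    RESTART_INTERFACE = "restart the interface"
    DROP_FROM_ROUTING = "remove from routes and disable multipath"


def interface_is_up(gateway_ok: bool, internet_ok: bool, check_internet: bool) -> bool:
    """Assumed rule: the Internet probe decides when it is enabled."""
    return internet_ok if check_internet else gateway_ok


def decide(gateway_ok: bool, internet_ok: bool, check_internet: bool,
           uses_modem_manager: bool) -> Action:
    """Map probe results to the action described in the comment above."""
    if interface_is_up(gateway_ok, internet_ok, check_internet):
        return Action.KEEP
    if uses_modem_manager:
        return Action.RESTART_INTERFACE
    return Action.DROP_FROM_ROUTING


if __name__ == "__main__":
    # Gateway answers but there is no Internet behind it (e.g. a modem
    # or ISP router that has lost its uplink).
    for check_internet in (True, False):
        action = decide(gateway_ok=True, internet_ok=False,
                        check_internet=check_internet,
                        uses_modem_manager=False)
        print(f"check_internet={check_internet}: {action.value}")
```

The second iteration shows the blind spot of gateway-only checking: the interface stays in use even though nothing beyond the gateway is reachable.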

vempire-ghost commented 1 year ago

No, it pings the gateway IP and in any case it pings a website, because a gateway that answers ping doesn't mean we have Internet access (it can be the IP of a modem or ISP router without Internet access behind it). If the Internet ping doesn't answer, then it restarts the interface if modem manager is used; otherwise it only removes the interface from the routes and disables multipath on it. With gateway-only ping you lose the case where the Internet is not accessible but the gateway IP still is. As I said, ping (ICMP) is low priority on a network, so if a router has something else to do it can ignore ping. But this doesn't explain the httping errors...

Is it possible to have different OMR tracker configurations for each WAN? For example, having WAN1 disabled and WAN2 enabled. This way, I could choose for each WAN whether I want it to test only the gateway or both the gateway and a specific website. How does OMR define the priority in the routing table for each WAN?

Ysurac commented 1 year ago

Yes, at the end of the page you have this button to add a custom setting for an interface: image

Master is always highest priority, other interfaces have same priority for routing (to VPS or when VPN is disabled).

vempire-ghost commented 1 year ago

Yes, at the end of the page you have this button to add a custom setting for an interface: image

Master is always highest priority, other interfaces have same priority for routing (to VPS or when VPN is disabled).

Thanks. If I configure a custom rule for WAN1, will it use this rule for that WAN, and will the others continue to use the rule defined in the default configuration?

github-actions[bot] commented 9 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days