Ysurac / openmptcprouter

OpenMPTCProuter is an open source solution to aggregate multiple internet connections using Multipath TCP (MPTCP) on OpenWrt
https://www.openmptcprouter.com/
GNU General Public License v3.0
1.82k stars 259 forks source link

OpenMTCPRouter totally unstable #1561

Closed HSE84 closed 3 years ago

HSE84 commented 3 years ago

Expected Behavior

Mptcprouter statut should stay always green

Current Behavior

Status is green when VPS and OMR are started, but after a few minutes it becomes totally unstable. Connection falls every 2 minutes, all day long. I thought that was a clock zone problem then I realigned all, but this doesnt work.

Please find attached system logs and VPS logs : System.log Kernel.log VPS-daemon.log

Specifications

Setup : 3x 4G Huawei modems, RPI4b, OVH VPS Modems : good connection (external antenna)

Ysurac commented 3 years ago

I don't download file attached to an issue, this is a nightmare to follow an issue with that. You can use a pastebin, or paste info on the issue. What do you have on OpenMPTCProuter status page ?

HSE84 commented 3 years ago

Thank you for your answer :-)

Statut : https://cln.sh/qDbt8t Sorry I do not have other screen tonight because it seems to be stable since a 1 hour. Will post new screens when I got next problem.

System.log

Mon Feb 1 18:00:39 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected Mon Feb 1 18:00:47 2021 user.notice OMR-VPS: Can't get vps token, try later (can't ping server vps, server API answer) Mon Feb 1 18:01:01 2021 daemon.info omr-tracker-ss: Shadowsocks is down (can't contact via http 1.1.1.1, 74.82.42.42, 176.103.130.130) Mon Feb 1 18:01:01 2021 user.notice Shadowsocks: Rules DOWN Mon Feb 1 18:01:04 2021 daemon.info omr-tracker-ss: Server (51.178.xx.xx) seems down, no answer to ping Mon Feb 1 18:01:09 2021 user.notice OMR-VPS: Can't get vps token, try later Mon Feb 1 18:01:24 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected Mon Feb 1 18:01:25 2021 daemon.info omr-tracker-ss: Shadowsocks is up (can contact via http 212.27.48.10) Mon Feb 1 18:01:25 2021 daemon.info omr-tracker-ss: Reload Shadowsocks rules Mon Feb 1 18:01:25 2021 user.notice Shadowsocks: Rules UP Mon Feb 1 18:01:25 2021 user.notice Shadowsocks: Reload omr-bypass rules Mon Feb 1 18:01:25 2021 user.notice omr-bypass: Starting OMR-ByPass... Mon Feb 1 18:01:26 2021 user.notice omr-bypass: Reload dnsmasq... Mon Feb 1 18:01:27 2021 daemon.info dnsmasq[18884]: read /etc/hosts - 4 addresses Mon Feb 1 18:01:27 2021 daemon.info dnsmasq[18884]: read /tmp/hosts/dhcp.cfg01411c - 1 addresses Mon Feb 1 18:01:27 2021 daemon.info dnsmasq-dhcp[18884]: read /etc/ethers - 0 addresses Mon Feb 1 18:01:27 2021 user.notice omr-bypass: OMR-ByPass is running Mon Feb 1 18:01:51 2021 user.notice OMR-VPS: Can't get vps token, try later (can't ping server vps, no server API answer) Mon Feb 1 18:01:54 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected

VPS Log :

Feb 1 18:16:41 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:16:42 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:16:44 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:17:17 vps-c1f368ae ss-server[600]: getpeername: Transport endpoint is not connected Feb 1 18:17:23 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:18:18 vps-c1f368ae ss-server[600]: getpeername: Transport endpoint is not connected Feb 1 18:18:19 vps-c1f368ae ss-server[600]: getpeername: Transport endpoint is not connected Feb 1 18:18:57 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:19:09 vps-c1f368ae ss-server[600]: getpeername: Transport endpoint is not connected Feb 1 18:19:11 vps-c1f368ae ss-server[600]: getpeername: Transport endpoint is not connected Feb 1 18:19:19 vps-c1f368ae systemd[1]: Started Session 16 of user debian. Feb 1 18:19:27 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:20:00 vps-c1f368ae ss-server[600]: getpeername: Transport endpoint is not connected Feb 1 18:20:17 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:20:17 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:20:17 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer Feb 1 18:20:17 vps-c1f368ae ss-server[600]: remote recv: Connection reset by peer

Ysurac commented 3 years ago

Orange and Free are on same network, this can be a problem in some case.

HSE84 commented 3 years ago

Thank you very much for your support Ysurac :-)

I agree, Orange and Free are sharing the same assets, but I don't think the problem comes from this origin because the lack of ping rarely occur simultaneously on all modems.

Another track of analysis : the problem seems to occur more often when many users are working on the network.

Any other comment on my former logs ? What would you advise, reinstall all ?

Here is another sample : Tue Feb 2 13:20:54 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected Tue Feb 2 13:21:10 2021 user.notice OMR-VPS: Can't get vps token, try later Tue Feb 2 13:21:16 2021 daemon.info omr-tracker-ss: Shadowsocks is down (can't contact via http 74.82.42.42, 176.103.130.130, 1.0.0.1) Tue Feb 2 13:21:16 2021 user.notice Shadowsocks: Rules DOWN Tue Feb 2 13:21:18 2021 daemon.info omr-tracker-ss: Server (51.178.xx.xx) seems down, no answer to ping Tue Feb 2 13:21:22 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected Tue Feb 2 13:21:24 2021 daemon.info omr-tracker-ss: Shadowsocks is up (can contact via http 212.27.48.10) Tue Feb 2 13:21:24 2021 daemon.info omr-tracker-ss: Reload Shadowsocks rules Tue Feb 2 13:21:24 2021 user.notice Shadowsocks: Rules UP Tue Feb 2 13:21:24 2021 user.notice Shadowsocks: Reload omr-bypass rules Tue Feb 2 13:21:24 2021 user.notice omr-bypass: Starting OMR-ByPass... Tue Feb 2 13:21:25 2021 user.notice omr-bypass: Reload dnsmasq... Tue Feb 2 13:21:26 2021 daemon.info dnsmasq[18884]: read /etc/hosts - 4 addresses Tue Feb 2 13:21:26 2021 daemon.info dnsmasq[18884]: read /tmp/hosts/dhcp.cfg01411c - 1 addresses Tue Feb 2 13:21:26 2021 daemon.info dnsmasq-dhcp[18884]: read /etc/ethers - 0 addresses Tue Feb 2 13:21:26 2021 user.notice omr-bypass: OMR-ByPass is running Tue Feb 2 13:45:46 2021 daemon.err /usr/bin/ss-redir[17011]: server recv: Operation timed out

HSE84 commented 3 years ago
CleanShot 2021-02-02 at 14 52 45@2x
ghost commented 3 years ago

Tue Feb 2 14:21:26 2021 user.notice post-tracking-post-tracking: wan2 (wan2) switched off because ping from 192.168.2.3 error (4.2.2.1,8.8.8.8,80.67.169.12) Tue Feb 2 14:21:26 2021 user.notice post-tracking-post-tracking: Delete default route via 192.168.2.1 dev wan2 Tue Feb 2 14:21:30 2021 user.notice post-tracking-post-tracking: ovpnwan2 (tun4) switched off because gateway down and ping from 10.255.250.6 error (1.2.4.8,80.67.169.40,114.114.114.114) Tue Feb 2 14:21:30 2021 daemon.err openvpn(wan2)[8346]: event_wait : Interrupted system call (code=4) Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[8346]: net_addr_ptp_v4_del: 10.255.250.6 dev tun4 Tue Feb 2 14:21:30 2021 daemon.notice netifd: Network device 'tun4' link is down Tue Feb 2 14:21:30 2021 daemon.notice netifd: Interface 'ovpnwan2' has link connectivity loss Tue Feb 2 14:21:30 2021 daemon.notice netifd: Interface 'ovpnwan2' is now down Tue Feb 2 14:21:30 2021 daemon.debug dnsmasq[27962]: stopped listening on tun4(#55): 10.255.250.6 port 53 Tue Feb 2 14:21:30 2021 user.notice ModemManager: hotplug: remove network interface tun4: event processed Tue Feb 2 14:21:30 2021 user.notice ModemManager: hotplug: event reported: action=remove, name=tun4, subsystem=net Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[8346]: /usr/libexec/openvpn-hotplug down wan2 tun4 1500 1624 10.255.250.6 10.255.250.5 init Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[8346]: SIGTERM[hard,] received, process exiting Tue Feb 2 14:21:30 2021 daemon.notice netifd: Interface 'ovpnwan2' is disabled Tue Feb 2 14:21:30 2021 daemon.warn openvpn(wan2)[23375]: DEPRECATED OPTION: ncp-disable. Disabling cipher negotiation is a deprecated debug feature that will be removed in OpenVPN 2.6 Tue Feb 2 14:21:30 2021 daemon.warn openvpn(wan2)[23375]: DEPRECATED OPTION: --cipher set to 'AES-256-CBC' but missing in --data-ciphers (AES-256-GCM:AES-128-GCM). Future OpenVPN version will ignore --cipher for cipher negotiations. Add 'AES-256-CBC' to --data-ciphers or change --cipher 'AES-256-CBC' to --data-ciphers-fallback 'AES-256-CBC' to silence this warning. Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[23375]: OpenVPN 2.5.0 x86_64-openwrt-linux-gnu [SSL (OpenSSL)] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[23375]: library versions: OpenSSL 1.1.1i 8 Dec 2020 Tue Feb 2 14:21:30 2021 daemon.warn openvpn(wan2)[23375]: WARNING: No server certificate verification method has been enabled. See http://openvpn.net/howto.html#mitm for more info. Tue Feb 2 14:21:30 2021 daemon.warn openvpn(wan2)[23375]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[23375]: TCP/UDP: Preserving recently used remote address: [AF_INET]10x.xx.xxx.xxx:65301 Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[23375]: UDP link local (bound): [AF_INET]192.168.2.3:65301 Tue Feb 2 14:21:30 2021 daemon.notice openvpn(wan2)[23375]: UDP link remote: [AF_INET]51.38.184.251:65301 Tue Feb 2 14:21:30 2021 user.notice post-tracking-post-tracking: Delete default route via 10.255.250.5 dev tun4 Tue Feb 2 14:21:30 2021 kern.err kernel: [318863.746279] mptcp_init4_subsockets: token 0xce6ad8da bind() to 10.255.250.6 index 55 failed, error -99 Tue Feb 2 14:21:30 2021 kern.err kernel: [318863.763255] mptcp_init4_subsockets: token 0xc7b919ac bind() to 10.255.250.6 index 55 failed, error -99 Tue Feb 2 14:21:33 2021 user.notice post-tracking-post-tracking: Tunnel up : Replace default route by 10.255.255.1 dev tun0 Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18057]: send: Operation timed out Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18055]: send: Operation timed out Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18056]: send: Operation timed out Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18059]: send: Operation timed out Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18059]: send: Operation timed out Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18057]: send: Operation timed out Tue Feb 2 14:21:35 2021 daemon.err /usr/bin/ss-redir[18056]: send: Operation timed out Tue Feb 2 14:21:49 2021 daemon.err uhttpd[6251]: luci: accepted login on / for root from 192.168.100.180

I'm getting the same issue when one of the connection goes down, shadowsocks break and there seems to MPTCP error? Couldn't ask more though, a great project.

Ysurac commented 3 years ago

You are using MPTCP over VPN. Why ?

ghost commented 3 years ago

You are using MPTCP over VPN. Why ?

My local DSL ISP filters MPTCP unfortunately. Thank you for the response!

Ysurac commented 3 years ago

Orange and Free doesn't filters MPTCP.

ghost commented 3 years ago

Orange and Free doesn't filters MPTCP.

I think you may have mistaken me for the author for this ticket, I'm not in France, my ISP is Etisalat. I'm testing "persist-tun" on your "MPTCP over VPN" implementation, you think it may work? Edit: Unfortunately it did not.

Ysurac commented 3 years ago

Hum yes. I didn't check... I don't think it's same issue as author. You have MPTCP over VPN enabled on all WANs ? In this case it's not supported/tested. And may be better to use only glorytun-UDP or MLVPN.

ghost commented 3 years ago

Would using persist-tun option work?

Hum yes. I didn't check... I don't think it's same issue as author. You have MPTCP over VPN enabled on all WANs ? In this case it's not supported/tested. And may be better to use only glorytun-UDP or MLVPN.

Thanks, no, only on one WAN. mlvpn and glorytun performance is very bad. The issue is only prevalent with "MPTCP over VPN" even if it is applied on one of my 3 WANs, the whole proxy goes down for 2 seconds if ovpn udp is down, I'll open a new issue with enough info and logs then.

HSE84 commented 3 years ago

Thank you very much for your support Ysurac :-)

I agree, Orange and Free are sharing the same assets, but I don't think the problem comes from this origin because the lack of ping rarely occur simultaneously on all modems.

Another track of analysis : the problem seems to occur more often when many users are working on the network.

Any other comment on my former logs ? What would you advise, reinstall all ?

Here is another sample : Tue Feb 2 13:20:54 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected Tue Feb 2 13:21:10 2021 user.notice OMR-VPS: Can't get vps token, try later Tue Feb 2 13:21:16 2021 daemon.info omr-tracker-ss: Shadowsocks is down (can't contact via http 74.82.42.42, 176.103.130.130, 1.0.0.1) Tue Feb 2 13:21:16 2021 user.notice Shadowsocks: Rules DOWN Tue Feb 2 13:21:18 2021 daemon.info omr-tracker-ss: Server (51.178.xx.xx) seems down, no answer to ping Tue Feb 2 13:21:22 2021 daemon.err /usr/bin/ss-local[17007]: getpeername: Socket not connected Tue Feb 2 13:21:24 2021 daemon.info omr-tracker-ss: Shadowsocks is up (can contact via http 212.27.48.10) Tue Feb 2 13:21:24 2021 daemon.info omr-tracker-ss: Reload Shadowsocks rules Tue Feb 2 13:21:24 2021 user.notice Shadowsocks: Rules UP Tue Feb 2 13:21:24 2021 user.notice Shadowsocks: Reload omr-bypass rules Tue Feb 2 13:21:24 2021 user.notice omr-bypass: Starting OMR-ByPass... Tue Feb 2 13:21:25 2021 user.notice omr-bypass: Reload dnsmasq... Tue Feb 2 13:21:26 2021 daemon.info dnsmasq[18884]: read /etc/hosts - 4 addresses Tue Feb 2 13:21:26 2021 daemon.info dnsmasq[18884]: read /tmp/hosts/dhcp.cfg01411c - 1 addresses Tue Feb 2 13:21:26 2021 daemon.info dnsmasq-dhcp[18884]: read /etc/ethers - 0 addresses Tue Feb 2 13:21:26 2021 user.notice omr-bypass: OMR-ByPass is running Tue Feb 2 13:45:46 2021 daemon.err /usr/bin/ss-redir[17011]: server recv: Operation timed out

@Ysurac Any response to my questions below ? Btw, do you provide professionnal service on the project ? If yes I would be interested because I really need to make it work.

Ysurac commented 3 years ago

If it's related to high usage, you can try to enable SQM in the wizard.

HSE84 commented 3 years ago

OK, thanks, I'll try and will keep you updated about the results tomorrow with more users.

Adorfer commented 3 years ago

my guess: A lot of these "OMR does not work stable/good" issues need a FAQ which explains, which Settings (VPN types, SQM) in which scenarios. and where to look for optimizations.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days