Closed ioogithub closed 6 months ago
I have more information. I can reproduce the bug, not with omr-tracker, but by stopping the interface (wan2) and starting it again. After I start it, OMR never uses it for MPTCP, so it becomes a single-path router. Here is the log after wan2 is started again:
Thu Aug 24 19:54:50 2023 kern.info kernel: [23103.714924] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Thu Aug 24 19:54:56 2023 daemon.notice netifd: wan2 (30320): udhcpc: sending select for x.x.x.x
Thu Aug 24 19:54:56 2023 daemon.notice netifd: wan2 (30320): udhcpc: lease of x.x.x.x obtained, lease time 172800
Thu Aug 24 19:54:56 2023 daemon.notice netifd: Interface 'wan2' is now up
Thu Aug 24 19:54:56 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Aug 24 19:54:56 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Aug 24 19:54:57 2023 user.notice MPTCP: Flush route cache
Thu Aug 24 19:54:57 2023 user.notice firewall: Reloading firewall due to ifup of wan2 (eth2)
Thu Aug 24 19:54:57 2023 user.notice firewall.omr-server: Firewall reload, set server part firewall reloading
Thu Aug 24 19:54:57 2023 user.notice mptcp: Reloading mptcp config due to ifup of wan2 (eth2)
Thu Aug 24 19:54:59 2023 user.notice post-tracking-post-tracking: Set firewall on server vps
Thu Aug 24 19:55:00 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Aug 24 19:55:00 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Aug 24 19:55:00 2023 kern.info kernel: [23113.674687] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Thu Aug 24 19:55:02 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched up
Thu Aug 24 19:55:46 2023 user.notice post-tracking-post-tracking: Check API configuration...
Thu Aug 24 19:55:52 2023 user.notice post-tracking-post-tracking: Check API configuration... Done
So I guess Thu Aug 24 19:54:57 2023 user.notice mptcp: Reloading mptcp config due to ifup of wan2 (eth2)
is the problem.
I checked ip r before restarting the interface and after bringing it back up. All routes are the same, but how can I tell MPTCP to use both wan1 and wan2 after wan2 comes back online?
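One way to answer this is to inspect the active MPTCP subflows directly instead of the routing table. This is a hedged sketch: the proc path below is an assumption for OMR stable's out-of-tree MPTCP kernel, while upstream 5.15+/6.x kernels expose the state through `ip mptcp`.

```shell
# Hedged sketch: list active MPTCP subflows rather than routes.
# The proc path is an assumption for the out-of-tree MPTCP kernel.
MPTCP_PROC=/proc/net/mptcp_net/mptcp
if [ -r "$MPTCP_PROC" ]; then
  SRC="out-of-tree proc file"
  cat "$MPTCP_PROC"               # one line per subflow
elif ip mptcp endpoint show >/dev/null 2>&1; then
  SRC="upstream 'ip mptcp'"
  ip mptcp endpoint show          # configured endpoints on upstream MPTCP
else
  SRC="none"
  echo "no MPTCP introspection interface found on this kernel"
fi
echo "source: $SRC"
```

If subflows only ever appear on one WAN's address after the interface comes back, the kernel-level path manager, not the routes, is the place to look.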
Try to reload v2ray? Only some VPNs use MPTCP.
Try to reload v2ray? Only some VPNs use MPTCP.

I tried to reload with
/etc/init.d/openmptcprouter-vps restart
but it executes and exits and doesn't restart anything that I can see in the logs.
If I reload with /etc/init.d/v2ray reload
it doesn't seem to fix the problem, traffic still comes in over only 1 wan and I see these errors:
[info] DNS disabled: main_dns
[info] Setting transparent proxy on port: 1234
[info] Transparent proxy mode: default
Flush terminated
ss-rules6: unknown option def
iptables-restore: line 2 failed
iptables-restore: line 2 failed
iptables-restore: line 2 failed
iptables-restore: line 2 failed
iptables-restore: line 2 failed
iptables-restore: line 2 failed
iptables-restore: line 2 failed
iptables-restore: line 2 failed
Warning: Section 'zone_vpn' cannot resolve device of network 'omr6in4'
Warning: Option @redirect[3].v2ray is unknown
Warning: Option @redirect[4].v2ray is unknown
Warning: Section @redirect[4] (19224 on v2ray) does not specify a protocol, assuming TCP+UDP
If I restart with /etc/init.d/v2ray restart
then it does work but it creates a new problem, all clients on the network get disconnected so this is too disruptive as omr-tracker can restart an interface at any time.
MPTCP is working perfectly until an interface is restarted or omr-tracker brings an interface down and up. After that, the interface cannot be used for multipath.
There has to be a way to tell v2ray that the multipath is available to be used again. When I bring the interface down, OMR knows how to fail over to one interface successfully; it should be able to recover when the interface comes back up.
Do you have the same issue using Shadowsocks? And do you have the same issue using the 6.1 snapshot image?
Do you have the same issue using Shadowsocks?

I just did 3 tests with v2ray and 3 tests with shadowsocks. The problem affects both v2ray and shadowsocks proxies.
Test:
I can repeat the test by restarting wan1 with the same result. After the interface restarts, the proxy does not use it for multipath.
Here is the result:
Normal multi-path:
After wan2 interface is reset, multi-path is broken:
I am looking at the logs after restarting an interface trying to find out what is not running. I see one log entry:
Fri Aug 25 13:10:49 2023 daemon.notice netifd: wan2 (23088): Command failed: Permission denied
but it doesn't say what command failed.
Please help me to troubleshoot:
I think this is a serious problem because omr-tracker can restart an interface at any time, and that disables multipath on the OMR router; the user will not realize that OMR is running with degraded performance.
If OMR can successfully degrade the router to single path, it should be able to upgrade it back to multipath as well, right?
It's not a known problem and I don't have this issue on my 0.59.1 (and still not with the latest snapshot).
What is the result of the multipath, ip a and ip r commands?
Did you test it with an upload so it would use the port forward and the proxy? Either shadowsocks or v2ray gives the same result for me.
I think I saw a download working once, but I am not sure because I am testing uploads now.
I am looking at ip a, ip r, ip rule and multipath. I will compare the results of a working fresh reboot of OMR with the results after an interface is restarted and incoming transfers are not using multipath.
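The before/after comparison described above can be automated so nothing is missed; a sketch (file paths are arbitrary):

```shell
# Snapshot the routing and multipath state to a file, then diff a
# "working" snapshot against a "broken" one after the interface restart.
snapshot() {
  {
    echo "### ip r";      ip r 2>&1 || true
    echo "### ip rule";   ip rule 2>&1 || true
    echo "### multipath"; multipath 2>&1 || echo "(multipath tool missing)"
  } > "$1"
}
snapshot /tmp/state-now.txt
echo "saved $(wc -l < /tmp/state-now.txt) lines to /tmp/state-now.txt"
# After restarting the interface:
#   snapshot /tmp/state-broken.txt
#   diff /tmp/state-now.txt /tmp/state-broken.txt
```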
I did lots of tests:
The routes, rules, interfaces and multipath settings are all identical when it's working and when it is broken after restarting an interface.
It's not a known problem and I don't have this issue on my 0.59.1 (and still not with the latest snapshot). What is the result of the multipath, ip a and ip r commands?
Here are the outputs; everything looks normal, nothing stands out to me:
ip r
default via 10.255.255.1 dev tun0
default metric 1
nexthop via 10.6.0.1 dev eth1 weight 2
nexthop via 10.0.0.1 dev eth2 weight 1
default via 10.6.0.1 dev eth1 metric 3
default via 10.0.0.1 dev eth2 metric 4
default via 10.255.255.1 dev tun0 metric 1200
10.0.0.0/24 via 10.0.0.1 dev eth2
10.0.0.0/24 dev eth2 scope link metric 4
10.255.255.1 dev tun0 proto kernel scope link src 10.255.255.2
10.255.255.2 dev tun0 scope link metric 1200
6.1.2.2 dev eth1 proto static scope link src 10.6.8.1 metric 3
10.6.0.0/10 dev eth1 scope link metric 3
127.0.0.0/8 dev lo proto static scope link metric 6
x.x.x.x metric 1
nexthop via 10.6.0.1 dev eth1 weight 10
nexthop via 10.0.0.1 dev eth2 weight 1
multipath:
bond0 is deactivated
erspan0 is deactivated
eth0 is deactivated
eth1 is in default mode
eth2 is in default mode
eth3 is deactivated
gre0 is deactivated
gretap0 is deactivated
ifb4tun0 is deactivated
ip6gre0 is deactivated
ip6tnl0 is deactivated
lo is deactivated
sit0 is deactivated
teql0 is deactivated
tun0 is deactivated
ip a:
1: lo: <LOOPBACK,UP,LOWER_UP,80000> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ip6tnl0@NONE: <NOARP,80000> mtu 1452 qdisc noop state DOWN group default qlen 1000
link/tunnel6 :: brd ::
3: sit0@NONE: <NOARP,80000> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
4: gre0@NONE: <NOARP,80000> mtu 1476 qdisc noop state DOWN group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
5: gretap0@NONE: <BROADCAST,MULTICAST,80000> mtu 1462 qdisc noop state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
6: erspan0@NONE: <BROADCAST,MULTICAST,80000> mtu 1450 qdisc noop state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: ip6gre0@NONE: <NOARP,80000> mtu 1448 qdisc noop state DOWN group default qlen 1000
link/gre6 :: brd ::
8: bond0: <BROADCAST,MULTICAST,MASTER,80000> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether b1:b1:b1:b1:b1:b1 brd ff:ff:ff:ff:ff:ff
9: teql0: <NOARP,80000> mtu 1500 qdisc noop state DOWN group default qlen 100
link/void
10: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP,80000> mtu 1500 qdisc mq state UP group default qlen 100
link/ether a:aa:aa:aa:aa:aa brd ff:ff:ff:ff:ff:ff
inet 192.168.x.x/24 brd 192.168.200.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 bbbb:bbbb:bbbb::1/60 scope global noprefixroute
valid_lft forever preferred_lft forever
inet6 cccc:cccc:cccc:cccc:cccc/64 scope link
valid_lft forever preferred_lft forever
11: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 100
link/ether dd:dd:dd:dd:dd:dd brd ff:ff:ff:ff:ff:ff
inet 10.6.8.1/10 brd 100.127.255.255 scope global eth1
valid_lft forever preferred_lft forever
12: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 100
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/24 brd 10.0.0.255 scope global eth2
valid_lft forever preferred_lft forever
13: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP,80000> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether a1:a1:a1:a1:a1:a1 brd ff:ff:ff:ff:ff:ff
inet 192.168.a.b/24 brd 192.168.200.255 scope global eth3
valid_lft forever preferred_lft forever
21: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP,80000> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 100
link/none
inet 10.255.255.2 peer 10.255.255.1/32 scope global tun0
valid_lft forever preferred_lft forever
inet6 a2a2:a2a2:a2a2:a2a2:a2a2/64 scope link stable-privacy
valid_lft forever preferred_lft forever
24: ifb4tun0: <BROADCAST,NOARP,80000> mtu 1500 qdisc noop state DOWN group default qlen 32
link/ether a3:a3:a3:a3:a3:a3 brd ff:ff:ff:ff:ff:ff
Do you have any ideas of what else I can look at? It's like the proxy gets stuck on a single path, or it doesn't get notified that multipath is available again.
Questions:
If ip r, ip rule etc. are the same when it is working and when it is not working, it must be something else. What else should I look for?
Only the mptcp init script is used, to set the route table, and the multipath command to set the multipath status of an interface. I don't understand the issue exactly: is it only with port forwarding? In that case, with Shadowsocks as proxy only the VPN is used, and with v2ray it's used if you have the checkbox enabled in the firewall configuration. Is it aggregated on download but not on port-forwarded upload in both cases?
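As a rough sketch of what the maintainer describes (simplified, not the actual `/etc/init.d/mptcp` script; interface names are from this thread's config), the flow reads each interface's `multipath` option from UCI and applies it with the `multipath` tool:

```shell
# Hedged sketch, not the real init script: apply each interface's
# UCI 'multipath' option (master/on/off/backup) with the multipath tool.
apply_multipath_from_uci() {
  for net in wan1 wan2; do                     # assumption: two WANs
    dev=$(uci -q get network.$net.device)
    mode=$(uci -q get network.$net.multipath)
    [ -n "$dev" ] && [ -n "$mode" ] && multipath "$dev" "$mode"
  done
}
```

If this is the whole mechanism, then after an ifup the interface's multipath flag and routes are all that is reapplied; nothing signals the proxy itself.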
What is the MPTCP scheduler used ?
mptcp init script is used, to set route table
this script sets the route table: /etc/init.d/mptcp
multipath command to set multipath status of an interface
Is this command inside the mptcp init script?
But if the route table and multipath status are the same before and after the interface is restarted, then the problem is somewhere else.
Does this mean that only the routes and the multipath status of an interface determine whether the proxy will use multipath? So the proxy pushes data out and the kernel uses the routing table to route over multiple interfaces? Does this mean the proxy doesn't know about MPTCP? Are there no other settings that notify the proxy?
I don't understand exactly the issue, it's only with port forwarding ?
I think it is only uploads: when a client from outside the network requests data from a computer inside the network after an interface has been restarted, it does not use that interface. It sends data only on the other interface. In this case port forwarding is used. I don't know if port forwarding is involved, but downloads do not use that port forwarding.
In this case with shadowsocks as Proxy only VPN is used
For this I used different port forwarding rules when switching to shadowsocks.
with v2ray it's used if you have the checkbox enabled in firewall configuration.
Yes I am using the v2ray checkbox port forwarding rule.
All of this works normally before an interface is stopped and started or restarted. After an interface is restarted there is no way to get OMR to use that interface for uploads.
I see the problem with both v2ray and shadowsocks:
It seems like the proxy gets stuck; it doesn't get notified that both wan1 and wan2 are available.
I will start a fresh test:
What is the MPTCP scheduler used ?
scheduler: BLEST
congestion control: reno
path manager: fullmesh
MPTCP SYN retries: 2
MPTCP version: 0
MPTCP checksum: disabled
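These values can be cross-checked against the kernel; a sketch assuming the out-of-tree MPTCP sysctl names (`net.mptcp.mptcp_*`), which do not exist on upstream kernels:

```shell
# Hedged: print the MPTCP sysctls matching the settings above.
# Key names are assumptions for the out-of-tree MPTCP kernel.
KEYS="mptcp_enabled mptcp_scheduler mptcp_path_manager mptcp_checksum mptcp_syn_retries mptcp_version mptcp_debug"
for key in $KEYS; do
  val=$(sysctl -n "net.mptcp.$key" 2>/dev/null) || val="(not present on this kernel)"
  echo "net.mptcp.$key = $val"
done
```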
There is an MPTCP debug option; if I set it, will we get more data in the logs?
If you have any more ideas please let me know.
I have done a lot of testing today, here is the result:
I initially thought the problem was uploads for both shadowsocks and v2ray, but I was wrong. With shadowsocks, sometimes it starts working only after a delay of up to 2 minutes; sometimes it starts right away. In my previous tests I did not wait long enough. I did see it working today.
So the problem is isolated to a specific use case: v2ray uploads (traffic initiated from outside the network using the v2ray port forwarding) only. I have never seen this work.
I have also tried the following:
Nothing I can do will restore aggregate multi-path for uploads with v2ray once an interface is restarted. Traffic always flows though the one interface that was not restarted.
The only way to recover from this is to click Save and Apply on the Wizard screen or reboot OMR.
Question: how does MPTCP actually work on OMR? The v2ray config.json doesn't have MPTCP set in the stream settings, it is just TCP, so I guess v2ray doesn't even know about MPTCP. How does it even work?
It's not a known problem and I don't have this issue on my 0.59.1 (and still not with the latest snapshot). What is the result of the multipath, ip a and ip r commands?
You tested uploads with port forwarding using v2ray as the proxy with 0.59.1, and it is working for you, correct?
If this works for you and it does not work for me, that suggests a configuration problem. Tomorrow I will set up a VPS with the 6.1 upstream kernel, boot my router with openmptcprouter-v0.59.2alpha-6.1-r0+23789-ce6ad123e7-x86-64-generic-ext4-combined-efi, configure v2ray and port forwarding, and repeat the tests above. I will report the results back here.
Here is the result of testing with openmptcprouter v0.59.2alpha-6.1 r0+23789-ce6ad123e7.
wget -O - http://www.openmptcprouter.com/server-test/debian-x86_64.sh | UPSTREAM6="yes" sh
openmptcprouter-v0.59.2alpha-6.1-r0+23789-ce6ad123e7-x86-64-generic-ext4-combined-efi.img.gz
Configure openwrt:
shadowsocks upload: multipath working
shadowsocks upload after interface reset: multipath working
Configure V2ray:
Start an upload using v2ray: multi-path working
Reset wan1 interface: multi-path not working
Same results as the latest stable release. Uploads do not use aggregate multi-path with v2ray as the proxy after an interface is restarted.
There is currently no way to recover from this.
With stable, at least Save and Apply or rebooting OMR will fix the problem. In 6.1 this does not work. So for this issue, at this time, it is better to stay with stable until a workaround is found.
This is an example of an event that kills aggregate uploads:
Aug 28 03:33:19 OpenMPTCProuter user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and ping from x.x.x.x error (1.0.0.1,114.114.115.115,1.2.4.8)
Aug 28 03:33:19 OpenMPTCProuter user.notice post-tracking-post-tracking: Delete default route to y.y.y.y via x.x.x.x dev eth2
Aug 28 03:41:15 OpenMPTCProuter daemon.notice netifd: Network device 'eth2' link is down
Aug 28 03:41:15 OpenMPTCProuter daemon.notice netifd: Interface 'wan2' has link connectivity loss
Aug 28 03:41:22 OpenMPTCProuter daemon.notice netifd: Network device 'eth2' link is up
Aug 28 03:41:22 OpenMPTCProuter daemon.notice netifd: Interface 'wan2' has link connectivity
Aug 28 03:41:25 OpenMPTCProuter user.notice post-tracking-post-tracking: wan2 (eth2) switched up
Aug 28 03:41:25 OpenMPTCProuter user.notice post-tracking-post-tracking: Interface route not yet set, set route ip r add default via x.x.x.x dev eth2 metric 4
Aug 28 03:41:26 OpenMPTCProuter user.notice post-tracking-post-tracking: New public ip detected for wan2 (eth2): a.a.a.a
Aug 28 03:41:26 OpenMPTCProuter user.notice post-tracking-post-tracking: Reload MPTCP for eth2
The OMR-tracker brings the interface down and up again. After this, OMR is in single-path mode and will not use wan2 again until the server is restarted, so this issue has a big impact on OMR stability and reliability.
There has to be a way to have uploads recover without manual user intervention after these interface reset events.
On latest snapshot, I'm not able to reproduce the issue for now using "iperf3" to test upload speed with V2Ray VLESS. How do you test upload speed ? By downloading externally using port forward ?
Yes, I am doing a real world test like this:
external client -> VPS (public IP) -> port forward OMR via v2ray vless -> host on LAN.
What exact test are you running? omr-iperf on OMR? If I do that right now, this is what I get, but I only see one wan connection used, so maybe this is not the correct test:
https://ibb.co/L5W55Tg
For my test, after OMR is restarted, if I do a download from an external client (an upload) in a good state, I get this: https://ibb.co/LvLDFn8
If I do a download from an external client (upload) after an interface was restarted (by omr-tracker) or manually, I get this: https://ibb.co/tDfRGD6
It never recovers from this. omr-tracker reset wan2 8 hours ago:
Aug 28 07:43:41 OpenMPTCProuter user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and ping from x.x.x.x error (1.0.0.1,114.114.115.115,1.2.4.8)
so OMR has been sitting in single-path mode for uploads since then.
If I do /etc/init.d/v2ray restart or validate the wizard on the stable version, it will return to normal aggregation. However, this operation is disruptive: it kills all client connections, it is manual, and it only lasts until omr-tracker resets one of the interfaces again.
Here is a new upload test, as discussed, using iperf and a new port forward set up for the test.
Result: bandwidth never returns to wan2: https://ibb.co/x8tJqQf. Start a new file transfer: https://ibb.co/PY1vNsz. You can see, in the first screenshot, that when wan2 goes down, OMR switches to wan1, but after wan2 comes back up it never recovers.
The problem is: with stable, /etc/init.d/v2ray restart would fix the problem but disconnects all clients on the network. With 6.1 testing, after running /etc/init.d/v2ray restart, packets do not leave the VPS and reach OMR. The only way to recover and start a new test is to restart both OMR and the VPS. After validating the wizard and restarting v2ray, packets do not get past the VPS according to tcpdump -i eth0 port 19223
I've tested using the V2Ray VLESS protocol with a port redirect to an internal HTTP server, downloading a file from outside, and had no problem: when I remove a connection and put it back, it's used again. I also have no problem with a v2ray restart.
Can you give me the full log when you test disconnecting and then reconnecting a connection? I will reset my install to a fresh config for more tests.
Can you give me the full log when you test disconnecting and then reconnecting a connection?
Yes. Please let me know exactly which log you are looking for: logread from OMR, journalctl -f from the VPS, or something else? Is there any debug mode I can enable to get more info?
I would need the system log from the router, so the result of logread.
I tested with a new VPS and router installation and I have no issue. What is the result of uci show network via SSH on the router?
I have been testing with the 6.1 kernel and switched back to 0.59.1 recently, as it is easier for me to recover.
What is the result of uci show network
network.loopback=interface
network.loopback.proto='static'
network.loopback.ipaddr='127.0.0.1'
network.loopback.netmask='255.0.0.0'
network.loopback.multipath='off'
network.loopback.device='lo'
network.loopback.metric='6'
network.globals=globals
network.globals.mptcp_subflows='3'
network.globals.mptcp_add_addr_accepted='1'
network.globals.mptcp_add_addr_timeout='120'
network.globals.mptcp_fullmesh_num_subflows='1'
network.globals.mptcp_fullmesh_create_on_err='1'
network.globals.mptcp_ndiffports_num_subflows='1'
network.globals.ula_prefix='abac:4c32:5cbb::/48'
network.globals.mptcp_version='0'
network.globals.mptcp_rr_cwnd_limited='Y'
network.globals.mptcp_rr_num_segments='1'
network.globals.multipath='enable'
network.globals.mptcp_syn_retries='2'
network.globals.mptcp_scheduler='blest'
network.globals.congestion='reno'
network.globals.mptcp_path_manager='fullmesh'
network.globals.mptcp_checksum='0'
network.globals.mptcp_debug='0'
network.lan=interface
network.lan.proto='static'
network.lan.ipaddr='192.168.x.x'
network.lan.netmask='255.255.255.0'
network.lan.device='eth0'
network.lan.ifname='eth0'
network.lan.delegate='0'
network.lan.addlatency='0'
network.lan.multipath='off'
network.lan.ip4table='lan'
network.lan.metric='7'
network.lan.label='toMesh'
network.lan.defaultroute='0'
network.lan.peerdns='0'
network.lan.ip6assign='60'
network.lan.ipv6='0'
network.lan.force_link='1'
network.lan_rule=rule
network.lan_rule.lookup='lan'
network.lan_rule.priority='100'
network.wan1=interface
network.wan1.device='eth1'
network.wan1.ip4table='wan'
network.wan1.defaultroute='0'
network.wan1.addlatency='0'
network.wan1.ipv6='0'
network.wan1.peerdns='0'
network.wan1.proto='dhcp'
network.wan1.label='starlink'
network.wan1.force_link='1'
network.wan1.metric='3'
network.wan1.multipath='master'
network.wan1_dev=device
network.wan1_dev.name='eth1'
network.wan1_dev.txqueuelen='20'
network.wan2=interface
network.wan2.device='eth2'
network.wan2.ip4table='wan'
network.wan2.defaultroute='0'
network.wan2.addlatency='0'
network.wan2.ipv6='0'
network.wan2.peerdns='0'
network.wan2.proto='dhcp'
network.wan2.force_link='1'
network.wan2.metric='4'
network.wan2.label='lte'
network.wan2.multipath='on'
network.wan2_dev=device
network.wan2_dev.name='eth2'
network.wan2_dev.txqueuelen='20'
network.omrvpn=interface
network.omrvpn.device='tun0'
network.omrvpn.ip4table='vpn'
network.omrvpn.multipath='off'
network.omrvpn.leasetime='12h'
network.omrvpn.type='tunnel'
network.omrvpn.txqueuelen='100'
network.omrvpn.metric='1200'
network.omrvpn.proto='none'
network.omr6in4=interface
network.omr6in4.proto='6in4'
network.omr6in4.ip4table='vpn'
network.omr6in4.multipath='off'
network.omr6in4.ipaddr='10.255.255.2'
network.omr6in4.peeraddr='10.255.255.1'
network.omr6in4.ip6addr='fe80::a00:2/126'
network.omr6in4.gateway='fe80::a00:1/126'
network.omr6in4.force_link='1'
network.omr6in4.metric='1201'
network.lan_dev=device
network.lan_dev.name='eth0'
network.wan3_dev=device
network.wan3_dev.type='macvlan'
network.wan3_dev.mode='vepa'
network.wan3_dev.ifname='eth0'
network.wan3_dev.name='wan3'
network.wan3_dev.txqueuelen='20'
network.localNIC=interface
network.localNIC.proto='static'
network.localNIC.device='eth3'
network.localNIC.netmask='255.255.255.0'
network.localNIC.multipath='off'
network.localNIC.addlatency='0'
network.localNIC.metric='11'
network.localNIC.ipaddr='192.168.y.y'
network.localNIC.ipv6='0'
network.localNIC.defaultroute='0'
network.localNIC.peerdns='0'
network.localNIC_dev=device
network.localNIC_dev.name='eth3'
network.@route[2]=route
network.@route[2].interface='lan'
network.@route[2].target='192.168.a.a'
network.@route[2].netmask='255.255.255.0'
network.@route[2].gateway='192.168.x.xyy'
I would need the system log from the router, so the result of logread.
Here is the log from the moment I clicked reset on the wan2 interface. After this I just see expected lease renewal messages, nothing else related to connections, and the log is quiet.
Thu Aug 31 14:13:42 2023 daemon.notice netifd: Interface 'wan2' is now down
Thu Aug 31 14:13:42 2023 kern.info kernel: [ 3351.120795] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Thu Aug 31 14:13:42 2023 daemon.notice netifd: Interface 'wan2' is disabled
Thu Aug 31 14:13:42 2023 daemon.notice netifd: Interface 'wan2' is enabled
Thu Aug 31 14:13:42 2023 daemon.notice netifd: Interface 'wan2' is setting up now
Thu Aug 31 14:13:42 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Aug 31 14:13:42 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Aug 31 14:13:42 2023 daemon.notice netifd: wan2 (29374): udhcpc: started, v1.33.2
Thu Aug 31 14:13:42 2023 daemon.notice netifd: wan2 (29374): udhcpc: sending discover
Thu Aug 31 14:13:42 2023 kern.err kernel: [ 3351.487765] __mptcp_init4_subsockets: token 0xa463fb6a bind() to y.y.y.y index 12 failed, error -99
Thu Aug 31 14:13:43 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 14:13:44 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched off because wan2 may have ip issues, interface have no IPv4, interface have no IPv4 gateway
Thu Aug 31 14:13:44 2023 user.notice post-tracking-post-tracking: Delete default route to x.x.x.x via dev eth2
Thu Aug 31 14:13:45 2023 daemon.notice netifd: wan2 (29374): udhcpc: sending discover
Thu Aug 31 14:13:46 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Aug 31 14:13:46 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Aug 31 14:13:46 2023 kern.info kernel: [ 3354.700767] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Thu Aug 31 14:13:49 2023 daemon.notice netifd: wan2 (29374): udhcpc: sending discover
Thu Aug 31 14:13:52 2023 daemon.notice netifd: wan2 (29374): udhcpc: sending select for y.y.y.y
Thu Aug 31 14:13:52 2023 daemon.notice netifd: wan2 (29374): udhcpc: lease of y.y.y.y obtained, lease time 172800
Thu Aug 31 14:13:52 2023 daemon.notice netifd: Interface 'wan2' is now up
Thu Aug 31 14:13:52 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Aug 31 14:13:52 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Aug 31 14:13:52 2023 user.notice firewall: Reloading firewall due to ifup of wan2 (eth2)
Thu Aug 31 14:13:52 2023 user.notice firewall.omr-server: Firewall reload, set server part firewall reloading
Thu Aug 31 14:13:53 2023 user.notice post-tracking-post-tracking: Set firewall on server vps
Thu Aug 31 14:13:54 2023 daemon.err glorytun[14226]: read: Operation timed out
Thu Aug 31 14:13:54 2023 daemon.info glorytun[14226]: STOPPED tun0
Thu Aug 31 14:13:55 2023 daemon.err glorytun[14226]: x.x.x.x.65001: connected
Thu Aug 31 14:13:55 2023 daemon.info glorytun[14226]: STARTED tun0
Thu Aug 31 14:13:55 2023 user.notice MPTCP: Add route z.z.z.z/24 via z.z.z.b dev eth2
Thu Aug 31 14:13:55 2023 user.notice MPTCP: Add route a.a.a.a via a.a.a.b dev eth0
Thu Aug 31 14:13:55 2023 user.notice mptcp: Reloading mptcp config due to ifup of wan2 (eth2)
Thu Aug 31 14:13:55 2023 user.notice MPTCP: Add route z.z.z.z/24 via z.z.z.b dev eth2
Thu Aug 31 14:13:56 2023 user.notice MPTCP: Add route a.a.a.a via a.a.a.b dev eth0
Thu Aug 31 14:13:56 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Aug 31 14:13:56 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Aug 31 14:13:56 2023 kern.info kernel: [ 3364.671214] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Thu Aug 31 14:13:56 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 14:13:56 2023 user.notice MPTCP: Add route z.z.z.z/24 via z.z.z.b dev eth2
Thu Aug 31 14:13:56 2023 user.notice MPTCP: Add route a.a.a.a via a.a.a.b dev eth0
Thu Aug 31 14:13:56 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 14:13:58 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched up
Thu Aug 31 14:14:01 2023 user.notice MPTCP: Add route z.z.z.z/24 via z.z.z.b dev eth2
Thu Aug 31 14:14:01 2023 user.notice MPTCP: Add route a.a.a.a via a.a.a.b dev eth0
Thu Aug 31 14:14:45 2023 user.notice post-tracking-post-tracking: Check API configuration...
Thu Aug 31 14:14:53 2023 user.notice post-tracking-post-tracking: Check API configuration... Done
After this point wan2 does not recover. /etc/init.d/v2ray restart will fix the problem and allow upload traffic to use both wan1 and wan2, but this causes other problems because it kills all active connections on the network.

Can you try to do multipath eth2 off && multipath eth2 on to check if this changes something?
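The suggested cycle can be wrapped with a before/after state check so the effect (if any) is visible; a sketch assuming `eth2` is the affected WAN device, as in this thread:

```shell
# Cycle the multipath flag on the affected WAN and show the state
# before and after. Assumes OMR's multipath tool is installed.
IFACE=eth2
if command -v multipath >/dev/null 2>&1; then
  STATE_BEFORE=$(multipath "$IFACE" 2>&1 || true)
  multipath "$IFACE" off || true
  multipath "$IFACE" on  || true
  STATE_AFTER=$(multipath "$IFACE" 2>&1 || true)
  echo "before: $STATE_BEFORE"
  echo "after:  $STATE_AFTER"
else
  STATE_AFTER="multipath tool not found (OMR-specific command)"
  echo "$STATE_AFTER"
fi
```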
multipath eth2 off && multipath eth2 on
Result: I cannot see any effect from this command.
There was one error after I reset wan2:
Thu Aug 31 14:13:42 2023 kern.err kernel: [ 3351.487765] __mptcp_init4_subsockets: token 0xa463fb6a bind() to y.y.y.y index 12 failed, error -99
Could this be the problem? It happens when wan2 is brought down.
I am going to reboot OMR to a known good state and watch the logs closely again.
Here is a log with two questions I have.
Thu Aug 31 16:14:21 2023 kern.info kernel: [ 1079.764399] igc 0000:04:00.0: changing MTU from 1500 to 1460
and this change causes eth2 to go down and up:
Thu Aug 31 16:14:21 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Aug 31 16:14:21 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Aug 31 16:14:25 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Aug 31 16:14:25 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Aug 31 16:14:25 2023 kern.info kernel: [ 1083.381921] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Is there an OMR process that monitors something and tries to optimize the MTU, changing it while the interface is up and running? The consequence is that the interface restarts afterwards and then isn't used for uploads because of this issue.
Thu Aug 31 16:02:06 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 16:02:06 2023 user.notice MPTCP: Add route 10.x.x.x/24 via 10.x.x.y dev eth2
Thu Aug 31 16:02:06 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Aug 31 16:02:07 2023 user.notice omr-bypass: OMR-ByPass is running
Thu Aug 31 16:02:07 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Aug 31 16:02:07 2023 user.notice omr-tracker: Launching...
Thu Aug 31 16:02:14 2023 user.notice omr-tracker: Launched
Thu Aug 31 16:02:15 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Sometimes I see these messages every few minutes, and sometimes I do not see them for an hour in the log. What are they for, and what determines how often they run? Is this normal?
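One way to measure the interval is to filter the recurring messages out of the log with their timestamps; a sketch (the patterns are taken from the log excerpts in this thread):

```shell
# Pull only the recurring service-restart messages so their
# frequency is visible at a glance. logread is OpenWrt-only.
LOG=$(logread 2>/dev/null) || LOG=""
if [ -n "$LOG" ]; then
  echo "$LOG" | grep -E "omr-tracker: Launch|glorytun: starting|MPTCP: Flush" || true
else
  echo "logread not available (OpenWrt-only command)"
fi
```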
Also, is glorytun actually restarting every time this is logged:
Thu Aug 31 16:02:07 2023 daemon.info glorytun: starting glorytun vpn instance vpn
I rebooted OMR at 17:00; same issue, the kernel or some OMR process changed the MTU again at 17:13:
Thu Aug 31 17:14:35 2023 kern.info kernel: [ 966.328211] igc 0000:04:00.0: changing MTU from 1500 to 1460
Thu Aug 31 17:14:36 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Aug 31 17:14:36 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Aug 31 17:14:39 2023 user.notice post-tracking-post-tracking: Check API configuration...
Thu Aug 31 17:14:39 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Aug 31 17:14:39 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Aug 31 17:14:39 2023 kern.info kernel: [ 969.825546] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
and then the log file is quiet, no omr-tracker or glorytun log messages for 2 hours.
What is the purpose of resetting the MTU, and why don't I see the omr-tracker, glorytun, and MPTCP flush-cache messages any longer?
As soon as I initiate a file upload, these messages appear again, every few minutes. Are they related to connections or file transfers? Are they related to this issue?
Thu Aug 31 19:29:27 2023 user.notice omr-tracker: Launching...
Thu Aug 31 19:29:34 2023 user.notice omr-tracker: Launched
Thu Aug 31 19:29:34 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Aug 31 19:29:34 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 19:29:34 2023 user.notice MPTCP: Add route 10.a.a.a/24 via 10.0.0.1 dev eth2
Thu Aug 31 19:29:34 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Aug 31 19:29:35 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Thu Aug 31 19:29:35 2023 user.notice omr-bypass: OMR-ByPass is running
Thu Aug 31 19:29:35 2023 user.notice omr-tracker: Launching...
Thu Aug 31 19:29:42 2023 user.notice omr-tracker: Launched
Thu Aug 31 19:29:42 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Aug 31 19:29:42 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 19:29:43 2023 user.notice MPTCP: Add route 10.a.a.a/24 via 10.0.0.1 dev eth2
Thu Aug 31 19:29:43 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Aug 31 19:29:43 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Thu Aug 31 19:29:43 2023 user.notice omr-bypass: OMR-ByPass is running
Thu Aug 31 19:29:51 2023 user.notice omr-tracker: Launching...
Thu Aug 31 19:29:58 2023 user.notice omr-tracker: Launched
Thu Aug 31 19:29:58 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Aug 31 19:29:58 2023 user.notice MPTCP: Flush route cache
Thu Aug 31 19:29:58 2023 user.notice MPTCP: Add route 10.a.a.a/24 via 10.0.0.1 dev eth2
Thu Aug 31 19:29:58 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Aug 31 19:29:59 2023 user.notice omr-bypass: OMR-ByPass is running
Thu Aug 31 19:29:59 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
You can fix the MTU in Network->Interfaces for each interface; then it will not try to calculate the MTU (but changing the MTU should not put the interface down).
You can fix the MTU in Network->Interfaces for each interface; then it will not try to calculate the MTU (but changing the MTU should not put the interface down).
Are you sure? I always see a reset after an MTU change:
I get this:
Fri Sep 1 01:08:08 2023 kern.info kernel: [ 3654.871161] igc 0000:04:00.0: changing MTU from 1500 to 1460
Fri Sep 1 01:08:08 2023 daemon.notice netifd: Network device 'eth2' link is down
Fri Sep 1 01:08:08 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Fri Sep 1 01:08:11 2023 daemon.notice netifd: Network device 'eth2' link is up
Fri Sep 1 01:08:11 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Fri Sep 1 01:08:11 2023 kern.info kernel: [ 3658.468358] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
I can also reproduce it with this:
ip link set dev eth0 mtu 1460
it will always reset the interface.
Fri Sep 1 01:08:08 2023 daemon.notice netifd: Network device 'eth0' link is down
But this MTU change just contributes to the problem; it doesn't fix the root cause of aggregation breaking after the interface is restarted.
What about this error after I reset wan2:
Thu Aug 31 14:13:42 2023 kern.err kernel: [ 3351.487765] __mptcp_init4_subsockets: token 0xa463fb6a bind() to y.y.y.y index 12 failed, error -99
could this be causing the issue or is this just from the interface going down?
It seems that changing the MTU puts the interface down in some cases; I will change the code. Yes, the MPTCP error is due to the interface going down. I will try to reproduce the issue by using DHCP on the interfaces.
I am looking closely at the logs. When I reboot OMR I see these errors related to v2ray, could any of these cause the issue?
Fri Sep 1 12:40:06 2023 daemon.notice procd: /etc/rc.d/S99v2ray: ss-rules6: unknown option def
Fri Sep 1 12:40:06 2023 user.notice v2ray: Reload omr-bypass rules
Fri Sep 1 12:40:06 2023 daemon.notice procd: /etc/rc.d/S99v2ray: iptables-restore: line 2 failed
Fri Sep 1 12:40:06 2023 daemon.notice procd: /etc/rc.d/S99v2ray: iptables-restore: line 2 failed
Fri Sep 1 12:40:06 2023 daemon.notice procd: /etc/rc.d/S99v2ray: iptables-restore: line 2 failed
Fri Sep 1 12:40:06 2023 daemon.notice procd: /etc/rc.d/S99v2ray: iptables-restore: line 2 failed
Fri Sep 1 12:40:06 2023 daemon.notice procd: /etc/rc.d/S99v2ray: iptables-restore: line 2 failed
Fri Sep 1 12:40:07 2023 daemon.notice procd: /etc/rc.d/S99v2ray: Warning: Section 'zone_vpn' cannot resolve device of network 'omr6in4'
Fri Sep 1 12:40:07 2023 daemon.notice procd: /etc/rc.d/S99v2ray: Warning: Option @redirect[3].v2ray is unknown
Fri Sep 1 12:40:07 2023 daemon.notice procd: /etc/rc.d/S99v2ray: Warning: Option @redirect[4].v2ray is unknown
Fri Sep 1 12:40:07 2023 daemon.notice procd: /etc/rc.d/S99v2ray: Warning: Section @redirect[4] (19223 on v2ray) does not specify a protocol, assuming TCP+UDP
Fri Sep 1 12:40:07 2023 daemon.info v2ray[26149]: V2Ray 4.43.0 (V2Fly, a community-driven edition of V2Ray.) OpenWrt R1 (go1.17.3 linux/amd64)
Fri Sep 1 12:40:07 2023 daemon.info v2ray[26149]: A unified platform for anti-censorship.
Fri Sep 1 12:40:07 2023 daemon.info v2ray[26149]: 2023/09/01 16:40:07 [Info] main/jsonem: Reading config: /var/etc/v2ray/v2ray.main.json
Redirects 3 and 4 are the port forwards. At the end of the boot, however, aggregation does work, so maybe this is normal?
From all my testing, it seems aggregated uploads always stop working after an interface reset, so booting OMR or reloading v2ray fixes it; but there are several automated processes, like the MTU calculation or omr-tracker, which reset the interface.
I manually set the MTU, so that will prevent DHCP from restarting the interfaces. Is there anything I can do to prevent omr-tracker from restarting interfaces? I don't want to disable it because it is necessary, right?
Is there anything else I can adjust or configure to prevent automated interface restarts?
The errors don't cause issues. For omr-tracker, you can change settings in Services->OMR-Tracker. OMR-Tracker is necessary to set routes, check services, get some connection info, etc. The real question is why the connections are no longer in V2Ray in your case.
A strange event happened tonight.
Sep 05 20:34:53 vps glorytun-tcp-run[7114]: read: Connection timed out
Sep 05 20:34:54 vps glorytun-tcp-run[7114]: ::ffff:35.x.x.x.59324: connected
Sep 05 20:34:54 vps glorytun-tcp-run[7114]: bad packet [E210xxxxxxxxxxxxxxxxx] !
Sep 05 20:34:54 vps glorytun-tcp-run[7114]: ::ffff:35.x.x.x.59324: key exchange failed
Sep 05 20:35:04 vps glorytun-tcp-run[7114]: ::ffff:a.a.a.a.24919: connected
This IP starting with 35.x.x.x is a Google Cloud IP from the UK. I am not close to this location; is this a hack attempt? Did someone try to connect to my glorytun on the VPS from this 35.x.x.x IP address?
After this, eth2 goes down:
Tue Sep 5 20:35:03 2023 daemon.err glorytun[14750]: read: Operation timed out
Tue Sep 5 20:35:03 2023 daemon.info glorytun[14750]: STOPPED tun0
Tue Sep 5 20:35:04 2023 daemon.err glorytun[14750]: vpsIP.65001: connected
Tue Sep 5 20:35:04 2023 daemon.info glorytun[14750]: STARTED tun0
Tue Sep 5 20:35:21 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and ping from 10.x.x.x error (4.2.2.1,8.8.8.8,80.67.169.12)
Tue Sep 5 20:35:21 2023 user.notice post-tracking-post-tracking: Delete default route to vpsip via 10.x.x.x dev eth2
Tue Sep 5 20:42:43 2023 user.notice omr-bypass: Starting OMR-ByPass...
Tue Sep 5 20:42:44 2023 user.notice omr-bypass: OMR-ByPass is running
Tue Sep 5 20:42:44 2023 user.notice MPTCP: Flush route cache
Tue Sep 5 20:42:44 2023 user.notice MPTCP: Add route 10.0.0.0/24 via 10.x.x.x dev eth2
Tue Sep 5 20:42:44 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Tue Sep 5 20:42:44 2023 user.notice omr-tracker: Launching...
Tue Sep 5 20:42:51 2023 user.notice omr-tracker: Launched
Tue Sep 5 20:42:52 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Tue Sep 5 20:43:06 2023 daemon.notice netifd: Network device 'eth2' link is down
Tue Sep 5 20:43:06 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Tue Sep 5 20:43:06 2023 kern.info kernel: [176347.654817] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Down
Tue Sep 5 20:43:13 2023 daemon.notice netifd: Network device 'eth2' link is up
Tue Sep 5 20:43:13 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Tue Sep 5 20:43:13 2023 kern.info kernel: [176354.494963] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Tue Sep 5 20:43:17 2023 user.notice post-tracking-post-tracking: Interface route not yet set, set route ip r add default via 10.x.x.x dev eth2 metric 4
Tue Sep 5 20:43:18 2023 user.notice post-tracking-post-tracking: New public ip detected for wan2 (eth2): b.b.b.b
OMR does not recover and bring eth2 back online until I manually restart the interface on the Network->Interfaces page. Should OMR be able to recover from this type of event without user intervention? I waited 8 minutes before I restarted the interface.
After this, I see the typical bug: traffic only on the other wan. I restart v2ray on the vps: systemctl restart v2ray.service
and it is back to normal.
Questions:
I would like to automatically run
systemctl restart v2ray.service
when an interface goes down and is restored. It's not a perfect solution, but it's better than leaving OMR in single-path mode or rebooting OMR manually. What is the best way to do this? Which script should I add this code to, and where in the script should I add the line?
I also see two more strange log entries on the vps:
A 167.x.x.x IP address trying to connect to v2ray?
Sep 04 14:16:11 vps v2ray[823534]: 2023/09/04 14:16:11 167.2x.xx.xx:55732 rejected proxy/vless/encoding: failed to read request version > read tcp 172.16.0.7:65228->167.2x.xx.xx:55732: read: connection reset by peer
Sep 04 14:16:12 vps v2ray[823534]: 2023/09/04 14:16:12 167.2x.xx.xx:51324 rejected proxy/vless/encoding: invalid request version
Sep 04 14:16:12 vps v2ray[823534]: 2023/09/04 14:16:12 167.2x.xx.xx:42920 rejected proxy/vless/encoding: failed to read request version > read tcp 172.16.0.7:65228->167.2x.xx.xx:42920: read: connection reset by peer
Sep 04 14:16:13 vps v2ray[823534]: 2023/09/04 14:16:13 167.2x.xx.xx:48674 rejected proxy/vless/encoding: failed to read request version > tls: first record does not look like a TLS handshake
Sep 04 14:16:14 vps v2ray[823534]: 2023/09/04 14:16:14 167.2x.xx.xx:59622 rejected proxy/vless/encoding: failed to read request version > read tcp 172.16.0.7:65228->167.2x.xx.xx:59622: read: connection reset by peer
Another 167 IP (a different one) trying to connect to shadowsocks:
Sep 05 11:50:46 vps ss-server[809]: server recv: Connection reset by peer
Sep 05 11:50:46 vps ss-server[809]: failed to handshake with 167.9x.xx.xx: authentication error
Sep 05 11:50:49 vps ss-server[809]: server recv: Connection reset by peer
Sep 05 11:50:53 vps ss-server[809]: server recv: Connection reset by peer
Sep 05 11:50:53 vps ss-server[809]: server recv: Connection reset by peer
What do these logs mean? Can outside IP addresses connect to the shadowsocks, v2ray, and glorytun servers running on the VPS, or is this just normal log behavior?
Yes, these IPs are attacks, often a scan done across all ports. Packets are sent without the key, and even without the right protocol, so they are rejected. Nothing to worry about.
OMR should be able to recover an interface, but in your case it seems the interface doesn't answer correctly before an up and down. There is an option in Services->OMR-Tracker, "Restart if down", that does exactly the same as the "Restart" button in Network->Interfaces.
When an interface is down on the router, you restart v2ray on the VPS? It would be possible to restart v2ray on the router side when an interface is down, but not on the VPS side.
Yes, these IPs are attacks, often a scan done across all ports. Packets are sent without the key, and even without the right protocol, so they are rejected. Nothing to worry about.
Yes, I understand that without the correct auth they are rejected, but can this type of attack cause a problem? eth2 went down after the glorytun-tcp-run bad packet. Bad packet: 20:34:54; tun0 restarted: 20:35:04; eth2 down: 20:35:21.
Did this attack cause eth2 to go down? If so, then even if it doesn't gain access, it still causes a problem.
OMR should be able to recover an interface, but in your case it seems the interface doesn't answer correctly before an up and down.
Is there something I should investigate to get it working correctly or should I just add the check?
There is an option in Services->OMR-Tracker, "Restart if down", that does exactly the same as the "Restart" button in Network->Interfaces.
I set this, it looks like it happened again at 3am. No attack this time just this:
Wed Sep 6 03:34:21 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and ping from 10.x.x.x error (80.67.169.40,114.114.114.114,1.1.1.1)
What causes this? Sometimes I see no down event for a week, then one day it happens once or twice. Do I need to adjust omr-tracker; is the ping timeout too low? I have these settings under default: timeout: 5, tries: 3, retry: 2, wait: 0
Which setting should I adjust if I see frequent eth drops?
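As a rough sketch of how these settings could interact (assuming, which this thread does not confirm, that the tracker sends one probe per try, waits up to `timeout` seconds for each, and declares the interface down only after all tries fail):

```shell
# Hypothetical worst-case window before "switched off", under the
# assumption stated above: tries consecutive probes of timeout seconds.
timeout=5
tries=3
echo "worst case: $((timeout * tries)) seconds of failed probing"
```

If that assumption holds, raising timeout or tries makes the tracker more tolerant of a flaky link, at the cost of slower failover when the link is genuinely down.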
When an interface is down on the router, you restart v2ray on the VPS? It would be possible to restart v2ray on the router side when an interface is down, but not on the VPS side.
Yes, I wanted to ask you about this. Is it better to restart on the vps or on OMR? Restarting on the vps seems not too disruptive: I can have a file transfer running and I see the bandwidth dip, but it doesn't actually terminate the transfer. It does terminate other connections, like irc. What is the difference? Is one better than the other? If I restart on OMR, what command do I use? /etc/init.d/v2ray restart
? I can test on OMR to see if it is better to restart here.
Which script should I modify so that it automatically restarts v2ray after an interface restarts? Is it the one logging
Wed Sep 6 03:34:21 2023 user.notice post-tracking-post-tracking
or something else? If so, which function should I add the restart to?
The attack is on the VPS; it is not related to eth2 on the router.
You can increase timeout, tries, and retry, but it's a connection issue.
Strange that restarting v2ray on the VPS changes something...
You can set to restart v2ray on interface up: uci set openmptcprouter.wan2.script_alert_up='/etc/init.d/v2ray restart'
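As an alternative outside of OMR's own options, stock OpenWrt also runs hotplug scripts on netifd interface events; a sketch (the file name is arbitrary, and this fires only on netifd ifup events, so the `script_alert_up` option above remains the OMR-native hook):

```shell
# Sketch: save as /etc/hotplug.d/iface/99-restart-v2ray on the router.
# OpenWrt sources iface hotplug scripts with ACTION and INTERFACE set.
[ "$ACTION" = "ifup" ] && [ "$INTERFACE" = "wan2" ] && {
    logger -t hotplug "wan2 came up, restarting v2ray"
    /etc/init.d/v2ray restart
}
exit 0
```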
I found something interesting while looking for a solution. I noticed that v2ray spawns a lot of unconnected UDP sockets on the vps that never have any data in their sent or received queues. Here is an example from this command: ss -tuln -p
udp UNCONN 0 0 *:19912 *:* users:(("v2ray",pid=1339875,fd=109))
udp UNCONN 0 0 *:37331 *:* users:(("v2ray",pid=1339875,fd=90))
udp UNCONN 0 0 *:11731 *:* users:(("v2ray",pid=1339875,fd=410))
udp UNCONN 0 0 *:43477 *:* users:(("v2ray",pid=1339875,fd=320))
udp UNCONN 0 0 *:27097 *:* users:(("v2ray",pid=1339875,fd=456))
udp UNCONN 0 0 *:18911 *:* users:(("v2ray",pid=1339875,fd=536))
udp UNCONN 0 0 *:44513 *:* users:(("v2ray",pid=1339875,fd=429))
udp UNCONN 0 0 *:56803 *:* users:(("v2ray",pid=1339875,fd=347))
udp UNCONN 0 0 *:43491 *:* users:(("v2ray",pid=1339875,fd=539))
udp UNCONN 0 0 *:41446 *:* users:(("v2ray",pid=1339875,fd=427))
udp UNCONN 0 0 *:26092 *:* users:(("v2ray",pid=1339875,fd=274))
udp UNCONN 0 0 *:28143 *:* users:(("v2ray",pid=1339875,fd=466))
udp UNCONN 0 0 *:19952 *:* users:(("v2ray",pid=1339875,fd=498))
udp UNCONN 0 0 *:63992 *:* users:(("v2ray",pid=1339875,fd=541))
v2ray was restarted 1h20m ago, and it currently has 276 of these open, unused sockets, which seems high. Is this normal?
I also see a flood of connect messages in the journalctl on the vps, like this:
Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:2557 accepted udp:x.x.x.x:28777 [direct] email: openmptcprouter
Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:17109 accepted udp:x.x.x.x:51415 [direct] email: openmptcprouter
Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:42446 accepted udp:x.x.x.x:58168 [direct] email: openmptcprouter
Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:60231 accepted udp:x.x.x.x:49001 [direct] email: openmptcprouter
Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:3317 accepted udp:x.x.x.x:51035 [direct] email: openmptcprouter
For example, for this one second (00:06:56) there were 30 entries. Usually it's closer to 1-5 every second. Are these journal logs related to the open sockets? I recognize a lot of these IP addresses on the omr-tracker page. Does omr-tracker open a new UDP socket for every connection attempt? Another strange thing about these connection logs is that the source (a.a.a.a) is always wan1; I never see any for wan2. Is this also normal?
My current fix for this single-path aggregation problem is to restart v2ray, and then incoming transfers on my port use aggregation again. I noticed that I have to restart roughly every 1 or 2 hours to restore multipath aggregation. Is v2ray becoming overwhelmed by this large number of open sockets, or is this a normal amount? Why are they opened and not closed? Why always UDP?
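One way to quantify the "flood" is to bucket the accepted lines by second. A sketch over a saved journalctl extract (the sample lines below are trimmed copies of the ones above; on the VPS you would pipe `journalctl -u v2ray` in instead of the here-string):

```shell
# Count v2ray "accepted" log lines per second from a saved extract.
log='Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:2557 accepted udp:x.x.x.x:28777 [direct] email: openmptcprouter
Sep 07 00:06:56 vps v2ray[1339875]: 2023/09/07 00:06:56 a.a.a.a:17109 accepted udp:x.x.x.x:51415 [direct] email: openmptcprouter
Sep 07 00:06:57 vps v2ray[1339875]: 2023/09/07 00:06:57 a.a.a.a:42446 accepted udp:x.x.x.x:58168 [direct] email: openmptcprouter'
# Field 3 of each journalctl line is the HH:MM:SS timestamp.
echo "$log" | awk '/accepted/ { count[$3]++ } END { for (t in count) print t, count[t] }' | sort
```

A sustained rate far above the normal 1-5/second would line up with the socket counts growing between v2ray restarts.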
You shouldn't have so many UDP sockets if you don't use UDP much. What are the destination IPs? OMR-Tracker only does an http request for the proxy part, and a ping request for the wan part if you didn't set it to DNS check. In the log you see the initial wan connection; it's normal. But it's a large amount with random destination ports; this seems really strange...
You shouldn't have so many UDP sockets if you don't use UDP too much.
I don't use much UDP; this behavior doesn't look right to me.
What are the destination IPs ?
The destination IPs are always the IP address of WAN1. I ran journalctl | grep wan2IP
and I don't see any results.
OMR-Tracker only does an http request for the proxy part, and a ping request for the wan part if you didn't set it to DNS check.
Proxy: Enabled
Server tracker: Enabled
Default settings: Enabled Mail Alert Restart if down
OMR VPN: Enabled Server http test Mail alert
There are so many messages in the journalctl that I always have to do something like journalctl -f -n 50000 | grep -v -e accepted
to view it for general troubleshooting.
Also, these show up as warnings. Why are connection events logged as warnings? From config.json: "loglevel": "warning". What is v2ray warning about if these are just connection events?
Why do these requests only go to wan1, and why are there so many of them? Maybe this has something to do with the problem: after a while (around 1-2 hours) v2ray aggregation stops working, and it always works again after a restart. Also, after 1-2 hours there are hundreds of unconnected UDP sockets open.
I restarted v2ray 30 minutes ago and I am already at 242 sockets open:
sudo ss -tuln -p | grep v2ray | wc -l
253
There are all like this on different port numbers:
udp UNCONN 0 0 *:13591 *:* users:(("v2ray",pid=3146957,fd=264))
Here is an example of a legitimate port that I am forwarding:
tcp LISTEN 0 4096 *:19223 *:* users:(("v2ray",pid=3146957,fd=9))
But what is opening all these unconnected UDP sockets?
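To keep an eye on the growth without counting the whole `ss` output by hand, the unconnected UDP sockets can be tallied directly. A sketch against saved sample lines from above (on the VPS, pipe `ss -tuln -p` in instead):

```shell
# Count unconnected UDP sockets in saved `ss -tuln -p` output.
ss_out='udp UNCONN 0 0 *:19912 *:* users:(("v2ray",pid=1339875,fd=109))
udp UNCONN 0 0 *:37331 *:* users:(("v2ray",pid=1339875,fd=90))
tcp LISTEN 0 4096 *:19223 *:* users:(("v2ray",pid=1339875,fd=9))'
echo "$ss_out" | awk '$1 == "udp" && $2 == "UNCONN" { n++ } END { print n + 0 }'
```

Running this every few minutes (e.g. from cron, appending the count and a timestamp to a file) would show whether the socket count climbs steadily toward the point where aggregation stops.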
In the log you see the initial wan connection; it's normal.
I will double-check. Are you asking for the journalctl log of the wan connection when the vps is booted, or something else?
Destination IP is the WAN IP? I think that's the source IP; the destination IP is x.x.x.x in your journalctl log. OMR-Tracker uses http, so port 80, for the proxy check. For the OMR-Tracker connection status, DNS check is used only if you change it (the default is ping). V2Ray is verbose in warning mode; I changed this to error mode in the develop script. As I said, it's normal that you see only one IP; OpenMPTCProuter doesn't do round robin. You need to find what is opening so many UDP ports, maybe something running on a client computer on the LAN side.
I just had a chance to test this change we made earlier:
uci set openmptcprouter.wan2.script_alert_up='/etc/init.d/v2ray restart'
it adds this config to /etc/config/openmptcprouter
config interface 'wan2'
option metric '4'
option multipathvpn '0'
option asn 'ROGERS-COM-DPS'
option mptcp_status 'MPTCP enabled'
option multipath 'on'
option mtu '1460'
option manufacturer 'huawei'
option script_alert_up '/etc/init.d/v2ray restart'
option publicip '97.109.191.12'
option state 'up'
option lc '1694116043'
option latency_previous '41'
option latency '30'
but it did not work. eth2 was just restarted by omr-tracker; it came back online, but v2ray did not restart. Here is the log:
Thu Sep 7 15:46:17 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched off because check error and ping from 10.x.x.x error (9.9.9.9,1.0.0.1,114.114.115.115)
Thu Sep 7 15:46:17 2023 user.notice post-tracking-post-tracking: Delete default route to 1vpsip via 10.0.0.1 dev eth2
Thu Sep 7 15:46:19 2023 user.notice post-tracking-post-tracking: Restart wan2
Thu Sep 7 15:46:19 2023 daemon.notice netifd: wan2 (28088): udhcpc: received SIGTERM
Thu Sep 7 15:46:19 2023 daemon.notice netifd: wan2 (28088): udhcpc: unicasting a release of 10.x.x.x to 10.0.0.1
Thu Sep 7 15:46:19 2023 daemon.notice netifd: wan2 (28088): udhcpc: sending release
Thu Sep 7 15:46:19 2023 daemon.notice netifd: wan2 (28088): udhcpc: entering released state
Thu Sep 7 15:46:19 2023 daemon.notice netifd: wan2 (28088): Command failed: Permission denied
Thu Sep 7 15:46:19 2023 daemon.notice netifd: Interface 'wan2' is now down
Thu Sep 7 15:46:19 2023 kern.info kernel: [88622.022865] igc 0000:04:00.0: changing MTU from 1460 to 1500
Thu Sep 7 15:46:19 2023 daemon.notice netifd: Interface 'wan2' is disabled
Thu Sep 7 15:46:19 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Sep 7 15:46:19 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Sep 7 15:46:20 2023 user.notice MPTCP: Flush route cache
Thu Sep 7 15:46:24 2023 kern.info kernel: [88626.981973] igc 0000:04:00.0: changing MTU from 1500 to 1460
Thu Sep 7 15:46:24 2023 daemon.notice netifd: Interface 'wan2' is enabled
Thu Sep 7 15:46:24 2023 daemon.notice netifd: Interface 'wan2' is setting up now
Thu Sep 7 15:46:24 2023 daemon.notice netifd: wan2 (11044): udhcpc: started, v1.33.2
Thu Sep 7 15:46:24 2023 daemon.notice netifd: wan2 (11044): udhcpc: sending discover
Thu Sep 7 15:46:27 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Sep 7 15:46:27 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Sep 7 15:46:27 2023 kern.info kernel: [88629.977387] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Thu Sep 7 15:46:27 2023 daemon.notice netifd: wan2 (11044): udhcpc: sending discover
Thu Sep 7 15:46:33 2023 daemon.notice netifd: Interface 'wan2' is now up
Thu Sep 7 15:46:33 2023 daemon.notice netifd: Network device 'eth2' link is down
Thu Sep 7 15:46:33 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Thu Sep 7 15:46:33 2023 user.notice firewall: Reloading firewall due to ifup of wan2 (eth2)
Thu Sep 7 15:46:33 2023 user.notice firewall.omr-server: Firewall reload, set server part firewall reloading
Thu Sep 7 15:46:33 2023 user.notice mptcp: Reloading mptcp config due to ifup of wan2 (eth2)
Thu Sep 7 15:46:34 2023 user.notice MPTCP: Flush route cache
Thu Sep 7 15:46:35 2023 user.notice post-tracking-post-tracking: Set firewall on server vps
Thu Sep 7 15:46:37 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Sep 7 15:46:37 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Sep 7 15:46:37 2023 kern.info kernel: [88640.106992] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Thu Sep 7 15:46:38 2023 user.notice MPTCP: Flush route cache
Thu Sep 7 15:46:39 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched up
Thu Sep 7 15:48:05 2023 user.notice MPTCP: Flush route cache
Thu Sep 7 15:48:06 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Sep 7 15:48:06 2023 user.notice omr-bypass: OMR-ByPass is running
Thu Sep 7 15:48:06 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Sep 7 15:48:06 2023 user.notice omr-tracker: Launching...
Thu Sep 7 15:48:13 2023 user.notice omr-tracker: Launched
Thu Sep 7 15:48:14 2023 daemon.info omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1)
Thu Sep 7 15:48:37 2023 user.notice omr-tracker: Launching...
Thu Sep 7 15:48:44 2023 user.notice omr-tracker: Launched
Thu Sep 7 15:48:44 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Sep 7 15:48:44 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Sep 7 15:48:45 2023 user.notice omr-bypass: OMR-ByPass is running
Thu Sep 7 15:48:45 2023 user.notice MPTCP: Flush route cache
Thu Sep 7 15:48:49 2023 user.notice omr-bypass: Starting OMR-ByPass...
Here is the log showing wan2 being detected back up:
Thu Sep 7 15:46:37 2023 daemon.notice netifd: Network device 'eth2' link is up
Thu Sep 7 15:46:37 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Thu Sep 7 15:46:39 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched up
post-tracking-post-tracking can see wan2 is switched up, but /etc/init.d/v2ray restart
never ran. Is this the right place, or is there somewhere else I can put this?
Yes, I made a mistake, it's uci set omr-tracker.wan2=interface && uci set omr-tracker.wan2.script_alert_up='/etc/init.d/v2ray restart'
Yes, I made a mistake, it's
uci set omr-tracker.wan2=interface && uci set omr-tracker.wan2.script_alert_up='/etc/init.d/v2ray restart'
Thanks, I will try this new command. Is there a reverse command to delete the old one?
Also, I am seeing some error messages on the vps like this:
Sep 06 00:59:57 vps v2ray[2580939]: 2023/09/06 00:59:57 x.x.x.x:17668 rejected proxy/vless/encoding: failed to read request version > read tcp b.b.b.b:65228->x.x.x.x:17668: read: connection reset by peer
Sep 08 16:30:16 vps v2ray[3697154]: 2023/09/08 16:30:16 x.x.x.x:53988 rejected proxy/vless/encoding: failed to read request version > tls: first record does not look like a TLS handshake
Sep 08 16:30:41 vps v2ray[3697154]: 2023/09/08 16:30:41 x.x.x.x:60627 rejected proxy/vless/encoding: failed to read request version > tls: first record does not look like a TLS handshake
x.x.x.x is the public IP of my wan1 router interface. Are these errors typical, what causes them, and could they be a problem?
uci del openmptcprouter.wan2.script_alert_up
No idea for the V2Ray errors.
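One note on the uci answer above: uci changes are staged until committed, so if the old option was committed (it appears in /etc/config/openmptcprouter above), the delete likewise needs a commit to persist across reboots. A sketch, using the option names from earlier in this thread:

```shell
# Remove the old alert-script option and persist the change.
uci -q del openmptcprouter.wan2.script_alert_up
uci commit openmptcprouter
```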
No idea for the V2Ray errors.
Have you never seen this error before? I have another one on the vps; a.a.a.a is wan1 and x.x.x.x is the vps IP:
Sep 09 11:21:30 vps v2ray[3697154]: 2023/09/09 11:21:30 a.a.a.a:9035 rejected proxy/vless/encoding: failed to read request version > read tcp x.x.x.x:65228->a.a.a.a:9035: read: connection reset by peer
There is no corresponding error on the router at this time.
What does the message mean? Is the v2ray connection between the vps and the router being terminated, and is there any way to troubleshoot this further? This is what I found: "This could be caused by various factors such as incorrect configuration files, inconsistent system time or id between the server and client, or blocked v2ray servers"
It seems to happen once per day on average:
Sep 11 15:55:59 vps v2ray[818]: 2023/09/11 15:55:59 y.y.y.y:12679 rejected proxy/vless/encoding: failed to read request version > read tcp x.x.x.x:65228->y.y.y.y:12679: read: connection reset by peer
Can you think of any way to troubleshoot this? Why is the v2ray server rejecting the v2ray client on OMR? Maybe this is contributing to the problem.
Yes, I made a mistake, it's
uci set omr-tracker.wan2=interface && uci set omr-tracker.wan2.script_alert_up='/etc/init.d/v2ray restart'
I am modifying my new install to restart glorytun, not v2ray, when an interface restarts, because that fixes aggregation. It is not working. Here is the new command:
uci set omr-tracker.wan2=interface && uci set omr-tracker.wan2.script_alert_up='/etc/init.d/glorytun restart'
uci show:
omr-tracker.wan2=interface
omr-tracker.wan2.script_alert_up='/etc/init.d/glorytun restart'
When I restart eth2 and run logread | grep glorytun
<no results>
The command does not run. It did work when I had v2ray here; why won't this command run to restart glorytun? Is the command correct?
I confirm: aggregation is not automatically restored. I do need to find a way to automatically restart glorytun on an interface restart.
Okay, it is actually working; my test was wrong. I was only restarting the interface; when I did a proper stop and start, it did trigger:
Wed Sep 27 16:49:35 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched up
Wed Sep 27 16:49:35 2023 daemon.info glorytun[22860]: STOPPED tun0
Wed Sep 27 16:49:35 2023 daemon.notice netifd: Network device 'tun0' link is down
Wed Sep 27 16:49:35 2023 daemon.notice netifd: Interface 'omrvpn' has link connectivity loss
Wed Sep 27 16:49:35 2023 daemon.notice netifd: Interface 'omrvpn' is now down
Wed Sep 27 16:49:35 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Wed Sep 27 16:49:35 2023 daemon.notice netifd: Interface 'omrvpn' is disabled
Wed Sep 27 16:49:36 2023 daemon.info glorytun[27727]: INITIALIZED tun0
Wed Sep 27 16:49:36 2023 daemon.notice netifd: Interface 'omrvpn' is enabled
Wed Sep 27 16:49:36 2023 daemon.notice netifd: Network device 'tun0' link is up
Wed Sep 27 16:49:36 2023 daemon.notice netifd: Interface 'omrvpn' has link connectivity
Wed Sep 27 16:49:36 2023 daemon.notice netifd: Interface 'omrvpn' is setting up now
Wed Sep 27 16:49:36 2023 daemon.notice netifd: Interface 'omrvpn' is now up
I did not see this before. Aggregation is restored, but this is not a great solution; it is too disruptive. Is there any other way to fix this issue? I am using v2ray for TCP and UDP, so why does restarting the glorytun VPN fix aggregation issues with v2ray?
The output of ip route show table 3 and ip route show table 4
is the same when there is no aggregation and when it is working again.
After restarting glorytun I got a kernel panic, the first time I have seen this on OMR:
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.153871] ------------[ cut here ]------------
Wed Sep 27 16:51:16 2023 kern.crit kernel: [ 5122.154394] Kernel BUG at dql_completed+0x140/0x150 [verbose debug info unavailable]
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.155374] invalid opcode: 0000 [#1] SMP NOPTI
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.155837] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 5.4.194 #0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.156816] Hardware name: redacted
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.157748] RIP: 0010:dql_completed+0x140/0x150
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.158212] Code: f8 ea 00 48 89 77 58 e9 4d ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95 c0 41 84 c4 74 05 45 85 ed 78 09 89 c8 e9 2f ff ff ff <0f> 0b 01 f6 89 c8 29 f0 41 0f 48 c2 eb 8d 90 90 55 48 89 e5 41 56
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.159706] RSP: 0018:ffffc90000003e38 EFLAGS: 00010287
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.160196] RAX: 0000000000000002 RBX: ffffc90004a05d20 RCX: 0000000000000056
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.160660] RDX: 000000000000004a RSI: 00000000000000a0 RDI: ffff888173817340
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.161114] RBP: ffffc90000003e50 R08: 0000000000000000 R09: 0000000000017389
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.161557] R10: 0000000000000000 R11: 000000000001733f R12: ffff8881752daa80
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.161991] R13: ffff888173817280 R14: ffff888173135460 R15: 00000000ffffff46
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.162424] FS: 0000000000000000(0000) GS:ffff88817c000000(0000) knlGS:0000000000000000
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.163363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.163790] CR2: 0000560d17df9000 CR3: 0000000003208000 CR4: 0000000000340ef0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.164216] Call Trace:
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.164628] <IRQ>
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.165084] igc_del_mac_steering_filter+0x4e1/0x2470 [igc]
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.165512] __napi_poll+0x2c/0xb0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.165908] net_rx_action+0xca/0x1f0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.166304] __do_softirq+0xc9/0x20e
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.166694] irq_exit+0x6f/0x80
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.167072] do_IRQ+0x57/0xe0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.167444] common_interrupt+0xf/0xf
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.167807] </IRQ>
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.168159] RIP: 0010:cpuidle_enter_state+0x137/0x2c0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.168535] Code: 89 c6 e8 7c 0f 9f ff 45 84 ff 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 67 01 00 00 31 ff e8 40 fa a2 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 c5 00 00 00 49 63 cd 4c 2b 75 d0 48 8d 04 49 48 c1
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.169694] RSP: 0018:ffffffff82203e08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd7
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.170427] RAX: ffff88817c0246c0 RBX: ffffffff822550e0 RCX: 000000000000001f
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.170881] RDX: 000004a898279135 RSI: 00000000401a41a4 RDI: 0000000000000000
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.171286] RBP: ffffffff82203e38 R08: 0000000000000000 R09: 0000000000023f40
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.171678] R10: 000009513d583da4 R11: ffffc90005627cf8 R12: ffff88817a8ac400
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.172078] R13: 0000000000000003 R14: 000004a898279135 R15: 0000000000000000
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.172440] ? cpuidle_enter_state+0x114/0x2c0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.172800] cpuidle_enter+0x2f/0x40
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.173157] call_cpuidle+0x1e/0x40
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.173516] do_idle+0xf4/0x1c0
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.173861] cpu_startup_entry+0x18/0x20
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.174201] rest_init+0xa4/0xa6
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.174562] arch_call_rest_init+0x9/0xc
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.174885] start_kernel+0x57a/0x588
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.175200] x86_64_start_reservations+0x29/0x2b
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.175511] x86_64_start_kernel+0x71/0x74
Wed Sep 27 16:51:16 2023 kern.warn kernel: [ 5122.175824] secondary_startup_64+0xa4/0xb0
I am not sure what the implications of this kernel BUG are.
Here is a real-world example where eth2 went down:
Wed Sep 27 18:32:56 2023 daemon.notice netifd: Network device 'eth2' link is down
Wed Sep 27 18:32:56 2023 daemon.notice netifd: Interface 'wan2' has link connectivity loss
Wed Sep 27 18:32:56 2023 kern.info kernel: [ 6087.417383] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Down
Wed Sep 27 18:33:03 2023 daemon.notice netifd: Network device 'eth2' link is up
Wed Sep 27 18:33:03 2023 daemon.notice netifd: Interface 'wan2' has link connectivity
Wed Sep 27 18:33:03 2023 kern.info kernel: [ 6094.447380] igc 0000:04:00.0 eth2: igc: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Wed Sep 27 18:33:04 2023 user.notice post-tracking-post-tracking: wan2 (eth2) switched up
Wed Sep 27 18:33:04 2023 user.notice post-tracking-post-tracking: Interface route not yet set, set route ip r add default via x.x.0.1 dev eth2 metric 4
Wed Sep 27 18:33:06 2023 user.notice post-tracking-post-tracking: New public ip detected for wan2 (eth2): x.x.x.x
Wed Sep 27 18:33:06 2023 user.notice post-tracking-post-tracking: Reload MPTCP for eth2
But this script did not run:
omr-tracker.wan2=interface
omr-tracker.wan2.script_alert_up='/etc/init.d/glorytun restart'
I had to run /etc/init.d/glorytun restart manually, and then the router returned to aggregating.
Is there another way to get glorytun to restart when an interface goes down and up again automatically?
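For reference (an editor's sketch, not from the original report): on OpenWrt, interface up/down events can also be handled with a hotplug script, which survives reboots. The filename 99-glorytun-restart is an arbitrary choice; INTERFACE and ACTION are the standard variables OpenWrt's hotplug subsystem sets.

```shell
# /etc/hotplug.d/iface/99-glorytun-restart
# Sketch: restart glorytun whenever wan2 comes back up.
# INTERFACE and ACTION are provided by OpenWrt's hotplug subsystem.
[ "$ACTION" = "ifup" ] || exit 0
[ "$INTERFACE" = "wan2" ] || exit 0

logger -t glorytun-hotplug "wan2 is up again, restarting glorytun"
/etc/init.d/glorytun restart
```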
The problem is that this command does not persist across reboots:
uci set omr-tracker.wan2=interface && uci set omr-tracker.wan2.script_alert_up='/etc/init.d/glorytun restart'
The value is not written to /etc/config/omr-tracker. The setting does work while applied; how can I get it to persist?
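Note that uci set only stages changes in volatile storage; they are not written to /etc/config/ until they are committed. A sketch of the persistent variant, using the same option names as above:

```shell
# Stage the options, then commit them so they are written
# to /etc/config/omr-tracker and survive a reboot.
uci set omr-tracker.wan2=interface
uci set omr-tracker.wan2.script_alert_up='/etc/init.d/glorytun restart'
uci commit omr-tracker
```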
Expected Behavior
Multipath should be restored to the same state after omr-tracker stops and starts an interface with all wan connections used.
Current Behavior
Multipath is broken after omr-tracker restarts an interface, traffic now only uses one wan, OpenMPTCProuter becomes OpenSPTCProuter (single path TCP router).
Steps to Reproduce the Problem
Port forwarding is using v2ray and is working as expected, traffic uses both wan1 and wan2 interfaces.
Start a file upload.
Observe on the bandwidth page that OMR is correctly using all wan connections (wan1 and wan2): https://ibb.co/2YZDzBB
OMR-tracker switches wan2 interface off (due to ping or error):
OMR-tracker switches wan2 interface back on:
Start a new upload, traffic now only uses wan1, router is not Multi-path any longer: https://ibb.co/94QZJLq
Router never recovers from this state. Start another upload 1 hour later and it still only uses wan1.
ip r shows there are routes and default routes for both wan1 and wan2, but MPTCP refuses to use wan2 after OMR-tracker restarts it.
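To check whether the kernel still considers wan2 usable for MPTCP after the restart, the per-interface multipath flag and the MPTCP sysctl can be inspected. This is a sketch assuming the out-of-tree multipath-tcp.org kernel (5.4) used by OMR and the `multipath` helper it ships:

```shell
# Global MPTCP switch (multipath-tcp.org kernel)
sysctl net.mptcp.mptcp_enabled

# Per-interface MPTCP flag; it should not report "off" for an
# interface that is supposed to carry subflows.
multipath eth2

# Re-enable the interface for MPTCP if it was left disabled:
multipath eth2 on
```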
Possible Solution 1
I tried two steps to fix the problem:
/etc/init.d/openmptcprouter-vps restart
Watching logread -f on OMR and journalctl -f on the VPS, I do not see any log events after this command! The command executes and exits, but it did not do anything observable in the logs.
Possible Solution 2
Save and Apply from the wizard page. The log then shows:
Thu Aug 24 18:01:47 2023 user.notice post-tracking-post-tracking: Reload MPTCP for eth2
I guess this is where the bug is: "Reload MPTCP" is not properly restoring the MPTCP bond. Is there a way to get more information about what MPTCP is doing here? Is there any debug mode?
I have been tracking this problem for a long time, watching the performance of the system degrade over time. I did not have the knowledge until recently to isolate the bug and report it. I am available to test any solutions.
Context (Environment)
This issue is serious because it effectively breaks OMR: if only a single path is used after an OMR-tracker event, there is no point in running OMR at all, and there is currently no way to recover from this state.
Specifications