Closed by ioogithub 10 months ago
I first encountered the issue here: originally posted by @ioogithub in https://github.com/Ysurac/openmptcprouter/issues/2931#issuecomment-1689176023
@Ysurac, you mentioned that this script should fix the issue: Originally posted by @Ysurac in https://github.com/Ysurac/openmptcprouter/issues/2931#issuecomment-1690148743
I will test it and see if it works. If it does, can this be run automatically when OMR switches from direct output back to VPS output, so that OMR can fail over and recover itself properly without any manual user intervention?
I can't reproduce the issue in the latest snapshot. The comment was on traffic from the VPS to the router side; if I understand correctly, here it's traffic from the router to the VPS side.
I am not sure what you mean by VPS to router or router to VPS. From the issue, it happens when the VPS is restarted or has a connection problem. On the router I see it switch to the direct connection (wan1). After the VPS comes back up it switches back to aggregate, but using the VPN (glorytun) only, not the proxy.
I will test it again now and try to recover with /etc/init.d/openmptcprouter-vps restart
If this works then at least there is a manual way to recover from this issue. Which script is responsible for restoring the proxy connection after the VPN comes back up?
I have reproduced this twice. Look at the proxy and VPN values after restarting the VPS in steps 5, 6 and 7 to see the problem:
1. OMR working normally, green on the status page; TCP and UDP traffic uses the proxy, ICMP and other small traffic uses the VPN.
2. Run sudo restart now on the VPS.
3. OMR detects the VPS is down.
4. OMR switches to direct mode.
5. OMR detects the VPS is back up. All traffic is now using the VPN only; run a speed test and look at the proxy and VPN values.
6. Run /etc/init.d/openmptcprouter-vps restart. Test with speedtest, look at proxy and VPN traffic.
7. Save and Apply on the wizard page, run a speedtest, look at VPN and proxy values.
I tested this twice; it is reproducible. It would be better for OMR to stay in direct output mode rather than switching to VPN only, because the internet is unusable with glorytun, less than 1 Mb/s. OMR stays in this crippled state until the user manually restarts OMR or validates the wizard.
What exact script does Save and Apply run? This does not work: /etc/init.d/openmptcprouter-vps restart
I want to try to recover from this using a command to see if I can fix the issue automatically.
As I said, I can't reproduce this on snapshot, so this will be fixed in the next release.
What about now? This issue exists in the latest stable release. I understand you only want to develop the new alpha versions, but the last stable release has this problem and it's a big problem. OMR doesn't recover from a VPS connection issue and leaves the router in an unusable state. The only solution is for the user to manually restart it.
I have time to troubleshoot the issue now. I can do tests and analyze logs, but can you give me some basic information? You said "The openmptcprouter-vps script synchronize the router and VPS config. /etc/init.d/openmptcprouter-vps restart" in the other issue, but it did not work. When I run this script I get no results in the log.
What script runs when I click Save and Apply on the wizard page? This is the only known way to fix the problem and it always works. If I know this, then I can find a solution to fix the problem automatically.
After Save and Apply, all services are restarted. You can try a /etc/init.d/v2ray restart
If this doesn't work, try a /etc/init.d/omr-tracker restart
I tried restarting v2ray already; it did not work.
Maybe the problem is this: if the VPN is down, OMR switches all traffic to a single WAN. When the VPS is restored, OMR switches all traffic to glorytun and does not switch back to the proxy. OMR status shows the proxy is running but does not want to use it.
During normal operation the proxy and the VPN are running at the same time. What tells OMR to use the proxy instead of the VPN for TCP and UDP traffic? Routing rules or something else?
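To try to see this myself, these are the places I am planning to look (just my guesses at the relevant commands):
# which routing rules / default route are active
ip rule show
ip route show
# proxy redirect rules in the nat table (my guess: present when the proxy is in use)
iptables -w -t nat -L -n 2>/dev/null | grep v2r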
This is what I have tried so far; none of it worked:
/etc/init.d/openmptcprouter-vps restart
/etc/init.d/openmptcprouter-vps restart
/etc/init.d/v2ray restart
I will try this next:
/etc/init.d/omr-tracker restart
/etc/init.d/glorytun restart
Thank you.
A list of all restarted services can be found here: https://github.com/Ysurac/openmptcprouter-feeds/blob/v0.59.1/luci-app-openmptcprouter/luasrc/controller/openmptcprouter.lua#L1009
Thank you. This list is very helpful; I will try to isolate the problem.
In your test on snapshot, did you wait for OMR to fail over to the direct connection? I think this is the key. I just did a VPS restart and it restarted very quickly; OMR didn't actually switch over to the single-WAN direct connection. On the status page I never saw the direct connection. In this test, when the VPS came back up it was still using the proxy.
In the other tests, the VPS took longer to restart and OMR switched to the direct connection. When it switches back to the VPS from the direct-connection state, that is when there is a problem with the proxy traffic.
First test. I ran this command:
/etc/init.d/glorytun restart >/dev/null 2>/dev/null
Here is the log result:
Thu Sep 14 16:28:27 2023 daemon.info glorytun: starting glorytun vpn instance vpn
Thu Sep 14 16:28:27 2023 daemon.err glorytun[10376]: vpsip.65001: shutdown
Thu Sep 14 16:28:27 2023 daemon.info glorytun[10376]: STOPPED tun0
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Network device 'tun0' link is down
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' has link connectivity loss
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' is now down
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' is disabled
Thu Sep 14 16:28:27 2023 daemon.info glorytun[5671]: INITIALIZED tun0
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' is enabled
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Network device 'tun0' link is up
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' has link connectivity
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' is setting up now
Thu Sep 14 16:28:27 2023 daemon.notice netifd: Interface 'omrvpn' is now up
Thu Sep 14 16:28:28 2023 daemon.warn [6974]: <warn> [SQM_IFB_a12cb] invalid sysfs path read for net/SQM_IFB_a12cb
Thu Sep 14 16:28:28 2023 daemon.warn [6974]: <warn> [base-manager] couldn't handle kernel event: device net/SQM_IFB_a12cb not found
Thu Sep 14 16:28:28 2023 daemon.warn [6974]: <warn> [SQM_IFB_c6ec0] invalid sysfs path read for net/SQM_IFB_c6ec0
Thu Sep 14 16:28:28 2023 daemon.warn [6974]: <warn> [base-manager] couldn't handle kernel event: device net/SQM_IFB_c6ec0 not found
Thu Sep 14 16:28:28 2023 user.notice firewall: Reloading firewall due to ifup of omrvpn (tun0)
Thu Sep 14 16:28:28 2023 user.notice firewall.omr-server: Firewall reload, set server part firewall reloading
Thu Sep 14 16:28:28 2023 user.notice mptcp: Reloading mptcp config due to ifup of omrvpn (tun0)
Thu Sep 14 16:28:28 2023 daemon.err glorytun[5671]: vpsip.65001: connected
Thu Sep 14 16:28:28 2023 daemon.info glorytun[5671]: STARTED tun0
Thu Sep 14 16:28:29 2023 user.notice MPTCP: Flush route cache
Thu Sep 14 16:28:29 2023 user.notice post-tracking-post-tracking: Set firewall on server vps
Thu Sep 14 16:28:30 2023 user.notice v2ray: Rules UP
Thu Sep 14 16:28:30 2023 user.notice v2ray: v2ray-rules -l 1897 -L 1897 -s vpsip --rule-name def --src-default forward --dst-default forward --local-default forward
Thu Sep 14 16:28:30 2023 user.notice v2ray: Reload omr-bypass rules
Thu Sep 14 16:28:30 2023 user.notice omr-bypass: Starting OMR-ByPass...
Thu Sep 14 16:28:31 2023 user.notice post-tracking-post-tracking: Tunnel up : Replace default route by 10.255.255.1 dev tun0
Thu Sep 14 16:28:33 2023 user.notice omr-bypass: Reload dnsmasq...
Thu Sep 14 16:28:33 2023 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Thu Sep 14 16:28:33 2023 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg01411c - 3 addresses
Thu Sep 14 16:28:33 2023 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
Thu Sep 14 16:28:33 2023 user.notice omr-bypass: OMR-ByPass is running
I did not run /etc/init.d/omr-tracker stop >/dev/null 2>/dev/null like in the restart-all code.
It seemed to work! After this command, traffic started going out over the v2ray proxy again. I will test a few more times. Any idea why restarting glorytun at the end restores traffic flow over the v2ray proxy? I see that restarting glorytun also caused a few other events; are these triggered by omr-tracker?
I think this:
v2ray: Rules UP and v2ray: v2ray-rules -l 1897 -L 1897 -s vpsip --rule-name def --src-default forward --dst-default forward --local-default forward
is a v2ray reload.
I will try a few more tests with just a v2ray reload and see if I can reproduce these results.
If I can reproduce it, something needs to trigger a final /etc/init.d/glorytun restart >/dev/null 2>/dev/null at the end of the sequence where OMR switches from direct output back to the VPS. Where would be the best place to do this automatically?
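One idea for where it could be hooked in (just a sketch using OpenWrt's standard interface hotplug mechanism; the filename, the lock file and the choice to key on omrvpn are my assumptions, not OMR code):
# /etc/hotplug.d/iface/99-omr-proxy-recover  -- hypothetical path and name
# Idea: when omrvpn comes back up, restart glorytun so the proxy rules are
# re-applied, as observed manually above.
# NOTE: restarting glorytun itself bounces omrvpn (see the log above), so a
# guard file is needed to avoid an endless ifup -> restart -> ifup loop.
[ "$ACTION" = "ifup" ] || exit 0
[ "$INTERFACE" = "omrvpn" ] || exit 0
if [ ! -f /tmp/omr-proxy-recover.lock ]; then
    touch /tmp/omr-proxy-recover.lock
    /etc/init.d/glorytun restart >/dev/null 2>&1
    ( sleep 60; rm -f /tmp/omr-proxy-recover.lock ) &
fi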
When it's not working, can you do a iptables -w -t nat -L -n 2>/dev/null | grep v2r
and iptables-save | grep REDIRECT
to check if the iptables rules are there or not?
Sure I can test this next.
Yesterday I did a lot of testing. Restarting glorytun works every time. When I look at the log after a glorytun restart, sometimes v2ray reloads its rules and sometimes it doesn't. Strange that restarting glorytun fixes the proxy! Is there a known issue with glorytun in the stable version? It shouldn't be this broken: less than 1 Mb/s download when OMR uses glorytun? This is how I found the issue, because the internet doesn't work. It is so slow I can barely SSH into the VPS.
I tested these yesterday too, none of them work:
/etc/init.d/glorytun restart >/dev/null 2>/dev/null
/etc/init.d/omr-tracker start >/dev/null 2>/dev/null
/etc/init.d/v2ray restart >/dev/null 2>/dev/null
/etc/init.d/v2ray reload >/dev/null 2>/dev/null
I had lots of logs to post afterwards, but Kate crashed when I went to save, sorry.
Question: Would it be a good idea to have a script that runs once a day from crontab and restarts everything from your reset script:
env -i /bin/ubus call network reload >/dev/null 2>/dev/null
ip addr flush dev tun0 >/dev/null 2>/dev/null
/etc/init.d/omr-tracker stop >/dev/null 2>/dev/null
/etc/init.d/mptcp restart >/dev/null 2>/dev/null
/etc/init.d/glorytun restart >/dev/null 2>/dev/null
/etc/init.d/glorytun-udp restart >/dev/null 2>/dev/null
/etc/init.d/ubond restart >/dev/null 2>/dev/null
/etc/init.d/openvpnbonding restart >/dev/null 2>/dev/null
/etc/init.d/omr-tracker start >/dev/null 2>/dev/null
/etc/init.d/v2ray restart >/dev/null 2>/dev/null
so that whatever state the router is in, it resets to a known good state. Is there any disadvantage to doing something like this?
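For example, something like this could be wired up (a rough sketch; the script path and the 04:00 time are arbitrary, and the command list just mirrors the reset sequence above):
#!/bin/sh
# Hypothetical /usr/bin/omr-daily-reset.sh -- the reset sequence above,
# wrapped in a script so cron can run it once a day.
env -i /bin/ubus call network reload >/dev/null 2>&1
ip addr flush dev tun0 >/dev/null 2>&1
/etc/init.d/omr-tracker stop >/dev/null 2>&1
for svc in mptcp glorytun glorytun-udp ubond openvpnbonding; do
    /etc/init.d/$svc restart >/dev/null 2>&1
done
/etc/init.d/omr-tracker start >/dev/null 2>&1
/etc/init.d/v2ray restart >/dev/null 2>&1
The crontab entry would go in /etc/crontabs/root (then /etc/init.d/cron restart), e.g.:
0 4 * * * /usr/bin/omr-daily-reset.sh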
iptables -w -t nat -L -n 2>/dev/null | grep v2r
and iptables-save | grep REDIRECT
When it is working normally:
-A v2r_def_forward -p tcp -j REDIRECT --to-ports 1897
v2r_def_pre_src tcp -- 0.0.0.0/0 0.0.0.0/0
v2r_def_local_out tcp -- 0.0.0.0/0 0.0.0.0/0
Chain v2r_def_dst (1 references)
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 match-set ssr_def_dst_forward dst
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 /* dst_default: forward */
Chain v2r_def_forward (5 references)
Chain v2r_def_local_out (1 references)
v2r_def_forward tcp -- 0.0.0.0/0 0.0.0.0/0 /* local_default: forward */
Chain v2r_def_pre_src (1 references)
v2r_def_src tcp -- 0.0.0.0/0 0.0.0.0/0
Chain v2r_def_src (1 references)
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 match-set ssr_def_src_forward src
v2r_def_dst all -- 0.0.0.0/0 0.0.0.0/0 match-set ssr_def_src_checkdst src
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 /* src_default: forward */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:19223 /* !fw3: 19223 on v2ray */ to:192.168.200.167:19223
Okay, I have restarted the VPS; the results are very different:
iptables-save | grep REDIRECT
no output
iptables -w -t nat -L -n 2>/dev/null | grep v2r
output:
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:19223 /* !fw3: 19223 on v2ray */ to:192.168.200.167:19223
The router does not recover from this state until it is rebooted or the user manually validates the wizard and restarts all the services. The internet is so bad in this state I can't even load a webpage or finish a speedtest. Here is the result from fast.com:
Your Internet 420 kbps *Could not reach our servers to perform the test. You may not be connected to the internet
After running /etc/init.d/glorytun restart:
-A v2r_def_forward -p tcp -j REDIRECT --to-ports 1897
v2r_def_pre_src tcp -- 0.0.0.0/0 0.0.0.0/0
v2r_def_local_out tcp -- 0.0.0.0/0 0.0.0.0/0
Chain v2r_def_dst (1 references)
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 match-set ssr_def_dst_forward dst
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 /* dst_default: forward */
Chain v2r_def_forward (5 references)
Chain v2r_def_local_out (1 references)
v2r_def_forward tcp -- 0.0.0.0/0 0.0.0.0/0 /* local_default: forward */
Chain v2r_def_pre_src (1 references)
v2r_def_src tcp -- 0.0.0.0/0 0.0.0.0/0
Chain v2r_def_src (1 references)
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 match-set ssr_def_src_forward src
v2r_def_dst all -- 0.0.0.0/0 0.0.0.0/0 match-set ssr_def_src_checkdst src
v2r_def_forward all -- 0.0.0.0/0 0.0.0.0/0 /* src_default: forward */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:19223 /* !fw3: 19223 on v2ray */ to:192.168.200.167:19223
Everything is restored. It looks like you found the problem. This might be an old issue; I think I saw it over a year ago when I was trying to get OMR working. I didn't have enough time to troubleshoot, so I had to put the OMR project on hold for a year.
Currently, to work around this issue I added a check to my aggregate script: I look at the bandwidth on tun0, and if it is increasing rapidly it means OMR is not using the proxy, so I restart it with /etc/init.d/glorytun restart. This is not a great solution.
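For reference, this is roughly what that workaround looks like (a stripped-down sketch; the threshold, interval and tag are arbitrary, it just watches the tun0 byte counters and restarts glorytun when the tunnel is carrying bulk traffic):
#!/bin/sh
# Hypothetical watchdog: if tun0 keeps moving a lot of data, assume OMR is
# stuck on the VPN instead of the proxy and restart glorytun.
THRESHOLD=$((1024 * 1024))   # bytes per sample interval (~1 MB), arbitrary
INTERVAL=30                  # seconds between samples
prev=$(cat /sys/class/net/tun0/statistics/rx_bytes 2>/dev/null || echo 0)
while true; do
    sleep "$INTERVAL"
    cur=$(cat /sys/class/net/tun0/statistics/rx_bytes 2>/dev/null || echo 0)
    delta=$((cur - prev))
    prev=$cur
    if [ "$delta" -gt "$THRESHOLD" ]; then
        logger -t omr-watchdog "tun0 moved ${delta} bytes in ${INTERVAL}s, restarting glorytun"
        /etc/init.d/glorytun restart >/dev/null 2>&1
    fi
done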
If you have a better solution that uses OMR code, I would be happy to test it.
It's omr-tracker-v2ray's job to set the firewall config (https://github.com/Ysurac/openmptcprouter-feeds/blob/develop/omr-tracker/files/bin/omr-tracker-v2ray).
When the proxy is working, it should do a /etc/init.d/v2ray rules_up.
So what do you think is happening here? After the VPS comes back online and OMR switches back to the VPS, omr-tracker-v2ray doesn't set the firewall rules. How does omr-tracker-v2ray know the proxy is working? Maybe it runs too soon, before v2ray is restored, or it thinks the proxy is not working so it doesn't set the rules?
Should I run /etc/init.d/v2ray rules_up to see if this fixes the problem? /etc/init.d/v2ray restart and /etc/init.d/v2ray reload do not fix the problem, but /etc/init.d/glorytun restart always fixes it, why? Does omr-tracker-v2ray see that glorytun has restarted and then set the v2ray rules? Actually I think it is the opposite: when I restart the VPS and watch the OMR logs I see omr-tracker-v2ray: V2Ray is up (can contact via http 1.0.0.1). So maybe it thinks that v2ray is already up and doesn't bother to run, i.e. it didn't detect that it was down.
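For the record, the manual test I plan to run next is just the rules_up call followed by the same checks as above:
/etc/init.d/v2ray rules_up
# then verify the redirect rule came back:
iptables-save | grep REDIRECT
iptables -w -t nat -L -n 2>/dev/null | grep v2r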
omr-tracker-v2ray runs in a loop, so it tests based on the values defined in the OMR-Tracker web interface. You should test v2ray rules_up. omr-tracker-v2ray only tests v2ray; I think it's omr-tracker that runs rules_up when glorytun restarts during its check.
omr-tracker-v2ray checks whether it can reach the omr-tracker defined website, and if that succeeds while the command iptables -w -t nat... gives no result, it reloads the V2Ray rules.
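In other words, the logic is roughly this (a simplified sketch, not the actual script; the real implementation is the omr-tracker-v2ray file linked above, and the test host and interval here are placeholders):
while true; do
    # can the tracker-defined test host be reached over HTTP?
    if curl -s -o /dev/null --max-time 5 "http://1.0.0.1"; then
        # proxy answers, but are the v2ray redirect rules actually installed?
        if [ -z "$(iptables -w -t nat -L -n 2>/dev/null | grep v2r)" ]; then
            logger -t omr-tracker-v2ray "Reload V2Ray rules"
            /etc/init.d/v2ray rules_up
        fi
    fi
    sleep 5
done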
What is the result of uci show omr-tracker? Maybe there is a problem in this configuration.
Here are the results of uci show omr-tracker:
omr-tracker.defaults=defaults
omr-tracker.defaults.enabled='1'
omr-tracker.defaults.hosts='4.2.2.1' '8.8.8.8' '80.67.169.12' '8.8.4.4' '9.9.9.9' '1.0.0.1' '114.114.115.115' '1.2.4.8' '80.67.169.40' '114.114.114.114' '1.1.1.1'
omr-tracker.defaults.hosts6='2606:4700:4700::1111' '2606:4700:4700::1001' '2620:fe::fe' '2620:fe::9' '2001:4860:4860::8888' '2001:4860:4860::8844'
omr-tracker.defaults.tries='3'
omr-tracker.defaults.interval='2'
omr-tracker.defaults.interval_tries='1'
omr-tracker.defaults.wait_test='0'
omr-tracker.defaults.server_http_test='0'
omr-tracker.defaults.type='ping'
omr-tracker.defaults.mail_alert='1'
omr-tracker.defaults.timeout='5'
omr-tracker.defaults.restart_down='1'
omr-tracker.proxy=proxy
omr-tracker.proxy.hosts='1.0.0.1' '212.27.48.10' '198.27.92.1' '151.101.129.164' '77.88.55.77' '1.1.1.1' '74.82.42.42' '198.41.212.162'
omr-tracker.proxy.timeout='10'
omr-tracker.proxy.interval_tries='1'
omr-tracker.proxy.interval='5'
omr-tracker.proxy.enabled='1'
omr-tracker.proxy.tries='3'
omr-tracker.proxy.wait_test='0'
omr-tracker.server=server
omr-tracker.server.enabled='1'
omr-tracker.server.tries='3'
omr-tracker.server.timeout='10'
omr-tracker.server.wait_test='0'
omr-tracker.server.interval='5'
omr-tracker.omrvpn=interface
omr-tracker.omrvpn.type='none'
omr-tracker.omrvpn.timeout='10'
omr-tracker.omrvpn.tries='3'
omr-tracker.omrvpn.interval='5'
omr-tracker.omrvpn.enabled='1'
omr-tracker.omrvpn.server_http_test='1'
omr-tracker.omrvpn.restart_down='0'
omr-tracker.omrvpn.hosts='4.2.2.1' '8.8.8.8'
omr-tracker.omrvpn.wait_test='0'
omr-tracker.omrvpn.mail_alert='1'
I will test by running v2ray rules_up next and see if it fixes the problem.
I made the change at https://github.com/Ysurac/openmptcprouter-feeds/blob/develop/omr-tracker/files/bin/omr-tracker-v2ray#L103: in this file I changed "v2r" to "^v2r" and restarted.
It seemed to work; in the log I see this:
Fri Sep 15 14:55:07 2023 daemon.info omr-tracker-v2ray: Reload V2Ray rules
and I have never seen this in the log before, so that if statement had never been run before.
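My reading of why the anchor matters, based on the broken-state output earlier in this thread: without it, the grep also matches the unrelated port-forward rule whose comment contains "v2ray", so the check thought the rules were still installed even when they were gone:
# before: also matches the DNAT line "... /* !fw3: 19223 on v2ray */",
# which is present even when the redirect rules are missing
iptables -w -t nat -L -n 2>/dev/null | grep v2r
# after: only matches chains/targets that actually start with v2r
iptables -w -t nat -L -n 2>/dev/null | grep '^v2r'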
It created a new problem, however. When OMR restarted and traffic switched back to the proxy, all of the clients connected to the network could not resolve DNS. I had to go to each client on the network and turn its internet connection off and on. I am using OMR DNS because I am using OMR-bypass, which was working before this change.
Any ideas on why making this change would kill DNS? I left one client alone but it never recovered on its own, so I had to manually restart its network connection after this change.
I will do another few tests and report the results.
I have tested two more times and the results look good. I am not able to reproduce the problem with DNS. I believe what happened is that my temporary fix script, which runs every 15 minutes, ran during the test and brought the v2ray rules up while v2ray was not available.
I disabled this script and it looks like the change is working. I can confirm that I see these lines in the log, which I never saw before making the change:
Fri Sep 15 15:46:23 2023 daemon.info omr-tracker-v2ray: Reload V2Ray rules
Fri Sep 15 15:46:23 2023 user.notice v2ray: Rules UP
Thanks @Ysurac this fix will greatly improve OMR reliability for v2ray proxy users.
Expected Behavior
After the VPS is rebooted and the connection to the VPS is restored, OMR should properly restore the connection to use the proxy, not the VPN.
Current Behavior
Currently, OMR switches to the VPN (glorytun) and gets stuck on glorytun. It does not restore traffic flow to the proxy. Only rebooting OMR or validating the wizard (System -> OpenMPTCProuter -> Settings Wizard) restores traffic flow to the proxy.
Possible Solution
If the VPS is restarted, this is what happens:
OMR knows that the VPS has had a connection issue, because it can change the path, but when it switches back to the VPS it uses the VPN (glorytun) and gets stuck on the VPN.
Steps to Reproduce the Problem
@Ysurac can you tell me:
Context (Environment)
This is a major problem because glorytun performance is so bad it is unusable (200-400 kb/s). It is painful to even log in to the VPS with SSH and try to troubleshoot the problem in this state. It would be much better to stay in direct connection mode until proxy traffic can be restored. Glorytun is so slow that it is very important for OMR to recover properly back to proxy traffic.
Specifications