Zomboided / service.vpn.manager

VPN plugin for Kodi
GNU General Public License v2.0
305 stars, 81 forks

Routing table issue #373

Closed lmarceg closed 1 year ago

lmarceg commented 1 year ago

Hi, I have run into this strange issue: I automatically connect on boot (I am using LibreELEC 10.0.2, Kodi Matrix) with NordVPN, I check my IP and I see that my AS is not my provider's, so that part is good. But I am using an Rpi4 which sometimes gets disconnected from the WiFi for a few seconds. When this happens, the VPN is established (I can see it in the logs) and your service believes everything is OK, but the routing table is wrong and the default gateway (DGW) is not the VPN's but my provider's Internet gateway.

So, at boot I see this

Sep 13 18:44:34 LibreELEC connmand[439]: tun0 {add} route fe80:: gw :: scope 0
Sep 13 18:44:34 LibreELEC connmand[439]: tun0 {add} route 10.8.3.0 gw 0.0.0.0 scope 253
Sep 13 18:44:34 LibreELEC connmand[439]: wlan0 {add} route 138.199.54.247 gw 192.168.1.1 scope 0
Sep 13 18:44:34 LibreELEC connmand[439]: tun0 {add} route 0.0.0.0 gw 10.8.3.1 scope 0
Sep 13 18:44:34 LibreELEC connmand[439]: tun0 {add} route 128.0.0.0 gw 10.8.3.1 scope 0
Sep 13 18:44:49 LibreELEC connmand[439]: wlan0 {del} route 138.199.x.x gw 192.168.1.1 scope 0
Sep 13 18:44:49 LibreELEC connmand[439]: tun0 {del} route 0.0.0.0 gw 10.8.3.1 scope 0
Sep 13 18:44:49 LibreELEC connmand[439]: tun0 {del} route 128.0.0.0 gw 10.8.3.1 scope 0
Sep 13 18:44:49 LibreELEC connmand[439]: tun0 {del} route 10.8.3.0 gw 0.0.0.0 scope 253
Sep 13 18:44:49 LibreELEC connmand[439]: (null) {del} route fe80:: gw :: scope 0
Sep 13 18:45:00 LibreELEC connmand[439]: tun0 {add} route fe80:: gw :: scope 0
Sep 13 18:45:00 LibreELEC connmand[439]: tun0 {add} route 10.8.1.0 gw 0.0.0.0 scope 253
Sep 13 18:45:00 LibreELEC connmand[439]: wlan0 {add} route 138.199.x.x gw 192.168.1.1 scope 0
Sep 13 18:45:00 LibreELEC connmand[439]: tun0 {add} route 0.0.0.0 gw 10.8.1.1 scope 0
Sep 13 18:45:00 LibreELEC connmand[439]: tun0 {add} route 128.0.0.0 gw 10.8.1.1 scope 0

So you can see that the 138.199 route was added and the correct gateways are defined:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.8.0.1        128.0.0.0       UG    0      0        0 tun0
default         192.168.1.1     0.0.0.0         UG    0      0        0 wlan0
8.8.8.8         192.168.1.1     255.255.255.255 UGH   0      0        0 wlan0
10.8.0.0        *               255.255.255.0   U     0      0        0 tun0
128.0.0.0       10.8.0.1        128.0.0.0       UG    0      0        0 tun0
138.199.x.x     192.168.1.1     255.255.255.255 UGH   0      0        0 wlan0
192.168.1.0     *               255.255.255.0   U     0      0        0 wlan0
192.168.1.1     *               255.255.255.255 UH    0      0        0 wlan0

But when the Rpi4 reconnects to the WiFi, only some of the routes are re-added:

Sep 13 20:32:57 LibreELEC connmand[439]: tun0 {del} route fe80:: gw :: scope 0
Sep 13 20:32:58 LibreELEC connmand[439]: tun0 {add} route 10.8.1.0 gw 0.0.0.0 scope 253
Sep 13 20:32:58 LibreELEC connmand[439]: tun0 {add} route fe80:: gw :: scope 0
Sep 13 20:33:03 LibreELEC connmand[439]: wlan0 {add} route 192.168.1.0 gw 0.0.0.0 scope 253
Sep 13 20:33:03 LibreELEC connmand[439]: wlan0 {add} route 192.168.1.1 gw 0.0.0.0 scope 253
Sep 13 20:33:03 LibreELEC connmand[439]: wlan0 {add} route 8.8.8.8 gw 192.168.1.1 scope 0
Sep 13 20:33:03 LibreELEC connmand[439]: wlan0 {add} route 0.0.0.0 gw 192.168.1.1 scope 0

The routing table therefore becomes

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.1.1     0.0.0.0         UG    0      0        0 wlan0
8.8.8.8         192.168.1.1     255.255.255.255 UGH   0      0        0 wlan0
10.8.3.0        *               255.255.255.0   U     0      0        0 tun0
192.168.1.0     *               255.255.255.0   U     0      0        0 wlan0
192.168.1.1     *               255.255.255.255 UH    0      0        0 wlan0

and of course, while the tunnel is still up, no data is using that route as there is no such entry in the routing table: I am therefore using my ISP. If I disconnect and reconnect, I see the following being added:

Sep 14 18:50:44 LibreELEC connmand[439]: tun0 {add} route 10.8.0.0 gw 0.0.0.0 scope 253
Sep 14 18:50:44 LibreELEC connmand[439]: wlan0 {add} route 192.145.x.x gw 192.168.1.1 scope 0
Sep 14 18:50:44 LibreELEC connmand[439]: tun0 {add} route 0.0.0.0 gw 10.8.0.1 scope 0
Sep 14 18:50:44 LibreELEC connmand[439]: tun0 {add} route 128.0.0.0 gw 10.8.0.1 scope 0

And then it all works again until the next WiFi disconnection.
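The difference between the two tables comes down to longest-prefix matching: the two tun0 entries with Genmask 128.0.0.0 are /1 routes, which are more specific than the 0.0.0.0/0 default on wlan0, so they win for as long as they exist. Here is a minimal Python sketch of that lookup, using simplified copies of the two tables. `pick_interface`, `HEALTHY` and `BROKEN` are illustrative names, not part of VPN Mgr, openvpn or connman.

```python
import ipaddress

def pick_interface(routes, destination):
    """Return the interface of the most specific route that matches destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, iface in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, iface)
    return None if best is None else best[1]

# Simplified from the table seen right after a clean connect.
HEALTHY = [
    ("0.0.0.0/1", "tun0"),      # shown as "default" with Genmask 128.0.0.0
    ("128.0.0.0/1", "tun0"),
    ("0.0.0.0/0", "wlan0"),     # the ordinary default route
    ("8.8.8.8/32", "wlan0"),
    ("10.8.0.0/24", "tun0"),
    ("192.168.1.0/24", "wlan0"),
]

# Simplified from the table seen after the WiFi glitch.
BROKEN = [
    ("0.0.0.0/0", "wlan0"),
    ("8.8.8.8/32", "wlan0"),
    ("10.8.3.0/24", "tun0"),
    ("192.168.1.0/24", "wlan0"),
]
```

With `HEALTHY`, an arbitrary Internet address resolves to tun0; with `BROKEN`, the same address falls through to the wlan0 default even though the tunnel interface is still up.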

Who is managing the routing tables? Your script or something else or is this in a configuration file? I would like to fix this by checking the routing table or the AS number, and if it's not what I need, just issue a disconnect() and reconnect(), something like this. Would it be feasible?

Thanks! Luca

Zomboided commented 1 year ago

I don't do any management of the table, the add-on is 'just' calling openvpn, which I think would update the table based on the directives pushed to it from the provider (guessing here, not 100% sure).

You'd need to determine if during your wifi issue openvpn just deals with the glitch or whether VPN Mgr forces a reconnection. You can see this in the openvpn log, which will get written to the kodi log when a disconnect is detected. If VPN Mgr is restarting the connection you can use an up script to do something, this is documented on the wiki. If VPN Mgr doesn't see the wifi glitch because openvpn is just dealing with it, then you'd need to script something and use the VPN Mgr API, also on the wiki (or just kill the openvpn process and let VPN Mgr notice).

lmarceg commented 1 year ago

Understood. VPN Mgr is not restarting during the glitch, so the service thinks everything is OK. My idea is to use GetIPInfo() to check the ISP name, or xbmcgui.Window(10000).getProperty("VPN_Manager_API_State") to get, again, the ISP name. If it's my ISP, then the VPN is not working properly and I should restart or reconnect (I don't know which would be best). But I couldn't locate the piece of code where you constantly check the VPN status. I understand you use GetVPNState() and IsVPNConnected(), but I don't really understand where I should add my code or how to restart the connection (which .py file and which lines).

Another alternative is to run a bash script in the background that checks the routing table every 30s and if this is not good, I can kill openvpn. I believe this would trigger your service because one of the two functions I have mentioned before should return an error. But this means adding a script on top of some checks you already do, so I'd rather have the first approach.

If you could kindly point me to where I should look, it shouldn't be too complicated to add that piece of code. Thanks!
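The background check described above (poll every 30 seconds, kill openvpn when the table looks wrong) could be sketched roughly as follows. This is a Python sketch rather than bash, and every name in it is hypothetical; the three callables are injected so the loop can be tried without touching a real system.

```python
import time

def watchdog(read_routes, routes_ok, kill_vpn,
             interval=30, sleep=time.sleep, max_checks=None):
    """Poll the routing table and kill openvpn whenever the check fails.

    read_routes: returns the current routing table as text
    routes_ok:   returns True if that text contains the expected VPN routes
    kill_vpn:    terminates openvpn so the VPN add-on notices and reconnects
    max_checks:  None means run forever; a number is handy for testing
    Returns the number of times kill_vpn was called.
    """
    kills = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not routes_ok(read_routes()):
            kill_vpn()
            kills += 1
        checks += 1
        sleep(interval)
    return kills

# One possible wiring on a real box (assumes `route` and `killall` are
# available; not taken from VPN Mgr itself):
#
#   import subprocess
#   read = lambda: subprocess.run(["route", "-n"], capture_output=True,
#                                 text=True, check=True).stdout
#   kill = lambda: subprocess.run(["killall", "openvpn"], check=False)
#   watchdog(read, my_route_check, kill)
```

Keeping the check and the kill action as parameters means the same loop works whether the health test looks at the routing table or at the ISP name.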

Zomboided commented 1 year ago

You shouldn't try to hack your use case into the code (for multiple reasons). The right way is to use the API https://github.com/Zomboided/service.vpn.manager/wiki/11.-API where you can check the state (and IP?) from a script and then drive a reconnect. Alternatively you could simply start a cron job or equivalent and, if you see openvpn is running, check that the IP is not your ISP's. If it is, just kill openvpn and VPN Mgr will reconnect in under a minute. There are options that will suppress any reconnect during playback, but you can set those to suit the behaviour you want using the VPN Mgr settings.
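The cron-style check suggested here reduces to a small decision: only act when openvpn is running and the address still resolves to the home ISP. A minimal Python sketch of that decision follows. How `current_isp` is obtained (the VPN Mgr API, or some external lookup) is deliberately left out, and `action_for` and `openvpn_is_running` are hypothetical helpers, not functions from the add-on.

```python
import subprocess

def openvpn_is_running():
    """True if a process named openvpn exists (assumes `pidof` is installed)."""
    return subprocess.run(["pidof", "openvpn"],
                          capture_output=True).returncode == 0

def action_for(openvpn_running, current_isp, home_isp):
    """Return "kill" when traffic is leaking through the home ISP, else "none".

    current_isp: ISP name reported for the public IP right now (input, not looked up here)
    home_isp:    a distinctive part of your own provider's name
    """
    wanted = home_isp.strip().lower()
    if not openvpn_running or not wanted:
        return "none"           # nothing to kill, or nothing to compare against
    if wanted in current_isp.strip().lower():
        return "kill"           # tunnel process is up, but the ISP is still ours
    return "none"
```

A one-shot script run from cron would call `action_for(openvpn_is_running(), isp, "my provider")` and run `killall openvpn` only on `"kill"`, leaving the reconnect itself to VPN Mgr.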


lmarceg commented 1 year ago

Agreed. I have a script that checks the status of the WiFi connectivity every 30 seconds, and if it sees it cannot connect to my NAS, it brings the interfaces down and up. This is because I can go as long as 30 minutes with the network basically unreachable while the WiFi is still reported as up. I think it's an Rpi4 issue (tons of people reporting it, and tons of people writing such scripts), but cabling it with Ethernet is too messy for me. Now, I see that in my script I tear down wlan0 and also tun, but when I bring tun up, the routing table is broken. So I have removed the tun teardown, I've done some tests (manually) and the VPN seems to stay up, together with the routing table.

On top of that, I also check whether the routing table has the default gateway entry with Genmask 128.0.0.0 (a /1 route), and if not, I just kill openvpn. As you write, your service very quickly sees that and re-establishes the connection. My VPN is auto-connected at boot, so it should always be on.
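A check of this kind could look roughly like the Python sketch below, which scans `route` output for the two Genmask 128.0.0.0 entries on the tunnel interface. The function name and the default interface `tun0` are assumptions for illustration; the column layout is the one shown in the tables earlier in this thread.

```python
def tunnel_routes_present(route_output, iface="tun0"):
    """True if both /1 routes (Genmask 128.0.0.0) exist on the tunnel interface.

    Accepts output from `route` or `route -n`, so the lower half may be
    listed as either "default" or "0.0.0.0".
    """
    halves = set()
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns: Destination Gateway Genmask Flags
        # Metric Ref Use Iface. Title and header rows are skipped here.
        if len(fields) != 8 or fields[7] != iface:
            continue
        destination, genmask = fields[0], fields[2]
        if genmask != "128.0.0.0":
            continue
        if destination in ("default", "0.0.0.0"):
            halves.add("lower")     # covers 0.0.0.0 to 127.255.255.255
        elif destination == "128.0.0.0":
            halves.add("upper")     # covers 128.0.0.0 to 255.255.255.255
    return halves == {"lower", "upper"}
```

Requiring both halves, rather than any single tun0 line, avoids a false positive from the 10.8.x.0 subnet route, which stays in the table even when the redirect routes are gone.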

I also see that VPN Mgr sometimes issues a killall and openvpn doesn't seem to get closed, so I changed the kill -15 into kill -9, maybe this will help.

I will test it and report back. Thanks

Zomboided commented 1 year ago

I was gonna suggest fixing wifi, but that was too obnoxious :D There's an option to always use kill -9 as an alternative to -15, and if openvpn doesn't get closed as expected, it escalates anyway.
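The general pattern of trying kill -15 first and escalating to kill -9 can be sketched in Python as below. This only illustrates the pattern for a child process started by the same script; it is not VPN Mgr's actual implementation, and `stop_process` and `grace_seconds` are made-up names.

```python
import subprocess

def stop_process(proc, grace_seconds=5.0):
    """Stop a child process politely, then forcefully if it does not exit.

    proc: a subprocess.Popen object
    Returns the exit code; on POSIX a negative value is the signal number
    that ended the process.
    """
    proc.terminate()                    # SIGTERM, the equivalent of kill -15
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                     # SIGKILL, the equivalent of kill -9
        return proc.wait()
```

SIGTERM gives the process a chance to clean up after itself, which is the reason to try it first; SIGKILL cannot be caught or ignored, so it is the fallback when the polite request is not honoured in time.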


lmarceg commented 1 year ago

Yes, I used that option in the GUI

lmarceg commented 1 year ago

Hi, I created a script that checks the routing table and if I cannot find the correct tunnel, I'll tear openvpn down. Everything is now working perfectly, so we can close this case. Thanks!