OpenVPN / ovpn-dco-win

OpenVPN Data Channel Offload driver for Windows
MIT License

Connection lost after resuming from hibernate #64

Closed ElStupid closed 3 months ago

ElStupid commented 6 months ago

To be fair, I'm a fairly new user of OpenVPN, using it as my default connection. The OpenVPN-Gui was supplied by Surfshark, but their version did not auto connect on startup of my system. So I found v11.46.0.0, which has an option to Enable auto restart of active connections.

However, after resuming my machine from hibernate, the internet connection is lost and does not reconnect automatically. The tray icon stays green, but I have to reconnect manually to get an internet connection again, which seems like unintended behaviour. Is this a known issue that will be fixed in an upcoming release?

Thanks!

ElStupid commented 3 months ago

Issue still present on GUI 11.48.0.0 / OpenVPN 2.6.10

selvanair commented 3 months ago

I haven't seen any such reports, so I will need more information. The client can take some time to notice that the connection is down and initiate a reconnection. If it stays green forever with no traffic flowing, either there is a setup issue (no keepalive pings?) or something else is wrong. Is this using tap-windows, wintun or DCO?

The GUI by itself does nothing special on sleep or hibernation. It depends on the driver to reset, in which case openvpn would reconnect, or on a ping restart to trigger.

Post logs spanning the time just before and after the hibernation.

Juliedson commented 3 months ago

I'm having some trouble with OpenVPN GUI. It was working fine, but since last week it fails most of the time when I connect on Windows 11.

I don't know what the problem is. I tried the same VPN connection with the OpenVPN default connector and it still works, but with OpenVPN GUI and the default DCO adapter the connection fails most of the time. It's weird because some websites don't work while some applications do (e.g. Telegram), and it's not a DNS problem because I just changed it. Could it be DCO? Is there a way to deactivate DCO using the GUI?

Does anyone know how to fix it? I need it to run a recursive reconnection.

ElStupid commented 3 months ago

Is this using tap-windows, wintun or DCO?

Thanks @selvanair for responding. I don't know; how can I find out? I did a regular installation.

Post logs spanning times just before and after the hibernation.

See the attached log. I can't find anything unusual. Yesterday evening I hibernated my PC, and this morning I booted it up. Nothing happens: the connection stays green but is lost, and there are no retries in the log. After I manually reconnect, the log is updated again, so the log entries from 10:20 onwards are from after the manual reconnect.

For my privacy I replaced my external IP address with x.x.x.x

github_openvpn_udp.log

selvanair commented 3 months ago

@ElStupid Thanks for the logs. The tunnel is using DCO.

The part of the log around hibernation reads:

2024-04-04 19:43:38 Initialization Sequence Completed
2024-04-04 19:43:38 MANAGEMENT: >STATE:1712252618,CONNECTED,SUCCESS,10.8.8.4,x.x.x.x,1194,,
2024-04-04 19:43:38 Data Channel: cipher 'AES-256-GCM', peer-id: 2
2024-04-04 19:43:38 Timers: ping 60, ping-restart 180
2024-04-04 19:43:38 Protocol options: explicit-exit-notify 1
2024-04-05 10:20:15 MANAGEMENT: CMD 'signal SIGHUP'

I would expect an automatic ping-restart after resuming from hibernation, as internal pings will have failed after such a long period of inactivity. But there is nothing in the logs.

@cron2 @lstipakov : any thoughts what could be happening here?

Apart from this manual reconnection, there are several sequences of events in the log with an unusual SIGHUP.

For example, the snippet below shows an initial connection at 15:23 and a renegotiation at 16:18, but then at 19:43 it kills the expiring key and gets a SIGHUP from the management interface (GUI) right afterwards.

Did you have to do several manual reconnections during the period of the log, which spans a day or two? If not, it's unclear to me what is causing those SIGHUPs. My feeling is that the lack of automatic recovery after hibernation is probably related to something wrong with the keep-alive mechanism and/or renegotiations in this setup.

Relevant log snippet:

2024-04-04 15:23:21 Initialization Sequence Completed
2024-04-04 15:23:21 MANAGEMENT: >STATE:1712237001,CONNECTED,SUCCESS,10.8.8.4,x.x.x.x,1194,,
2024-04-04 15:23:21 Data Channel: cipher 'AES-256-GCM', peer-id: 2
2024-04-04 15:23:21 Timers: ping 60, ping-restart 180
2024-04-04 15:23:21 Protocol options: explicit-exit-notify 1
2024-04-04 16:18:31 VERIFY OK: depth=2, C=VG, O=Surfshark, CN=Surfshark Root CA
2024-04-04 16:18:31 VERIFY OK: depth=1, C=VG, O=Surfshark, CN=Surfshark Intermediate CA
2024-04-04 16:18:31 VERIFY KU OK
2024-04-04 16:18:31 Validating certificate extended key usage
2024-04-04 16:18:31 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
2024-04-04 16:18:31 VERIFY EKU OK
2024-04-04 16:18:31 VERIFY OK: depth=0, CN=nl-ams-dipv012.prod.surfshark.com
2024-04-04 16:18:31 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bits RSA, signature: RSA-SHA256, peer temporary key: 253 bits X25519
2024-04-04 19:43:09 TLS: tls_process: killed expiring key
2024-04-04 19:43:09 dco_del_key: peer-id 2, slot 1 called but ignored
2024-04-04 19:43:35 MANAGEMENT: CMD 'signal SIGHUP'

The same happens at 13:18 and 15:23 on the same day.

selvanair commented 3 months ago

I'm having some trouble with OpenVPN GUI. It was working fine, but since last week it fails most of the time when I connect on Windows 11.

I don't know what the problem is. I tried the same VPN connection with the OpenVPN default connector and it still works, but with OpenVPN GUI and the default DCO adapter the connection fails most of the time. It's weird because some websites don't work while some applications do (e.g. Telegram), and it's not a DNS problem because I just changed it. Could it be DCO? Is there a way to deactivate DCO using the GUI?

Does anyone know how to fix it? I need it to run a recursive reconnection.

@Juliedson Do not post unrelated questions to an ongoing discussion -- this one is about recovering from hibernation.

It's unclear what you are trying to report or what "OpenVPN default connector" is. You may want to contact your server administrator. Otherwise, file a clear bug report with logs as a separate issue.

ElStupid commented 3 months ago

Did you have to do several manual reconnections during the period of the log, which spans a day or two? If not, it's unclear to me what is causing those SIGHUPs. My feeling is that the lack of automatic recovery after hibernation is probably related to something wrong with the keep-alive mechanism and/or renegotiations in this setup.

The same happens at 13:18 and 15:23 on the same day.

@selvanair Thanks for your time. Yes, this log spans multiple sessions of using my PC, so multiple hibernations. After resuming I need to reconnect to be able to use my internet connection again (or disconnect from the VPN).

selvanair commented 3 months ago

There are two kinds of SIGHUP instances in your log: (i) with nothing in the logs just prior to that and (ii) with the "killed expiring key" message logged from a few seconds to a minute before the SIGHUP.

It could make some sense if only (ii) are instances where a resume from hibernate happens, DCO kills the expired key and you do a manual restart right after. The reason for (i) is unclear to me.

In any case, try giving it at least 3 minutes (your ping-restart time = 180 sec) after resume before you press reconnect. If that works, we could consider how the reconnection attempt could be sped up on resume.
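
A minimal sketch of how the two kinds of SIGHUP could be separated in a log, assuming every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the snippets above. The function name `classify_sighups` and the one-minute window are illustrative choices, not part of OpenVPN or its GUI.

```python
from datetime import datetime, timedelta

def classify_sighups(lines, window=timedelta(minutes=1)):
    """Label each management SIGHUP as kind "ii" if a 'killed expiring key'
    message was logged at most `window` before it, otherwise as kind "i"."""
    last_kill = None
    result = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if "killed expiring key" in line:
            last_kill = stamp
        elif "CMD 'signal SIGHUP'" in line:
            recent = last_kill is not None and stamp - last_kill <= window
            result.append((stamp, "ii" if recent else "i"))
    return result

# Lines taken from the log excerpts quoted in this thread.
sample = [
    "2024-04-04 19:43:09 TLS: tls_process: killed expiring key",
    "2024-04-04 19:43:09 dco_del_key: peer-id 2, slot 1 called but ignored",
    "2024-04-04 19:43:35 MANAGEMENT: CMD 'signal SIGHUP'",
    "2024-04-04 19:43:38 Protocol options: explicit-exit-notify 1",
    "2024-04-05 10:20:15 MANAGEMENT: CMD 'signal SIGHUP'",
]
kinds = [kind for _, kind in classify_sighups(sample)]
```

Running it over the full log would give a quick count of each kind, which may help tell manual reconnects after a resume apart from the unexplained ones.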

selvanair commented 3 months ago

I can kind of reproduce this with dco -- it does recover but takes a while which, I think, it should not.

When using tap-windows6, ping-restart triggers right after resume and the connection is back in a few seconds. With dco, somehow restart doesn't trigger until a whole ping-restart interval has passed after the resume epoch.

ElStupid commented 3 months ago

github_openvpn_udp2.log

You're totally right. This morning I waited 3 minutes and after that I got reconnected. My configuration is shown below. I think 180 seconds is the default, so should I specify something like 5 or 10?

client
dev tun
proto udp
remote x.x.x.x 1194
remote-random
nobind
tun-mtu 1500
mssfix 1450
ping 15
ping-restart 0
reneg-sec 0

See the log: I started my PC at 8:45.

selvanair commented 3 months ago

It should not take a ping-restart interval for the reconnect to trigger after a long hibernation. In this case the ping timeout happened a long time ago, and the restart should be almost immediate on resume. I suspect something is wrong with the way the ping timeout is handled when dco is in use. Wait for @lstipakov to chime in.

so should I specify like 5 or 10

A quick fix is to fall back to tap-windows using windows-driver tap-windows6, although dco is preferred for performance.

The alternative of using a shorter ping-restart gains little: as the server is pushing ping 60, it may be using the same value in its own config. You need a ping-restart value of at least twice that much (two pings), so 120 may be the lowest one can go in this case. Otherwise there will be spurious restarts. Note that the tunnel is considered lost when no pings are received from the server.

You'll have to use pull-filter ignore ping-restart and ping-restart 120 to change it.
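
The arithmetic above can be sketched as follows. This is only an illustration: the helper names are made up, and the "two pings" factor is the rule of thumb stated in this comment, not a value read from OpenVPN itself.

```python
def lowest_ping_restart(server_ping, missed_pings=2):
    """Smallest ping-restart (seconds) that still tolerates `missed_pings`
    full ping intervals from the server before declaring the tunnel lost."""
    return server_ping * missed_pings

def override_directives(server_ping):
    """Client config lines that ignore the pushed ping-restart and set a
    local one, using the two directives named in the comment above."""
    value = lowest_ping_restart(server_ping)
    return ["pull-filter ignore ping-restart", f"ping-restart {value}"]

# The server in this thread pushes "ping 60".
directives = override_directives(60)
```

With a pushed ping of 60 this yields a ping-restart of 120, so the saving over the pushed 180 is one minute at best.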

lstipakov commented 3 months ago

Thanks for the report. I haven't seen it myself, but let me retest.

selvanair commented 3 months ago

Tested again: it takes a variable amount of time, less than the ping-restart interval, for the restart to trigger, as if time spent in hibernation is not counted towards the timeout. In the DCO driver the receive timer timeout is set as a relative interval, which may not react to the system time being reset on resume. Do we need to use an absolute time value instead?

lstipakov commented 3 months ago

I can confirm that. With hibernate (S4) it is easy to reproduce. With "modern standby" (S0 low power), which is the default on my system, it depends: usually when my system wakes up from modern standby, userspace immediately gets the "keepalive timeout" notification, even though according to the kernel logs it fired some time ago.

During hibernate and, to some extent, modern standby, "relative" timers (the ones I use) are not ticking, so when the system wakes up the timers continue where they left off, which makes the keepalive timeout experience sub-optimal.

I'll refactor the timer implementation to use a single timer with 1 sec resolution, where I compare the last and now values obtained with KeQuerySystemTime calls. This is similar to what OpenVPN2 userspace is doing.
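
A toy model of the difference, written in Python rather than driver code. Everything here is hypothetical: `SimClock` stands in for the system, `wall` plays the role of the value KeQuerySystemTime would return, and `awake` models a relative timer that does not advance while the machine is hibernated.

```python
class SimClock:
    """Toy clock: `wall` always advances, `awake` stops during hibernate."""

    def __init__(self):
        self.wall = 0
        self.awake = 0

    def run(self, seconds):
        # Normal operation: both notions of time advance together.
        self.wall += seconds
        self.awake += seconds

    def hibernate(self, seconds):
        # Only wall-clock time passes while the machine is off.
        self.wall += seconds

def relative_timer_expired(clock, armed_at_awake, timeout):
    """Relative timer: counts only the time the system was awake."""
    return clock.awake - armed_at_awake >= timeout

def polled_timer_expired(clock, last_rx_wall, timeout):
    """Polled check: compares the current wall clock with the last receive."""
    return clock.wall - last_rx_wall >= timeout

clock = SimClock()
timeout = 180                      # ping-restart from the log in this thread
armed_awake, last_rx_wall = clock.awake, clock.wall
clock.run(10)                      # 10 s of silence before hibernating
clock.hibernate(14 * 3600)         # overnight hibernate
clock.run(1)                       # first 1-second poll after resume

relative_fired = relative_timer_expired(clock, armed_awake, timeout)
polled_fired = polled_timer_expired(clock, last_rx_wall, timeout)
remaining = timeout - (clock.awake - armed_awake)
```

In this model the polled check fires on the first tick after resume, while the relative timer still has `remaining` seconds to go, which is consistent with the observed delay being variable but shorter than a full ping-restart interval.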

lstipakov commented 3 months ago

Here is a new version of the driver with the refactored timer implementation. I have tested both hibernate and "modern standby"; in both cases the client reconnected as soon as the machine woke up. @selvanair @ElStupid if you could test this as well, it would be great. Just pick the driver that matches your OS and run (I also added devcon.exe to the archive):

PS C:\Users\lev\Projects\ovpn-dco-win\signed\1.1.0\win11> .\devcon.exe install .\ovpn-dco.inf ovpn-dco
Device node created. Install is complete when drivers are installed...
Updating drivers for ovpn-dco from C:\Users\lev\Projects\ovpn-dco-win\signed\1.1.0\win11\ovpn-dco.inf.
Drivers installed successfully.

The driver is attestation signed, so you should be able to install it without any additional steps. I have bumped the version, so it should update the driver for existing devices. You should see it in the openvpn log:

Thu Apr 11 23:12:59 2024 OpenVPN 2.6.10 [git:v2.6.10/ba0f62fb950c56a0] Windows [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [AEAD] [DCO] built on Mar 20 2024
Thu Apr 11 23:12:59 2024 Windows version 10.0 (Windows 10 or greater), amd64 executable
Thu Apr 11 23:12:59 2024 library versions: OpenSSL 3.2.1 30 Jan 2024, LZO 2.10
Thu Apr 11 23:12:59 2024 DCO version: 1.1.0

ovpn-dco-win-1.1.0-win10.zip ovpn-dco-win-1.1.0-win11.zip

selvanair commented 3 months ago

This works as expected. On resume from suspend to disk (aka hibernate / S4), ping-restart triggered almost right away.

Interestingly, I thought timerConfig.TolerableDelay = TolerableDelayUnlimited; in the patch might break the tunnel in "Modern standby with network" mode where the tunnel used to stay up earlier. But it does continue to work. Hope that's not because my machine was only lightly loaded.

Note: Just installing the driver using devcon did not work for me, as OpenVPN.exe kept reporting the DCO version as 1.0.1 and the ping-restart behaviour remained as before, although the driver tab of the adapter showed 1.1.0. Deleting all existing dco adapters and recreating them using tapctl.exe fixed that.
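
One way to check which driver version openvpn actually picked up is to look for the "DCO version" line in its log. A small sketch, assuming the line format shown in the log excerpt earlier in this thread; the helper name `dco_version` is made up for illustration.

```python
import re

def dco_version(log_text):
    """Return the version from a 'DCO version: x.y.z' log line, or None."""
    match = re.search(r"DCO version: (\d+(?:\.\d+)*)", log_text)
    return match.group(1) if match else None

# Two lines copied from the log excerpt posted above.
log_text = """Thu Apr 11 23:12:59 2024 library versions: OpenSSL 3.2.1 30 Jan 2024, LZO 2.10
Thu Apr 11 23:12:59 2024 DCO version: 1.1.0"""
version = dco_version(log_text)
```

If this still reports the old version after installing, the existing adapters are probably still bound to the old driver, as described in the note above.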

ElStupid commented 3 months ago

Here is a new version of the driver with the refactored timer implementation. I have tested both hibernate and "modern standby"; in both cases the client reconnected as soon as the machine woke up. @selvanair @ElStupid if you could test this as well, it would be great. Just pick the driver that matches your OS and run (I also added devcon.exe to the archive):

Thanks! Will try later today and keep you posted.

ElStupid commented 3 months ago

Note: Just installing the driver using devcon did not work for me, as OpenVPN.exe kept reporting the DCO version as 1.0.1 and the ping-restart behaviour remained as before, although the driver tab of the adapter showed 1.1.0. Deleting all existing dco adapters and recreating them using tapctl.exe fixed that.

A little beyond my technical knowledge, but we'll see ;)

lstipakov commented 3 months ago

@ElStupid The easiest way would probably be to open Device Manager, expand Network Adapters, find OpenVPN Data Channel Offload, right click -> Remove Device, and in the dialog box that pops up select the Remove Drivers checkbox. After that you can run:

PS C:\Users\lev\Projects\ovpn-dco-win\signed\1.1.0\win11> .\devcon.exe install .\ovpn-dco.inf ovpn-dco
Device node created. Install is complete when drivers are installed...
Updating drivers for ovpn-dco from C:\Users\lev\Projects\ovpn-dco-win\signed\1.1.0\win11\ovpn-dco.inf.
Drivers installed successfully.

lstipakov commented 3 months ago

Just found (and fixed) a bug in 1.1.0. I'll attach the client installer here soonish, which includes the new version of the driver. (1.1.1)

ElStupid commented 3 months ago

Just tried this and works flawlessly, thanks so much. After resuming from hibernate I'm instantly connected now.

selvanair commented 3 months ago

The change from 1.1.0 to 1.1.1 would only affect TCP instances, right? I do not have one at hand to test against, but it looks good to me.

lstipakov commented 3 months ago

Correct, TCP only. You can get the new installer for different platforms here: https://github.com/OpenVPN/openvpn-build/actions/runs/8685747402

lstipakov commented 3 months ago

Fixed in 2.6.10-I002.