OpenVPN signals to systemd that it is ready before the VPN connection is up

me-and commented 1 month ago

Describe the bug OpenVPN sends the READY=1 sd_notify message to systemd once it has completed initialisation, before the VPN tunnel is established. This means that systemd units that depend on the VPN connection being up by ordering themselves after the OpenVPN unit will fail.

To Reproduce Broad outline:

Create a systemd unit for an OpenVPN connection
Create a systemd unit that can only succeed once the OpenVPN connection is up, and order it after the OpenVPN unit with Requires= and After=
Start the second unit. Note that it fails.

My specific scenario has CIFS mount and automount units; the mount will only succeed if the VPN has connected. The trimmed-for-simplicity unit config looks like this:

/etc/systemd/system/openvpn-pdnet.service:

[Unit]
After=network.target
Description=OpenVPN instance ‘pdnet’

[Service]
ExecStart=openvpn --suppress-timestamps --config <file>
Restart=always
Type=notify

/etc/fstab:

//example.org/Directory /mnt cifs rw,credentials=/etc/creds,x-systemd.automount,x-systemd.mount-timeout=60s,nofail,x-systemd.requires=openvpn-pdnet.service,x-systemd.after=openvpn-pdnet.service 0 0

Expected behavior OpenVPN does not signal to systemd that it is ready until the VPN connection is actually up, so that systemd units configured to only start after the OpenVPN unit can safely rely on the connection being available.

Version information:

OS: NixOS 24.05
OpenVPN version: 2.6.12

Additional context The current behaviour is deliberate, and was introduced by e83a8684f0a0d944e9d53cdad2b543cfd1b6fbae in 2017; before that commit the behaviour I'd like was in place. Given this is deliberate behaviour, I'm opening this report to discuss whether we want a fix; the fix itself is pretty straightforward assuming there's consensus that this is actually a bug.

There are three justifications given in the commit message for that change:

"First, it adds challenges if --chroot is used in the configuration; this is already fixed."

I don't use --chroot so I'm not quite sure what the challenges are, but if they've already been fixed then they presumably don't need this fix as well?
"Secondly, it will cause havoc on static key p2p mode configurations where the log line above ["Initialization Sequence Completed"] will not happen before either sides have completed establishing a connection"

I also don't use these configurations, and I don't have a sense of what havoc this causes. But it seems correct to me that the OpenVPN client would not report itself as ready to systemd before the connection has been established.
"And thirdly, if a client configuration fails to establish a connection within 90 seconds, it will also fail. For the third case this may not be a critical issue itself, as the host just needs to get an Internet access established first - which in some scenarios may take much longer than those 90 seconds systemd grants after the OpenVPN client configuration is started."

I do understand this issue, but I think this is the wrong solution. Or at least it's the wrong solution now; it may well be that systemd didn't have better options when this change was made in 2017. The point of reporting that a unit is ready is that systemd knows the unit can be used and other units that depend on it can run. If it's taking a long time for OpenVPN to fully establish a connection, it's correct that systemd should know and be able to act on that situation.

If this is causing problems, the correct solutions IMVHO are either (a) setting a longer timeout before systemd declares the unit has failed, using TimeoutStartSec=, or (b) where there is some other blocking requirement like Internet access, to order the OpenVPN unit after that access has been established, e.g. by specifying Wants=network-online.target and After=network-online.target.

As I say, I definitely don't understand the ramifications of at least two of the problems outlined above, and I could well believe I don't fully appreciate the third either. Nonetheless, being able to order one systemd unit after another is exactly what the READY=1 notification is intended for, and this functionality is broken with the current OpenVPN code. At the very least, I'd like the behaviour to be configurable.

I have a patch more-or-less ready to go for what I consider to be the preferable behaviour, and I'm currently running OpenVPN with that patch in place.

me-and commented 1 month ago

The patch I'm currently using is at https://github.com/me-and/nixcfg/blob/d53a799b29de3a80889db1c13b391fef4d422113/overlays/openvpn/openvpn.diff, although I expect it'll want a small amount of tidying for style.

cron2 commented 1 month ago

What does the patch do on openvpn server instances (where "waiting for an established VPN session" might never happen)?

me-and commented 1 month ago

Ah! Good question; I hadn't realised this code was used for server initialisation as well as client initialisation.

In the server scenario, I'd expect OpenVPN to emit the "ready" signal as soon as it can accept incoming VPN connections, which probably means it'll need to emit the signal in different places depending on whether it's acting as a client or server.

I also now suddenly have a much clearer idea of what the p2p use case is. I imagine that mode is the clearest use case for making this configurable, as I can absolutely see people wanting to rely on both OpenVPN being available for connections and for OpenVPN to be fully connected.

cron2 commented 1 month ago

Introducing a new config option is the traditional way we solve things, and now we have hundreds of options that nobody can remember why they were introduced :-) - so this needs good consideration.

Your use case ("client, outbound connection, must wait") and the server case ("signal systemd as soon as incoming connections can be accepted") could be distinguished by looking at the --client option. But... other users might be happy with OpenVPN just starting and succeeding "eventually", not having something wait for it... would that interfere with anything?

p2p mode comes in two kinds - one is the old static key mode, which basically does not negotiate anything (so, "READY" is reached "when everything is ready"). Then, there's TLS p2p mode, which does negotiate, and has a tls-client and tls-server role... so these could again be differentiated by config.

OTOH, maybe we do want an option... pinging @flichtenheld to get more brains.

flichtenheld commented 1 month ago

Note that is is very helpful to also go to the Trac tickets referenced in the commit (https://community.openvpn.net/openvpn/ticket/827, https://community.openvpn.net/openvpn/ticket/801). They show much more about the motivation behind the patch and @dsommers there acknowledges the problems that are raised in this issue. So as you said, this is definitely expected behavior and not an accidental regression. So just reverting this patch will probably be not the correct solution.

Ultimately I guess there are too many use-cases to satisfy everyone by the default behavior. A --client with stable internet but complex dependencies might want different behavior than a --client on the road with intermittent connections. And a --server might want to have different behavior again.

So far I saw two suggestions to make the behavior more flexible:

As suggested above, add a config option (default might depend on --client/--server/etc.). Relatively straightforward, one more option...
In 827 it was suggested that a possible solution might be an additional systemd service that could monitor openvpn and be used to report an additional status. That would probably need to talk to openvpn on the management interface to get a accurate status. This might be relatively complex.

me-and commented 1 month ago

@flichtenheld Thank you! I hadn't known how/where to find the Trac tickets, but that makes a lot of sense.

Having an additional service that could provide additional state information is the approach taken by several other parts of systemd, for example network.target vs network-online.target, or time-set.target vs time-sync.target. But, as you say, that seems like it'll add a lot of complexity, and would only be more useful than the config option approach if someone needed to use the same configuration but have different operations ordered after both the initial setup being completed and after the VPN was fully established.

My inclination would be config options. Something like sd-notify on-init vs sd-notify on-connect, with on-connect being the default behaviour for clients, and on-init being the current behaviour and the default otherwise.

OpenVPN / openvpn

OpenVPN signals to systemd that it is ready before the VPN connection is up #617