OpenVPN / openvpn3-linux

OpenVPN 3 Linux client
GNU Affero General Public License v3.0

dbus service unstable at bootup. #100

Closed jkotra closed 2 years ago

jkotra commented 2 years ago

Running for the first time after boot, the D-Bus service fails to even return the openvpn3 version.

jkotra@ubuntu:~$ ./test
(process:16445): GLib-CRITICAL **: 00:01:41.861: g_variant_get: assertion 'value != NULL' failed
Segmentation fault (core dumped)

(Retry after a few seconds) It was able to get the version but cannot connect to /net/openvpn/v3/sessions

jkotra@ubuntu:~$ ./test
v17_beta
/net/openvpn/v3/configuration/1136f59fx646bx42dbxba20x5dc6cffcaa14
config imported

** (process:16459): ERROR **: 00:01:47.389: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface “net.openvpn.v3.sessions” on object at path /net/openvpn/v3/sessions
Trace/breakpoint trap (core dumped)

(Retry after a few more seconds) It's all good now.

jkotra@ubuntu:~$ ./test
v17_beta
/net/openvpn/v3/configuration/bec465a7xd015x4017xb9d2x8fc6262b4482
config imported
/net/openvpn/v3/sessions/d4ccb8b4s615es40a5s905cs1406f4a1da30
session created
auth_sent!
connected
** Message: 00:04:40.731: 2 7 

** Message: 00:04:42.750: /net/openvpn/v3/sessions/d4ccb8b4s615es40a5s905cs1406f4a1da30 disconnected!

I first noticed this behavior on my primary development setup running Arch Linux. I tested it again on Ubuntu 20.04 (which is officially supported according to the docs) and it's the same thing there too.


I will try to post dbus-monitor logs for these failing events/calls. Please let me know if you need any more info.

jkotra commented 2 years ago

system bus logs captured with dbus-monitor

ver.log ses.log

dsommers commented 2 years ago

Can you share your ./test code? D-Bus doesn't guarantee that a service is available on the very first call. So a few tweaks are needed to verify that the service is available, and to retry if it isn't, in case the service takes more than a couple of ms to start and initialize.

If your ./test code is a Python script and you use the openvpn3 module provided by this project, then it might be a bug here. If your test code instead is C++ code pulling in the proxy code from this project, it might be some oversights there. If ./test is a simple shell script calling openvpn3, it might be a problem there ... and so forth. It is hard to see why things are going wrong in your case without understanding what ./test does.

jkotra commented 2 years ago

Hi, here's my code. https://gist.github.com/jkotra/403984d306012472e7eda630306a47de

Please note it's the bare minimum to get it up and running. Ignore the unnecessary linking and the bad code in general. It's just a toy example I've been playing with. Sorry :smiling_face_with_tear:

dsommers commented 2 years ago

No worries, this shows exactly the same pitfalls we've had in openvpn3-linux with the glib2 D-Bus implementation (it s**ks, and we're working on replacing it completely in a later release). glib2 is notoriously annoying in these areas.

I've added a hack to the generic proxy code, which is used in most places where our D-Bus "clients" connect to one of our OpenVPN 3 D-Bus services. It simply calls a function retrieving the version number of the service, because when that property is readable, the service is ready. If it fails, it sleeps for a bit and then retries a number of times before giving up. Normally, a single waiting round is enough. https://github.com/OpenVPN/openvpn3-linux/blob/master/src/dbus/proxy.hpp#L287
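For readers who want the general shape of that probe-and-retry approach, here is a minimal sketch in Python. It is not the code from proxy.hpp. The function name `wait_for_service`, the `probe` callable, and the default attempt count and delay are all assumptions made for illustration. `probe` stands in for whatever call reads the service's version property in your client and raises while the service is not ready yet.

```python
import time
from typing import Callable, Optional


def wait_for_service(probe: Callable[[], str],
                     attempts: int = 5,
                     delay: float = 0.3,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call probe() until it succeeds or the attempts run out.

    probe is a hypothetical callable: it should read the service's
    version property and raise an exception while the service is not
    ready. Returns the probe result, or None after the last failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return probe()
        except Exception:
            # Assumption: any exception means "not ready yet". Real code
            # should catch only the D-Bus error type of its binding.
            if attempt == attempts:
                return None
            sleep(delay)  # wait a bit before the next round
    return None
```

A caller would run this once before touching the sessions or configuration objects, and treat a `None` result as "service not available". The `sleep` parameter exists only so the delay can be replaced in tests.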

The proper way is probably to query the dbus-daemon directly and wait for some NameOwnerChanged signals or similar before continuing. But in our C++ wrapping of the C-based glib2 library, that quickly got too messy in the current code, as it would need additional worker threads and signaling between threads when the signal appeared. So we went for a simpler hack this round. For the OpenVPN 3 Linux code, we will hopefully be able to kick out most of these hacks with the new D-Bus implementation, but that doesn't change the situation for other projects building on the glib2 gdbus APIs.

If it is a viable option for you, consider using the sd-bus implementation instead of glib2. That was the recommendation "everyone" gave me when I started this project. But at that time, it was not as easily available as it seems to be today; it depends on the Linux distro(s) you want to support.

jkotra commented 2 years ago

thanks for the insights :+1: . I will make use of delay/retry hacks in my code until this gets better.

dsommers commented 2 years ago

Unfortunately, as long as you use glib2 in your code, it is glib2 that needs to be "fixed", or the D-Bus daemon that needs to report availability a bit later (when the service is really available, not just started).

The glib2 implementation we're swapping out internally will mostly just improve the openvpn3 and openvpn3-admin utilities, as well as the IPC between the openvpn3-service-* processes.

Which is why I recommend the sd-bus implementation for you instead of glib2's gdbus, if that is an option for you.

dsommers commented 2 years ago

I'm closing this now, as there is not much we can do in OpenVPN 3 Linux to resolve issues with external dbus clients.