Pelagicore / gdbus-codegen-glibmm

Code generator for C++ D-Bus stubs and proxies using Giomm/Glibmm
GNU Lesser General Public License v2.1
23 stars 25 forks source link

Race issue with reading properties on proxy object close after creation #83

Closed erikboto closed 5 years ago

erikboto commented 5 years ago

Currently the <property name>_get() function only handles the case where the property is already in the local cache. The local cache population is started during the creation of the proxy (by calling GetAll asyncronously), but is not guaranteed to be finished before the proxy is complete. Therefore the users of the proxy might call a property's get() method when the value is not in the cache, which will just print a todo message.

This can be problematic since there's no way to tell if a proper value was read or not, at least not for e.g. a bool where the returned value will be false and there's no way of telling that the value is not reflecting the actual property value. You will however see some internal glib messages:

GLib-CRITICAL : 09:11:41.658: g_variant_get_type: assertion 'value != NULL' failed GLib-CRITICAL : 09:11:41.658: g_variant_type_is_subtype_of: assertion 'g_variant_type_check (type)' failed GLib-CRITICAL **: 09:11:41.658: g_variant_get_boolean: assertion 'g_variant_is_of_type (value, G_VARIANT_TYPE_BOOLEAN)' failed

I'm not really sure what the best approach here is:

  1. Pass an extra bool reference to the get function which could be used to indicate if the returned value is valid or not.
  2. Return the actual variant instead of variant.get(), but it would expose more of the internal glib structures to the user
  3. Add an async _get() method, so if the property cannot be read from cache it will be read using the org.freedesktop.DBus.Properties Get(). This might be a nice addition, but doesn't solve the problem with telling if the result from the synchronous _get() method is valid or not.
martin-ejdestig commented 5 years ago

Wonder why Glib does not wait until all properties are in the cache. It is a bit hard to follow the async init code for GDBusProxy but here:

https://gitlab.gnome.org/GNOME/glib/blob/ec17786ad1a620c043b32dba7d9d63128af35fac/gio/gdbusproxy.c#L1730

in the function async_initable_init_second_finish(), process_get_all_reply() is called. For sync initialization, it looks like it is called before returning here:

https://gitlab.gnome.org/GNOME/glib/blob/ec17786ad1a620c043b32dba7d9d63128af35fac/gio/gdbusproxy.c#L1922

For the async case... errr.. hum... hard to see and I have never used GAsyncInitable. But one would guess that the async_initable_init_seconf_finish() is supposed to be part of the async creation phase... ?

erikboto commented 5 years ago

I wonder if Glib tries to do the right thing (wait for reply from GetAll), but something else along the way goes wrong which also leads to the property cache not containing the property?

I also have a hard time trying to understand if it actually waits for the reply for GetAll or not in the async case, but I think it probably does and that my assumption when filing the bug was wrong.

martin-ejdestig commented 5 years ago

Maybe add some log statements to Glib to see what is going on when you detect the race? (If trying to debug it would be hard due to it being timing related.)

Hum, the last commit in gdbusproxy.c currently is "gdbusproxy: Add G_DBUS_DEBUG=proxy support"... there are two g_debug() calls in gdbusproxy.c for logging when GetAll() fails, it looks like.

erikboto commented 5 years ago

I've spent some more time with this, and it turns out the error is completely on my side. I was creating a proxy for org.bluez.Adapter1 on object /org/bluez/hci0, which is always there when running on my HW, but if the app is launched close enough after boot bluez hasn't always created the hci0 device yet. Stupid mistake!

So it was definitely a case of the GetAll failing silently.

martin-ejdestig commented 5 years ago

Nice that you found out why. :)

It also looks like the "name owner changed" case is handled correctly in Glib. "g-name-owner" is notified first after process_get_all_reply() has been called in on_name_owner_changed_get_all_cb() if one uses cached properties in the proxy.

So listening for a name coming and going like I do e.g. here https://github.com/Pelagicore/connectivity-manager/blob/161d2dcecee1464b494d85fce65ce382c7e7f08e/src/daemon/backends/connman_manager.cpp#L45 should work.

That is, you should be able to create a proxy before Bluez has created /org/bluez/hci0 and check if the name exists or not, and also listen for name changes. Not sure if you want to do that if there is a "device created" signal or something in Bluez though... but... just an FYI. (In the ConnMan manager case it makes sense since it is a "persistent" object but the ConnMan daemon can go up and down).

erikboto commented 5 years ago

Well the name org.bluez is there as soon as bluetoothd is started, but the /org/bluez/hci0 object show up with a short delay after start-up. Could be that the hciattach command, needed on the current HW, is not run before that, or it's just a small delay before bluetoothd enumerates the adapters during start up. I now do it "the right way (tm)" which is to use the ObjectManager interface to check existing objects, and connect to InterfacesAdded signal to find out when the /org/bluez/hci0 appears if it wasn't present from the start.

martin-ejdestig commented 5 years ago

You should be able to handle it by using Gio::DBus::Proxy::get_name_owner() (name is empty if no owner) and Gio::DBus::Proxy::property_g_name_owner() (to listen for owner coming and going) as well, I think.

At least that's what I have experienced when implementing CLI tools (create proxy, if get_name_owner() is empty, quit since service not started or could not be auto started) and also handling a service going up and down (as in the ConnMan manager case, create proxy, check if get_name_owner() is empty, connect to property_g_name_owner() to detect object coming and going and check get_name_owner() every time the signal is emitted). A bit ugly perhaps, but I do not know of a better way to handle "service may not be available and may also come and go" case.

Have not used the ObjectManager interface (since I have never communicated with a service that implements it :) , but yeah, probably works as well.

But hum, should anything be done for this issue in the generator? If C++17 could be required then std::optional could be returned (which would "delegate" throwing or not to caller... "I have handled name coming and going, I can just "deref" the std::optional and be sure that it will never throw"... or "I do not handle name coming and going, I better be prepared for std::optional deref to throw or check if it has a value before I do"). But probably a bit too early to require C++17 in the generator. Though, I believe glibmm will have it as a requirement soon (master already does, if I remember correctly).

A bit related, I noticed that the extraction (and copying!) of all the names can probably be removed. A VariantBase whose VariantBase::operator bool() returns false should be returned by get_cached_property() if the property is not in the cache.

mardy commented 5 years ago

I've spent some more time with this, and it turns out the error is completely on my side. I was creating a proxy for org.bluez.Adapter1 on object /org/bluez/hci0, which is always there when running on my HW, but if the app is launched close enough after boot bluez hasn't always created the hci0 device yet. Stupid mistake!

But did the proxy creation fail? IMHO it should, but from what you wrote it seems that you got back some apparently valid proxy, which in fact was invalid.

martin-ejdestig commented 5 years ago

I've spent some more time with this, and it turns out the error is completely on my side. I was creating a proxy for org.bluez.Adapter1 on object /org/bluez/hci0, which is always there when running on my HW, but if the app is launched close enough after boot bluez hasn't always created the hci0 device yet. Stupid mistake!

But did the proxy creation fail? IMHO it should, but from what you wrote it seems that you got back some apparently valid proxy, which in fact was invalid.

I do not think it was "invalid", it is just that there was no remote object to proxy yet (Gio::DBus::Proxy::get_name_owner() empty) and Gio::DBus::Proxy::property_g_name_owner().signal_changed() will be emitted when it becomes available (and Gio::DBus::Proxy::get_name_owner() will no longer be empty).

I think this is well defined in Glib. It is OK to create a proxy on the client side even if there is no remote object. And it will not emit Gio::DBus::Proxy::property_g_name_owner().signal_changed() until after all properties have been cached (if I read the code correctly).

But hum, should anything be done for this issue in the generator? If C++17 could be required then std::optional could be returned

Perhaps we should not do this since it is extra complexity. Instead just return a default constructed value (0 for int, false for bool, empty container for std::vector etc) for properties if there is no owner. And just document it. Perhaps even add some convenience methods for the Gio::DBus::Proxy name owner methods. Not sure this is necessary... but it would hide dbusProxy() more...

A bit related, I noticed that the extraction (and copying!) of all the names can probably be removed.

Opened #84 for this.

erikboto commented 5 years ago

You should be able to handle it by using Gio::DBus::Proxy::get_name_owner() (name is empty if no owner) and Gio::DBus::Proxy::property_g_name_owner() (to listen for owner coming and going) as well, I think.

At least that's what I have experienced when implementing CLI tools (create proxy, if get_name_owner() is empty, quit since service not started or could not be auto started) and also handling a service going up and down (as in the ConnMan manager case, create proxy, check if get_name_owner() is empty, connect to property_g_name_owner() to detect object coming and going and check get_name_owner() every time the signal is emitted). A bit ugly perhaps, but I do not know of a better way to handle "service may not be available and may also come and go" case.

That works if you want to monitor if the actual D-Bus service if coming and going, in this case org.bluez will be there all the time but the object for the bluetooth adapter might not be there. So get_name_owner() returns the same regardless if the object the proxy is created for exist or not.

But did the proxy creation fail? IMHO it should, but from what you wrote it seems that you got back some apparently valid proxy, which in fact was invalid.

As @martin-ejdestig say the proxy is created just fine, even though the object is not exported. But I don't think there's a reliable way of telling when an object becomes exported though.

I think the proposed change in #84, combined with an extra (bool *ok) to solve this case, might be a good outcome of my failing to check if the object exists.

erikboto commented 5 years ago

Maybe the "Todo" message should be changed a bit too, since it's not actually a Todo. What do you think about changing it to a g_warning, using G_LOG_DOMAIN = "gdbus-glibmm-codegen" (a bit long, but I can't think of a nice short name), which reads something like "Couldn't find property in cached properties".

Or we remove the message, since the "bool *ok" will indicate if it succeded anyway.

I also noticed that this cached property lookup can fail even with a valid proxy, since properties can be optional. E.g. org.bluez.Device1 has the property "Name" as optional, which means that even if the object the proxy is for is valid and present, it doesn't mean that all properties specified by the interface XML is present and therefore in the property cache. During discovery I get objects which first don't have this property, and then it appears later when the name lookup is complete. So it's not only when I make bad assumptions that this issue can be seen.

martin-ejdestig commented 5 years ago

That works if you want to monitor if the actual D-Bus service if coming and going, in this case org.bluez will be there all the time but the object for the bluetooth adapter might not be there. So get_name_owner() returns the same regardless if the object the proxy is created for exist or not.

D'oh! Sorry. Yes, the name does not actually change if the service is alive all the time and just registeres/unregisters an object.

But I don't think there's a reliable way of telling when an object becomes exported though.

I think you are right, I do not see a way to detect this in GDBusProxy itself. Has to be done through some other interface exposed by the service.

martin-ejdestig commented 5 years ago

Maybe the "Todo" message should be changed a bit too, since it's not actually a Todo. What do you think about changing it to a g_warning, using G_LOG_DOMAIN = "gdbus-glibmm-codegen" (a bit long, but I can't think of a nice short name), which reads something like "Couldn't find property in cached properties".

Or we remove the message, since the "bool *ok" will indicate if it succeded anyway.

How about g_warning() if no "bool *ok"?

Pointer is passed => caller is prepared to deal with it. Pointer is not passed => g_warning() and return default initialized value (so that behavior is not undefined).

Well, return default initialized value regardless of whether "bool *ok" is passed or not if property is not in cache... just in case caller does not check bool correctly.

erikboto commented 5 years ago

That sounds like a plan, I'll see if I can prepare a PR soon.