Pelagicore / gdbus-codegen-glibmm

Code generator for C++ D-Bus stubs and proxies using Giomm/Glibmm
GNU Lesser General Public License v2.1
23 stars 25 forks source link

Address issue with byte strings that are not null terminated #82

Closed erikboto closed 4 years ago

erikboto commented 5 years ago

This fixes a bug where properties with signature "ay" had to be null terminated, which is not always the case.

The issue is that Glib::Variant::get() uses g_variant_get_bytestring() which will return an empty string if the array does not end with null. It will also only return a part of the array if the array contains a null character before the end of the array.

Special handling of signature "ay" is added, so it will use g_variant_get_fixed_array() instead which works well with any kind of array payload.

erikboto commented 5 years ago

@mardy @martin-ejdestig et al: It would be nice to get some early feedback on this.

  1. I put the "specialGetter" stuff in both stub/proxy templates, which means duplication, but I'm unsure of how this could be shared in a nice way?
  2. "specialGetter" is not a nice name, I'm all open for suggestions.
  3. What about the solution with checks for prop.signature == "ay"? Can you think of a better way?
erikboto commented 5 years ago

I see that I need to update the commit message about having to avoid Glib::Variant<std::string>::create(std::string) since that will automagically make it null terminated.

martin-ejdestig commented 5 years ago

I thought I'd take a stab at addressing this in Glibmm (with thought that we could have a temporary workaround in the generator, or maybe just ask for patch to be backported to stable branches). Created a wip branch here:

https://gitlab.gnome.org/martin-ejdestig/glibmm/tree/variant_std_string_get_vs_non_0_terminated

But...

When adding tests for this in glibmm/tests/glibmm_variant/ I started looking into how to create a Glib::Variant to test it. I noticed that g_variant_new_bytestring() and G_VARIANT_TYPE_BYTESTRING documents that a '\0' is always added to the end of the array. Perhaps mapping a value of type "ay" that may not be a "valid C string" to Glib::Variant<std::string>/G_VARIANT_TYPE_BYTESTRING is not correct?

Looking around for how this is solved in gdbus-codegen it seems like the recommended way is to use org.gtk.GDBus.C.ForceGVariant. From https://developer.gnome.org/gio/stable/gdbus-codegen.html :

This automatic mapping can be turned off by using the annotation org.gtk.GDBus.C.ForceGVariant - if used then a GVariant is always exchanged instead of the corresponding native C type. This annotation may be convenient to use when using bytestrings (type-string 'ay') for data that could have embedded NUL bytes.

Perhaps it makes sense for gdbus-codegen-glibmm to respect org.gtk.GDBus.C.ForceGVariant as well?

martin-ejdestig commented 5 years ago

Thought about this some more this morning and was wondering if Glib::Variant<std::vector<T>> could be used. Have not tested but it at least looks like there is no fast path for when T is a simple type like std::uint8_t at least. :(

https://gitlab.gnome.org/GNOME/glibmm/blob/281ebe7d403b4446eb7f733aa35348ace85b1a6e/glib/src/variant.hg#L1289

Can always be fixed I guess, but creating the variant with the C API like you are doing is probably the easiest solution. (Regardless if we do it directly in the generator or add support for e.g. org.gtk.GDBus.C.ForceGVariant so it has to be done outside of it.)

martin-ejdestig commented 5 years ago

Another idea... Supporting org.gtk.GDBus.C.ForceGVariant would be nice since it aligns with gdbus-codegen... but... perhaps a gdbus-codegen-glibmm specific annotation that forces "ay" to be mapped to std::vector<std::uint8_t> is nicer... from a "there is a more suitable type for this in the C++ language than there is in C" perspective. Do org.gtk.GDBus.C.ForceGVariant first since it is probably simpler and could be nice in other situations as well... and maybe think about alternatives later.... if wanted? I don't know.

It propably makes sense to keep mapping "ay" to std::string by default regardless. Or at least one would think so, since it is what the gdbus-codegen authors have chosen to do. Though... I think Qt always maps it to QByteString... but they have their UTF-16 shenanigans going on so not going to a "string" (char */std::string/QString) was more natural in their ecosystem.

(OK, that's enough info dumping/ramblings from me at the moment. At some point the signal to noise ratio becomes too low. :) sigh Why oh why did I not create a separate GH account for work so it would be easier to not look at (fun) things even remotely related to work during the weekends??? Sunday mode, engage!!! :)

erikboto commented 5 years ago

I'm a bit torn of what I think a good solution would be. I kind of like the idea of using the annotation, but the resulting code for the user would be more "unclean" since they have to operate on the Variant using the C-api to get the data. It also adds some extra complexity in the generator.

The annotation make more sense to me for the C-generator, since then the non-GVariant type is a char * which can't contain null is a nice way. Since std::string supports it, I don't think it's the best way forward.

Pros of using annotation:

Cons of using annotation:

Am I missing something in the pros/cons list?

And sure, using something other than std::string might be good/"cleaner". But then again, does it solve an actual problem or is it just a matter of taste? I think I'm for the KISS principle here :-)

martin-ejdestig commented 5 years ago

The annotation make more sense to me for the C-generator, since then the non-GVariant type is a char * which can't contain null is a nice way.

I am not even sure that this was a good idea in the C-generator. They could have e.g. always mapped it to a GByteArray and left it up to the user of the returned GByteArray to decide how to interpret data of the GByteArray.

But then again, does it solve an actual problem or is it just a matter of taste? I think I'm for the KISS principle here :-)

Yeah, perhaps you are right. Always expose "ay" as std::string and do not use Glib::Variant<std::string>/G_VARIANT_TYPE_BYTESTRING at all in the generator. KISS FTW.

But, hum... if what I wrote in the above paragraph is done... how do you discern between the case where you want a '\0' appended ("byte string") and when not ("raw data" where trailing 0 byte can change meaning drastically, think raw image data, or if a checksum is calculated from data)? If there is for instance a property of type "ay" now, the generator will always send a trailing '\0' over the bus when setting it due to G_VARIANT_TYPE_BYTESTRING?

GVariant *
g_variant_new_bytestring (const gchar *string)
{
  g_return_val_if_fail (string != NULL, NULL);

  return g_variant_new_from_trusted (G_VARIANT_TYPE_BYTESTRING,
                                     string, strlen (string) + 1);
}

+ 1 there makes it so, I think. (And it is the whole reason you are having the problem you are having. :)

If we always treat it as pure data (and therefore will never append '\0' even though std::string::data() will always have it there), is that OK? I guess it depends on how "ay" is used in the wild mostly. If they expect a trailing '\0' then gdbus-codgen-glibmm will be harder to use (have to add a extra trailing '\0' to the std::string manually... perhaps this is OK).

(Btw, I have no authority here in gdbus-codgen-glibmm land. I am just rubber ducking. :)

erikboto commented 5 years ago

The annotation make more sense to me for the C-generator, since then the non-GVariant type is a char * which can't contain null is a nice way.

I am not even sure that this was a good idea in the C-generator. They could have e.g. always mapped it to a GByteArray and left it up to the user of the returned GByteArray to decide how to interpret data of the GByteArray.

But then again, does it solve an actual problem or is it just a matter of taste? I think I'm for the KISS principle here :-)

Yeah, perhaps you are right. Always expose "ay" as std::string and do not use Glib::Variant<std::string>/G_VARIANT_TYPE_BYTESTRING at all in the generator. KISS FTW.

But, hum... if what I wrote in the above paragraph is done... how do you discern between the case where you want a '\0' appended ("byte string") and when not ("raw data" where trailing 0 byte can change meaning drastically, think raw image data, or if a checksum is calculated from data)? If there is for instance a property of type "ay" now, the generator will always send a trailing '\0' over the bus when setting it due to G_VARIANT_TYPE_BYTESTRING?

GVariant *
g_variant_new_bytestring (const gchar *string)
{
  g_return_val_if_fail (string != NULL, NULL);

  return g_variant_new_from_trusted (G_VARIANT_TYPE_BYTESTRING,
                                     string, strlen (string) + 1);
}

+ 1 there makes it so, I think. (And it is the whole reason you are having the problem you are having. :)

Another reason is that g_variant_get_bytestring() used by Glib::Variantstd::string::get() will return empty string if it's not null terminated. With the code in this PR the data is not sent null terminated, and it would work with any type of data.

The only issue is if a user wants to treat the string as something that is sure to be null terminated, since that's not the case anymore. If it was not null terminated before, the string was always empty. It was not the case that they would get the string with an additional null at the end.

If we always treat it as pure data (and therefore will never append '\0' even though std::string::data() will always have it there), is that OK? I guess it depends on how "ay" is used in the wild mostly. If they expect a trailing '\0' then gdbus-codgen-glibmm will be harder to use (have to add a extra trailing '\0' to the std::string manually... perhaps this is OK).

I don't think code in the wild should expect "ay" to be null terminated, since that implies that it also cannot contain null in the data. It won't work with e.g. org.bluez.GattCharacteristic1, which is the interface I used when I found this issue. In my opinion it would make more sense for those to apply their own check.

(Btw, I have no authority here in gdbus-codgen-glibmm land. I am just rubber ducking. :)

I appreciate the rubber ducking! Let's see if we can wake up @jonte @mardy @kursatkobya to get some additional opinions here.

martin-ejdestig commented 5 years ago

FYI took a quick look at one of the uses of "ay" in /usr/share/dbus-1/interfaces on my system.

https://github.com/flatpak/xdg-desktop-portal/blob/164676588afe439d068d1fdc4c370a672e92f107/data/org.freedesktop.portal.Documents.xml#L240

https://github.com/flatpak/xdg-desktop-portal/blob/1f9bcfbb5587b1b62d943c83732aebc5235bafc2/document-portal/document-portal.c#L1123

If I understand that code correctly they require a terminating '\0' sent over the bus for the filename argument. :(

On the other hand... perhaps broken services should not dictate decisions taken here. And filename.append('\0') would make a client that uses gdbus-codegen glibmm work in this case with the approach you suggest in your wip patch.

If I had to choose mapping without any "glib baggage", I would definitely go with std::vector<std::uint8_t> and not involve std::string for ay... ever. sigh

mardy commented 5 years ago

Using std::string to store arbitrary data feels wrong, even if it can support it. I would vote for mapping ay to std::vector<guchar> (given that we map y to guchar).

mardy commented 4 years ago

After some months filled with intense thoughts on this issue ( ;-) ), I think the best way to fix the immediate issue is to merge this: it has the advantage of not changing the API of the generated code.

Adding an annotation to force the usage of a different type is a good idea, too, but for the immediate issue at hand, I would propose merging this.

erikboto commented 4 years ago

Nice, I'd love to not depend on my special branch anymore! They have a tendency to rot after a while.

So I rebased on top of current master, and updated the tests.

kursatkobya commented 4 years ago

I would say, "Do it!"