dbus2 / zbus-old

Rust D-Bus crate.
https://gitlab.freedesktop.org/dbus/zbus
Other
49 stars 13 forks source link

zvariant: Possible (breaking) changes to make serialization simpler #256

Open zeenix opened 2 years ago

zeenix commented 2 years ago

In GitLab by @ids1024 on Jan 18, 2022, 23:24

It is unintuitive to need SerializeDict/DeserializeDict rather than Serialize/Deserialize. It seems (from the Matrix chat) it's not just me who would have expected structs to serialize as a dict by default (like serde_json). It also means not being able to use serde macros like #[serde(flatten)], without SerializeDict/DeserializeDict getting logic to parse and handle these.

Since the DBus spec defines a "struct" as an sequence of types without names assigned to the fields, it seems more like a tuple. But I guess DBus was designed by C programmers, where there isn't such a distinction.

It is also confusing to need TryFrom<OwnedValue> in some places, and Deserialize in others. In simple cases, the OwnedValue trait makes it easy to implement, at least, but that likewise doesn't have all the options that #[serde] attributes can provide.

I think a few breaking changes could improve this:

I think this would generally make the derive macros in zvariant unnecessary, other than Type (which is also easy to manually implement when custom behavior is needed). The API should be easier to understand, and standard #[serde] attributes can be used everywhere to control serialization behavior.

zeenix commented 2 years ago

Implement a Deserializer and Serializer for OwnedValue/Value similar to serde_json (which uses something like that to implement from_value/to_value).

How would that work for zbus? It just has a generic dbus serializer/deserializer and can't use custom one for just one type, as it could be embedded. With serde_json, people often need to just transform a whole JSON into/from a value so that works for that use case but that's not our case.

Serialize serde structs to a{sv}. Tuples or tuple structs are then needed to serialize/deserialize a dbus "struct", or serde_tuple could be used.

As I wrote on Matrix, we likely don't need something so drastically different than what we have today. Since the serializer and deserializer has access to the signature, they could be change to act accordingly if they receive a{sv} signature and told to serialize/deserize a struct. Then we can deprecate the SerializeDict and DeserializeDict.

zeenix commented 2 years ago

Since the DBus spec defines a "struct" as an sequence of types without names assigned to the fields, it seems more like a tuple. But I guess DBus was designed by C programmers, where there isn't such a distinction.

They were also python devs. :smile: That isn't the reason though AFAIK. Field names would take space and D-Bus was designed to be efficient. That's probably also why this isn't something GVariant changed. As a programmer, I'd prefer field names in the struct so I can easily tell what is what, even if the binary encoding doesn't doesn't keep them.

zeenix commented 2 years ago

In GitLab by @ids1024 on Jan 19, 2022, 01:07

I think some binary serialization formats require the use of a schema specification that may statically define names for fields. But I guess DBus introspection XML is optional, and perhaps was added later.

I'd prefer field names in the struct so I can easily tell what is what, even if the binary encoding doesn't doesn't keep them.

In a lot of cases I'd agree. I was hoping serde would have a built-in attribute for serializing a named struct as a tuple, since this isn't really a zvariant/dbus specific issue, but it seems to currently require a crate like serde_tuple.

zeenix commented 2 years ago

In GitLab by @ids1024 on Jan 19, 2022, 01:16

serde-json has a impl<'de> serde::Deserializer<'de> for Value implementation, and a serde_json::value::Serializer struct, in addition to serde_json::Deserializer and serde_json::Serializer.

Having to implement Deserializer and Serializer twice is an annoying amount of boilerplate, and may take some care to make sure the behavior matches. But I guess a lot of the complexity already has to be handled by the OwnedValue and Value proc macros, and this would move it from codegen to generic code.

(I don't know how this would fit with the separation between OwnedValue and Value.)

zeenix commented 2 years ago

I think some binary serialization formats require the use of a schema specification that may statically define names for fields. But I guess DBus introspection XML is optional, and perhaps was added later.

Those schemas are not encoded in the binary itself but rather exists for developer mostly. At least that was the idea with dbus interface XML. Even introspection API is used more by developers to generate code against at build time rather than something needed in every message itself.

I'd prefer field names in the struct so I can easily tell what is what, even if the binary encoding doesn't doesn't keep them.

In a lot of cases I'd agree. I was hoping serde would have a built-in attribute for serializing a named struct as a tuple, since this isn't really a zvariant/dbus specific issue, but it seems to currently require a crate like serde_tuple.

Well, as we already discussed, in many other formats (e.g JSON and TOML), struct nicely translates to an entity that has field names encoded in them. I do think in general it's good that serde is based on the rust types more rather than some various formats that were implemented from the beginning (with some exceptions). It allows each format to decide what each rust type should be mapped to.

zeenix commented 2 years ago

serde-json has a impl<'de> serde::Deserializer<'de> for Value implementation, and a serde_json::value::Serializer struct, in addition to serde_json::Deserializer and serde_json::Serializer.

Right and hence my question:

How would that work for zbus? It just has a generic dbus serializer/deserializer and can't use custom one for just one type, as it could be embedded. With serde_json, people often need to just transform a whole JSON into/from a value so that works for that use case but that's not our case.

zeenix commented 2 years ago

Since the serializer and deserializer has access to the signature, they could be change to act accordingly if they receive a{sv} signature and told to serialize/deserize a struct.

Thought more about this and went through the code and realized it won't be so easy. The problem is the v bit. The serializer needs the signature of the actual type and if it just gets v, it then expects it from the Value's Serialize impl.

I also got reminded that this problem has been discussed before and @jhartzell42 even attempted to solve it but their solution was too intrusive and complicated for our liking in the end (see !342).

zeenix commented 2 years ago

Another relevant thing I got reminded was that my first serde::Serializer implementation attempt did not involve taking signature as input. The serializer would instead create the signature as it serialized the value. IIRC, things broke down when I had to handle the variant type (i-e Value). It's possible that I'm misremembering a bit (it was 2 years ago) and it was only the Deserializer impl that was hard to achieve without signature being given.

In any case, I'm not saying that my suggestion isn't possible to implement, just that it won't be as trivial as I originally thought.

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 19:10

This awkward distinction between Serialize and SerializeDict is a problem in the codebase at my job, as anything marked SerializeDict then can't be used cleanly with other serde libraries.

So I continued working on solving this on a fork of zvariant, and I concluded that there was no way around it, the notion that the Type had to be derived separately was making this impossible. Furthermore, I thought it should be possible to make a serde implementation that didn't need a Type trait. So I started over, writing a completely new serde format for DBus, with the intention of integrating it into zbus later. It's not (yet) at feature parity with zvariant, but I think it's getting close to being usable with a little more work, and it does not use Type at all. We were going to wait until trying to integrate it with zbus before publicizing it, but because this conversation's come up again, here it is:

https://gitlab.com/racepointenergy/rust_libraries/serde_dbus/

Unless another solution to this problem can be found, I am going to continue working on smoothing out the rough edges with this, posting it to adding additional features, and integrating it as an external dependency with either zbus (whether as a new default serde impl or as a feature flag along zvariant) or, if that ends up not being possible, a fork of zbus.

Is anyone else interested in this direction? @ids1024 you mentioned other people in Matrix?

Code review on this new serde DBus implementation would be appreciated :-)

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 19:22

In a lot of cases I'd agree. I was hoping serde would have a built-in attribute for serializing a named struct as a tuple, since this isn't really a zvariant/dbus specific issue, but it seems to currently require a crate like serde_tuple.

My serde_dbus has a SerializationPolicy that handles this and can make the decision on a struct-name by struct-name basis.

zeenix commented 2 years ago

Hi @jhartzell42 :smile:

I'm very glad that you didn't just give up after our disagreements. :smile:

So I continued working on solving this on a fork of zvariant, and I concluded that there was no way around it, the notion that the Type had to be derived separately was making this impossible. Furthermore, I thought it should be possible to make a serde implementation that didn't need a Type trait. So I started over, writing a completely new serde format for DBus, with the intention of integrating it into zbus later.

Hmm.. interesting. I completely get the Type issue but why not attempt to remove the Type constraint from the zvariant impl instead of starting over? :thinking: As I wrote in the last comment, that was actually the plan when I first started implementing the serializer/deserializer but the variant ser/de didn't make it easy.

Moreover, now that I think about it, I recall that empty array (Vec and slices for Rust) was impossible w/o the signature being available to the serializer, since you've to align the first element on a dbus array even if there is no first element, and unfortunately the alignment depends on the type. How did you solve that issue? :thinking:

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 19:58

Yep, that's one of the rough edges for sure. So the current code definitely does the wrong thing (although the array implementation code is far more complete than what's publicly exposed, so this is actually pretty easy to fix). Currently, it serializes all arrays as av, no matter what. Also, it appears that I have a glitch where if the array is empty it falls back on a(), which I have more recently learned is invalid, and I have a TODO to fix that (I'm going to be moving my TODOs into the repo along with more docs presently).

So, a few thoughts. What I have instead of Type is something similar and yet (I hope) more flexible, which is a SerializationPolicy that has its own trait and gets passed down along with the serializer. The deserializer does not need this. The SerializationPolicy is queried with the struct name every time a struct is encountered, and it indicates whether the struct should be serialized as a{sv} or (e.g.) (isd...). This could be used to set types for the arrays, which gets us right back to having Type but I hope in a more flexible way. My goal would be for SerializationPolicy to be optional, for (possibly separate sets) of defaults to work for most use cases. I think the default for the array would be to serialize as av if the array happens to be empty, and to verify that the signatures match if not (the code to do this is there, just needs a little tweaking). This would mean in practice there might be problems with empty arrays, and for messages that need different behavior, they can set a SerializationPolicy...

I know this sounds super complicated but I think it's going to be better in practice; take a look at the code :-) https://gitlab.com/racepointenergy/rust_libraries/serde_dbus/-/blob/trunk/src/ser/serializer_policy.rs

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 20:08

I plan on adding documentation, TODO lists, etc. to this repo over the next two days, in case you'd like to wait for more of that before reviewing in detail.

zeenix commented 2 years ago

Thanks for explaining. As you said yourself, requiring SerializerPolicy isn't very different from requiring Type so if you require it, IMO (you can disagree of course) it's not really much of an improvement. As for serializing arrays as always av, that won't work for a lot of cases out there, unless you are implementing a service, where you can decide the interface and types being used. When developing client code for existing services, this would be an absolute show stopper so if you want this work to be generally useful, you'll need to figure out a solution (assuming you want SereializerPolicy to be optional). Note that I'm assuming service implementors will be OK with always encoding their array elements as av, which won't always be the case as variant encoding adds extra bytes per element so if there is a Vec<u16> with a big length, variant encoding of each element will be extremely inefficient (going against the efficiency promise of D-Bus and Rust both).

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 20:53

How often are empty arrays used? I would guess that they are not strictly needed that often.

I plan on changing it (and the change is actually pretty easy, most of the code is there) so that non-empty arrays just verify that the signature is always the same for every element. If you don't send any empty array, SerializerPolicy need not be used. If you do send an empty array, it will send the wrong signature in that case (probably av, but whatever makes the most sense), unless you specify SerializerPolicy to override that. So SerializerPolicy would be optional unless you need empty arrays to have a specific signature, which I think is only a minority of the time?

I also think SerializerPolicy is a drastic improvement over Type because it supports Serialize and SerializeDict equally well without need for separate Serialize instances (solving the problem in this issue completely) and generally avoids coupling the Serialize instance to this particular format. Again, this is the core issue at my employer, so for me it's still worth it.

But I would much rather have something that you also think is a great improvement. I would ideally like this to be integrated and supported into mainstream zbus (alongside zvariant via feature flag), so if you can think of any way to improve this particular case with the arrays, please let me know! I will in any case be proceeding but it would be nice to be able to work together!

@ids1024 , what do you think of the trade-offs involved? Also, where is the Matrix instance where I'm apparently missing out on some good discussion?

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 21:14

Another advantage is that you could write one SerializerPolicy for the entire application, and combine your application's SerializerPolicys with an existing one that would cover all the applications in zbus.

zeenix commented 2 years ago

How often are empty arrays used? I would guess that they are not strictly needed that often.

If a set is empty, it's empty. It's not a matter of how abundant it is, it's essential to support that. Even one of the very first test case I wrote for zbus got an empty array in return of a method call on some Linux distros. Try the properties and methods of bus' own service (org.freedesktop.DBus) and you'll soon encounter an empty array.

Believe me, if it was possible to ignore empty arrays, I would have. :smile:

zeenix commented 2 years ago

If you do send an empty array, it will send the wrong signature in that case (probably av, but whatever makes the most sense), unless you specify SerializerPolicy to override that.

Given that specifying the wrong signature is not an option in generic public API and (as I pointed out) you don't have a way to correctly handle the alignment, that essentially means SerializerPolicy has to be required.

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 21:45

Very good point! And I guess I'll start working on my version of a derive macro to make this easier. Sigh.

Given that, I suppose it would not be a great improvement from your point of view, although still worth it from mine. Would you be willing to accept MRs to allow this serde implementation to be used as an alternative, guarded by a feature flag? The alternative will be that I will have to fork zbus to create a DBus library that meets my employers' needs, as I see no other way forward to resolve this issue.

I am open to other suggestions for how to address this.

zeenix commented 2 years ago

Given that, I suppose it would not be a great improvement from your point of view, although still worth it from mine. Would you be willing to accept MRs to allow this serde implementation to be used as an alternative, guarded by a feature flag?

If you could solve the array encoding (especially w/o requiring SerializerPolicy), I can promise to give it serious consideration and try to be as impartial I could during that consideration. I hope that's fair enough. :smile:

The alternative will be that I will have to fork zbus to create a DBus library that meets my employers' needs, as I see no other way forward to resolve this issue.

You're always free to do that, of course. It'll be a pity but not at all the end of the world. :smile: However, if you couldn't figure out the array problem and publish it (e.g on crates.io), it'd great if you document that well. People might try your crate and it'd work flawlessly until they encounter an empty array. They'd be very dissapointed. I also suspect, sooner or later you will encounter empty arrays at your company too, unless of course both service and client are always implemented by your company.

zeenix commented 2 years ago

I am open to other suggestions for how to address this.

I don't have any other ideas but I'm almost 100% sure that any proper solution would involve you having to give up on your desire to making SerializerPolicy optional.

btw, zvariant has an EncodingContext type that both serializer and deserializer use. At the moment, it only dictates the byte order and encoding format, but other options can certainly be added to it, if needed. So we decide to add some policy data/config, we can do it w/o breaking any API.

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 19, 2022, 22:39

The array problem is insolvable without some sort of out of band communication; see the WONTFIX on this issue: https://github.com/serde-rs/serde/issues/607

Given that, the options are encoding the information in the data we do have (e.g. specifying special, bespoke types in the Serialize instance, similar to what SerializeDict currently does) or conveying the information to the Serializer through some other value (either signatures like those provided by the Type trait, or information on specific types as provided by a SerializerPolicy). Which of these solutions would you prefer?

Do not worry! I am not going to go forward with my original plan to simply generate a different signature on an empty array. You have thoroughly talked me out of it.

I will instead have the default be to create an av, and to require a SerializerPolicy if you want to instead specify another signature, in which case it will generate an error if any signatures of the elements do not match. This solves the array problem, in that it doesn't allow for situations where an empty array will have a surprisingly different signature, but does require a SerializerPolicy in any case you do not want av to be the signature.

To then make that burden easier: There would then be a provided SerializerPolicy that would be specified by default that would include metadata for all the structures supported in zbus, and an API for expanding that for you.

How does that sound to you?

zeenix commented 2 years ago

Given that, the options are encoding the information in the data we do have (e.g. specifying special, bespoke types in the Serialize instance, similar to what SerializeDict currently does) or conveying the information to the Serializer through some other value (either signatures like those provided by the Type trait, or information on specific types as provided by a SerializerPolicy). Which of these solutions would you prefer?

Not sure I understand 100% but zvariant already provides API to be able to given the signature of the data, if you want to ser/de a type that doesn't implement Type trait. zbus currently only relies on Type impl because that's the simplest solution for most cases. zbus could offer other API to be given the signature.

Do not worry! I am not going to go forward with my original plan to simply generate a different signature on an empty array. You have thoroughly talked me out of it.

Good. :smile:

I will instead have the default be to create an av, and to require a SerializerPolicy if you want to instead specify another signature, in which case it will generate an error if any signatures of the elements do not match. This solves the array problem, in that it doesn't allow for situations where an empty array will have a surprisingly different signature, but does require a SerializerPolicy in any case you do not want av to be the signature.

As I already wrote, that's not really making SerializePolicy optional since:

  1. It's not possible to choose the av as the format (especially for client-side only code).
  2. In many cases, av would be very inefficient and hence not acceptable.

You'd only make SerializerPolicy optional for only specific use cases. i-e your API would not be generally useful.

To then make that burden easier: There would then be a provided SerializerPolicy that would be specified by default that would include metadata for all the structures supported in zbus, and an API for expanding that for you.

Maybe I need to see the rust API to understand how that'd be different/better than what we already have and can provide (the API that takes signature from the user, instead of requiring Type). Also, I don't know if you noticed but with latest zvariant release, you can very easily implement Type like this:

#[derive(Type)]
// Hardcode the signature.
#[zvariant(signature = "...")]
struct Whatever {
   // Now the type of fields don't matter to `Type`.
...
}

and since it's trivial to create a newtype struct for any type, the need for API to explicitly provide the signature (through any means) become even less useful.

zeenix commented 2 years ago

Implement a Deserializer and Serializer for OwnedValue/Value similar to serde_json (which uses something like that to implement from_value/to_value).

How would that work for zbus? It just has a generic dbus serializer/deserializer and can't use custom one for just one type, as it could be embedded. With serde_json, people often need to just transform a whole JSON into/from a value so that works for that use case but that's not our case.

@ids1024 Last night I gave this a bit more thought and understood what you're really suggesting. At first I thought why wouldn't it work but then I remembered a fact that you probably don't know (yet), that I learnt the very hard way: You can't encode a variant separately from the rest of a D-Bus message payload because variant's own payload needs to be aligned and that needs to be with respect to the position in the entire payload it'll be the part of, not the variant container itself.

My first representation of Value was actually to keep it always in the encoded format but that failed because of this alignment problem. If a variant is holding a basic type, you can always adjust the alignment when serializiing but for a container type, that'd mean adjusting alignments for all the elements, which would be very inefficient.

I hope both you and @jhartzell42 has seen my presentations (VariantType was the former name of Type) where I go through some of these problems I faced trying to create the best API for zvariant. I also contributed to the dbus spec to make both the value payload and array alignment requirement clearer so the future implementers don't have to figure it out the hard way.

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 20, 2022, 17:56

Maybe I need to see the rust API to understand how that'd be different/better than what we already have and can provide (the API that takes signature from the user, instead of requiring Type).

It is very different in many small ways, but the only way in which it's really "better" is that it does not require special Serialize implementations for SerializeDict-type behavior, allowing you to use the same Serialize implementations for DBus and JSON.

I know this isn't very interesting to you, but I do think it is interesting to others. I'm going to continue developing serde_dbus, and use these methods in our own internal libraries to connect it to zbus for right now, as a stop-gap pending a better future more comprehensive solution: https://gitlab.freedesktop.org/dbus/zbus/-/blob/main/zbus/src/message.rs#L411 https://gitlab.freedesktop.org/dbus/zbus/-/blob/main/zbus/src/message.rs#L596

zeenix commented 2 years ago

In GitLab by @jhartzell42 on Jan 20, 2022, 18:04

You can't encode a variant separately from the rest of a D-Bus message payload because variant's own payload needs to be aligned and that needs to be with respect to the position in the entire payload it'll be the part of, not the variant container itself.

My serializer handles this by having an intermediate representation where bytes are stored in chunks alongside the alignment of each chunk. This is currently very inefficiently stored, but I have a plan for making it more efficient.

As I said, I'm going to continue working on serde_dbus; I have a (non-ideal, but better than forking) path forward to using it internally at my job, and on the long-term schedule I have some inklings of ideas on how to get it in a state you'll be more interested in integrating into zbus.

I hope both you and @jhartzell42 has seen my presentations (VariantType was the former name of Type) where I go through some of these problems I faced trying to create the best API for zvariant. I also contributed to the dbus spec to make both the value payload and array alignment requirement clearer so the future implementers don't have to figure it out the hard way.

As one such "future implementer" I appreciate that! I had not seen the presentations, I will watch soon. Thank you!

zeenix commented 2 years ago

Maybe I need to see the rust API to understand how that'd be different/better than what we already have and can provide (the API that takes signature from the user, instead of requiring Type).

It is very different in many small ways, but the only way in which it's really "better" is that it does not require special Serialize implementations for SerializeDict-type behavior, allowing you to use the same Serialize implementations for DBus and JSON.

That's very nice but SerializeDict is not always what you need (actually most of your types used with D-Bus wont be needing it) and in the end, it's just a helper. You can always use a HashMap if you need more control over the serialization.

I know this isn't very interesting to you, but I do think it is interesting to others.

It is interesting to me as well but I've to think about all uses in general way and not only about serializing dicts. It'd awesome if we can drop *Dict macros but I don't see it as a huge issue if we dont/cant, since there are ways to achieving the same (e.g HashMap I mentioned above). Things don't need to be super perfect, especially when dealing with a rather old binary format like D-Bus.

I'm going to continue developing serde_dbus

I wish you the best of luck!