aerospike / aerospike-client-rust

Rust client for the Aerospike database
https://www.aerospike.com/
Other
82 stars 29 forks source link

Replacement of Value and Bins with Writable/Readable derive macros #138

Open jonas32 opened 1 year ago

jonas32 commented 1 year ago

New PR instead of #126 with the actual changes. I think its time to close the other one.

Writable:

Readable:

Client Update:

Exp/Op:

CDT:

jonas32 commented 1 year ago

Currently reads return a Result<Record<T>>. The T only replaces what used to be bins: HashMap<String, Value> with bins: T. Is that the wanted behavior or should Record also be replaced so that the T also includes record meta?

CMoore-Darwinium commented 1 year ago

Hi @jonas32, may I confirm that the purpose of this patch is to directly deserialize from the wire format (and serialize to the wire format) rather than using the Value enum?

If so, it looks like exactly what I need.

jonas32 commented 1 year ago

Hey @CMoore-Darwinium, yes. Goal is to directly build structs from the wire and write to it without intermediate steps and as low redundant allocations as possible. In the end, the target would be to more or less deprecate Bins and Value.

CMoore-Darwinium commented 1 year ago

Hey @CMoore-Darwinium, yes. Goal is to directly build structs from the wire and write to it without intermediate steps and as low redundant allocations as possible. In the end, the target would be to more or less deprecate Bins and Value.

That's exactly what I've been hoping for. Profiling shows that creating and destroying aerospike::Value is our current limiting factor.

Just playing devil's advocate here and I see you've already invested a lot of time into this patch.

You've got these traits, ReadableValue / WritableValue that interact directly with PreParsedValue, which is essentially aerospike's wire format.

I was wondering if we could use serde::Serialize and serde::Deserialize instead. That way implementations of serde::Deserializer and serde::Serializer, MapAccess, SeqAccess, etc could be provided for structpack, CDT, bins, formats etc could be passed as required.

It would have the advantages:

  1. Dealing with weird cases like, "it might be a string in old records and a sequence in new records" can be done nicely using a visitor instead of wire format and looking at particle.
  2. Serde's derive macros already have all of the magic sugar to handle defaults, multiple ways of handling enums, rename_all, etc.
  3. Wire format is hidden within this crate rather than being exposed in the public interface.

If you're open to the idea, I would be able to help with the coding.

jonas32 commented 1 year ago

i thought about this too initially. But there are a lot of tricky details with the wire format that might become hard to handle without working in the scope of the whole buffer. Lists/Maps are a good example, where you basically get X amount of bins with the same name and X amount of entries inside each and then have to join them together after parsing both. And on top, a different encoding for CDT returns.

Not minding about particle type would also be hard, because that often has implications on the way values are encoded. According to @khaf's answer to my question in the other PR, automatic casting of values to other types is not wanted behavior.

Also having serde that deep inside of the client would build a big dependency on that crate, since there would be no way to interact with the Server outside of serde.

Is it easily possible to add additional magic sugar for client specific things? Like ordered maps, geojson, hll etc.?

In general I'm still open for that idea if it does not impact on performance/allocations. But i would like to hear @khaf's opinion on that first before i start digging into that. Most of the work i spent here was figuring out how the wire format works and directly implementing it for each type. So it would not be a lot of wasted time to switch that now.

CMoore-Darwinium commented 1 year ago

With Lists / Map serde works quite well. It provides these two traits that would expose the underlying buffer to the type being deserialized as an iterator. serde::de::SeqAccess serde::de::MapAccess It gives a lot of flexibility to the deserializer to provide various typed key-values. So we would be implementing it on either the buffer type itself, or a wrapper type around it.

The particle type would be used. The idea is, the particle would be used in the deserializer which would call the right function on a Visitor specific to the type being destreamed https://docs.rs/serde/latest/serde/de/trait.Visitor.html the type itself would be responsible for either implementing or not implementing the appropriate visit function depending on whether the streamed data type makes sense to what you're streaming into.

Serde itself has no dependencies. Its mostly just a set of traits that gets compiled down to essentially nothing so I don't think there is a huge problem in introducing it.

Magic sugar for geojson and hll is more of a concern. I'll need to look into that. Some serde deserializers have their own special target types but it might be hacky. It could certainly be made to work, but finding a clean way might be harder.

Serde does not allocate apart for a few very specific situations. Flattening structures is the only one I can think of off hand. It is mostly just a few layers of generics without much actual logic actually happening.

jonas32 commented 1 year ago

Sounds good. Ill try to read a bit deeper into how serde could be implemented over the next days.