Closed: jonas32 closed this pull request 4 years ago.
Will do a full review later, but just an initial observation on the breaking interface change: all list/map operations gain a context parameter. E.g., as can be seen in the tests, an empty context parameter needs to be added to all existing operations:
let ops = &vec![lists::size("bin", &[])];
What do you think about adding methods to set/add a context to the CdtOperation type itself, e.g.
let op = lists::size("bin");
let ctx = ctx_list_index(3);
op.set_context(&[ctx]);
The same can be done for operations which require an optional policy as well, e.g.
let op = lists::insert_items("bin", 1, &values);
let policy = ListPolicy::default();
op.set_policy(policy);
These new methods should return the CdtOperation instance itself, so they can be chained:
let ctx = ctx_list_index(3);
let op = lists::insert_items("bin", 1, &values)
.set_policy(ListPolicy::default())
.set_context(&[ctx]);
We could even do the same for the map return type as well, though removing that argument from the map op functions would of course in itself be a breaking change.
In general, I'm not opposed to introducing backward-incompatible changes at this early stage of the client. (I.e. until we declare the client "good enough" for a 1.0 release.) But I think introducing this kind of builder pattern makes the interface future-proof as well, such that adding additional parameters will not break the API every time.
I have been following this pattern for the Node.js client. E.g. see the ListOperation class for an example.
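For illustration, here is a minimal sketch of what such chainable setters could look like on the returned operation struct. The type and field names (Operation, CdtContext, ListPolicy) are placeholders, not the actual aerospike-client-rust API:

```rust
// Hypothetical sketch only; names and fields are placeholders, not the real client API.
#[derive(Clone)]
pub enum CdtContext {
    ListIndex(i64),
    // MapKey(...), MapIndex(...), etc.
}

#[derive(Default)]
pub struct ListPolicy;

pub struct Operation {
    bin: String,
    ctx: Vec<CdtContext>,
    policy: Option<ListPolicy>,
    // ... the encoded CDT op data would live here as well
}

impl Operation {
    /// Attach a CDT context path; returns `self` so calls can be chained.
    pub fn set_context(mut self, ctx: &[CdtContext]) -> Self {
        self.ctx = ctx.to_vec();
        self
    }

    /// Attach an optional list policy; returns `self` for chaining.
    pub fn set_policy(mut self, policy: ListPolicy) -> Self {
        self.policy = Some(policy);
        self
    }
}
```

With that shape, the chained call in the example above falls out naturally, and adding further optional parameters later would not break existing call sites.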
Good idea. I also didn't like the idea of passing the empty slice every time.
I will try to build it like that and report back if it works.
All CDT functions return the Operation struct, so for set_context it is easy to do. The context is set in the Operation.
The policy is set in the CdtOperation struct. In addition, the policy is built into the args array together with other parameters and cannot be identified once inside. It also looks like the policy has a required fixed position in this array, so changing the policy with a chained function would mean rebuilding this whole array.
To be consistent with the normal read/write functions, I would probably build the policy into the CDT functions, like it's done for the basic read/write functions with WritePolicy and ReadPolicy.
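To illustrate the constraint described above with a simplified, hypothetical sketch (placeholder types, not the real encoder): once the policy flags have been serialized into the generic args vector at construction time, they sit at a fixed position and can no longer be told apart from the other arguments, so a later set_policy would have to rebuild the whole vector.

```rust
// Simplified, hypothetical sketch; placeholder types, not the real Aerospike encoder.
enum Arg {
    Int(i64),
    List(Vec<i64>),
    Flags(u8),
}

struct ListPolicy {
    flags: u8,
}

// The policy is folded into the args array right away, at a fixed position.
fn insert_items_args(index: i64, values: &[i64], policy: &ListPolicy) -> Vec<Arg> {
    vec![
        Arg::Int(index),            // arg 0: list index
        Arg::List(values.to_vec()), // arg 1: items to insert
        Arg::Flags(policy.flags),   // arg 2: policy flags, fixed position
    ]
}
// A later set_policy would have to locate and replace arg 2,
// i.e. effectively rebuild the whole args vector.
```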
I'm currently trying to implement the HLL operations.
While doing that, I found out that HLLs are handled differently than other returns. The Golang client, for example, uses an HLLValue datatype. From my understanding, it's just a Vec return and acts the same way as Value::Blob. I personally never used HLL in any way, so I don't understand how to implement it. Can you explain why it needs to be handled differently and what the reason behind HLLValue is?
I'm done implementing everything besides HLL and the Wildcard/Inf stuff; I'll do HLL in a second PR. I found a caching technique in the Golang client for the encoded ops, but that will also be part of a second PR. Some parts are breaking changes, but I think they should be, or can't be done differently. For example, policies on lists: using a set_policy method would not be consistent with map ops or the normal operations. From my side, this is ready for review/merge.
The policy is set in the CdtOperation struct. In addition, the policy is built into the args array together with other parameters and cannot be identified once inside. It also looks like the policy has a required fixed position in this array, so changing the policy with a chained function would mean rebuilding this whole array. To be consistent with the normal read/write functions, I would probably build the policy into the CDT functions, like it's done for the basic read/write functions with WritePolicy and ReadPolicy.
I see your point. It would take quite a bit of refactoring to decouple the Operation/CdtOperation structs from the final on-wire protocol layout; it might hurt performance as well. I'm fine with your current implementation.
While doing that, I found out that HLLs are handled differently than other returns. The Golang client, for example, uses an HLLValue datatype. From my understanding, it's just a Vec return and acts the same way as Value::Blob. I personally never used HLL in any way, so I don't understand how to implement it. Can you explain why it needs to be handled differently and what the reason behind HLLValue is?
Will have to look into that, as I'm not familiar with the Go client implementation.
What I've seen so far looks great! But I need some more time for a full review, as this is a quite large PR (3000+ lines added). Please bear with me.
No problem, I already expected this to be a longer review process. For the HLL implementation: you don't have to check it in the Golang one. I just used that one as a reference because it's the easiest for me to port to Rust. The Java or C client as a reference is also fine. The easiest would be Node.js for me, but that client seems to use a lot of FFI bindings to the C lib.
The easiest would be Node.js for me, but that client seems to use a lot of FFI bindings to the C lib.
Yes, the Node.js client – which I also maintain – is just a wrapper around the C client. And the C client does not have a separate type for HLL values. I've reached out to @BrianNichols, who maintains both the Java and C clients, to hopefully shed some light on why some clients have a separate type and others don't.
Here's what Brian had to say on the topic of the HLLValue type:
The java client does not include the particle type when returning values to the user. If HLLValue did not exist, a byte[] would have been returned instead. The problem is when the user tries to use the byte[] as input to another command. Value.get(Object value) would not know that the byte[] is really an HLL particle type. HLLValue's real purpose is to identify the bytes as HLL bytes. GeoJSONValue exists for the same reason to distinguish GeoJSON strings from regular strings.
The C client returns as_bytes which includes the particle type, so HLLValue is not needed. If I were able to redesign the java client today, I would return the particle type with each value (or just return Value objects instead of raw objects).
For the Rust client, that leaves us with two options: We could either extend the existing Value::Blob type to include the particle type, e.g. ParticleType::BLOB, ParticleType::HLL, etc. Or we could add a separate Value::HLL type. I think I would opt for the latter, as it's more explicit.
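A rough sketch of the second option (an illustrative fragment only; the client's real Value enum has many more variants and trait implementations):

```rust
// Illustrative fragment only, not the client's actual Value enum.
pub enum Value {
    Blob(Vec<u8>),
    // A dedicated variant keeps HLL bytes distinguishable from a plain blob,
    // so they can be written back with the correct particle type.
    HLL(Vec<u8>),
    // ... other variants elided
}

impl Value {
    /// Maps each variant to its wire particle type ID
    /// (4 = BLOB, 18 = HLL in the Aerospike protocol).
    pub fn particle_type(&self) -> u8 {
        match self {
            Value::Blob(_) => 4,
            Value::HLL(_) => 18,
        }
    }
}
```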
Context is set in the Operation.
Actually, why is the CDT context part of the Operation struct and not part of the CdtOperation struct? It's the CdtOperation::estimate_size and CdtOperation::write_to functions that actually handle the context.
The policy is set in the CdtOperation struct. In addition, the policy is built into the args array together with other parameters and cannot be identified once inside. It also looks like the policy has a required fixed position in this array, so changing the policy with a chained function would mean rebuilding this whole array.
The "problem" here is that the client's data structures too closely model after the wire protocol. For a more ergonomic client API it would be better if the Operation
/CdtOperation
structs were modelled at a slightly higher abstraction level, and then converted to wire format at the last moment, when serializing to the buffer. But I'm not sure what the performance impact would be, if any. Also, that would off course be a pretty major rewrite of the entire operations code at this point. :-) Not sure that's worth it. Esp. since close alignment between the different Aerospike client APIs also has some value.
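As a purely hypothetical illustration of that higher-level modelling (the real structs and the wire encoding differ), an operation could carry its parameters as plain fields and only be turned into wire bytes during serialization:

```rust
// Purely hypothetical sketch; op codes and layout are placeholders.
struct ListPolicy {
    flags: u8,
}

enum ListOp {
    InsertItems {
        index: i64,
        values: Vec<i64>,
        policy: Option<ListPolicy>,
    },
    Size,
}

impl ListOp {
    // Wire encoding happens at the last moment; until then the policy (or any
    // other parameter) can be changed without rebuilding a pre-encoded args array.
    fn encode(&self, buf: &mut Vec<u8>) {
        match self {
            ListOp::InsertItems { index, values, policy } => {
                buf.push(0x02); // placeholder op code
                buf.extend_from_slice(&index.to_be_bytes());
                for v in values {
                    buf.extend_from_slice(&v.to_be_bytes());
                }
                if let Some(p) = policy {
                    buf.push(p.flags);
                }
            }
            ListOp::Size => buf.push(0x10), // placeholder op code
        }
    }
}
```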
Actually, why is the CDT context part of the Operation struct and not part of the CdtOperation struct? It's the CdtOperation::estimate_size and CdtOperation::write_to functions that actually handle the context.
There are two reasons. First, I was not sure whether it only works with CDT operations or also with, for example, the scalar operations on lists. Second, if I put it in the CdtOperation itself, OperationData and the CdtOperation would always have to be mutable. The set_context function would still have to stay in mod.rs, because it's required for chaining on the operation functions; otherwise you would have to do it on the OperationData by matching for the right CdtType. That's why I set it on the Operation and pass it to the CdtOperation when building the buffers.
The "problem" here is that the client's data structures too closely model after the wire protocol. For a more ergonomic client API it would be better if the Operation/CdtOperation structs were modelled at a slightly higher abstraction level, and then converted to wire format at the last moment, when serializing to the buffer. But I'm not sure what the performance impact would be, if any. Also, that would off course be a pretty major rewrite of the entire operations code at this point. :-) Not sure that's worth it. Esp. since close alignment between the different Aerospike client APIs also has some value.
That's, I guess, what the caching mechanism in the Golang client does: it builds the buffer at first execution and then saves it to the Operation for later use. I think rewriting the operations to match the other libs would be nice, but it is definitely a full breaking change. When we put that much effort into improving the operations, we should also improve the parts that could break stuff (policies, contexts, names of functions, etc.).
If a full breaking change is fine, I can give it a try. I'm not sure how time consuming that will be, so it might take a while. When refactoring that much code, we should probably also talk about what else needs to be done for a 1.0 release.
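For reference, the Go-style caching mentioned above could be sketched in Rust with a lazily initialized buffer; the struct, field names, and encode logic below are assumptions for illustration only:

```rust
use std::cell::OnceCell;

// Illustrative sketch of caching the encoded form on the operation,
// in the spirit of the Go client's approach; not the actual implementation.
struct CachedOp {
    bin: String,
    encoded: OnceCell<Vec<u8>>, // filled on first execution, reused afterwards
}

impl CachedOp {
    fn wire_bytes(&self) -> &[u8] {
        self.encoded.get_or_init(|| {
            // The expensive encoding happens only once.
            let mut buf = Vec::new();
            buf.extend_from_slice(self.bin.as_bytes());
            // ... serialize op code, context, args, and policy here ...
            buf
        })
    }
}
```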
For the Rust client, that leaves us with two options: We could either extend the existing Value::Blob type to include the particle type, e.g. ParticleType::BLOB, ParticleType::HLL, etc. Or we could add a separate Value::HLL type. I think I would opt for the latter, as it's more explicit.
I think it would probably be easier and more straightforward to do it with Value::HLL, but I never used it, so I'm OK with both options.
I think it would be better to add support for predicate expressions in a separate PR and to keep this PR focused on map/list/bitmap operations. As such, I have not yet reviewed the changes you made to, e.g., the scan policy, and other changes purely needed to support predicate expressions.
I added them here because I found them missing while changing the encoder. I didn't want to open a whole new PR for these few lines, but I can remove them here and move them to a separate one.
PredExp is removed from Policy and Scan again. I will not make a new PR for Scan and Policies, though; it would have to be removed with the next version anyway. All questions/changes should be addressed now.
Do you mind pulling from the aerospike-client-rust:update-operations branch? I've fixed a few minor lint warnings and added/updated the copyright statements in all modified files. But I don't have permission to push to your is-it-fresh:update-operations branch directly.
Once that's done, this is ready to merge, right?
Yep, looks like this is done.
Thank you @jonas32
Big update to the operations system (#78), including:
CAUTION: This is a breaking change. The CDT op functions for lists now require additional policy and return-type parameters.