aerospike / aerospike-client-rust

Rust client for the Aerospike database
https://www.aerospike.com/

Update operations #79

Closed jonas32 closed 4 years ago

jonas32 commented 4 years ago

Big update to the operations system (#78), including:

CAUTION: This is a breaking change. The policy and return types for Lists require additional parameters for the cdt op functions.

jhecking commented 4 years ago

Will do a full review later, but just an initial observation on the breaking interface change for all list/map operations to add a context parameter. E.g. as can be seen in the tests, an empty context parameter needs to be added to all existing operations:

let ops = &vec![lists::size("bin", &[])];

What do you think about adding methods to set/add a context to the CdtOperation type itself, e.g.

let op = lists::size("bin");
let ctx = ctx_list_index(3);
op.set_context(&[ctx]);

Same can be done for operations which require an optional policy as well, e.g.

let op = lists::insert_items("bin", 1, &values);
let policy = ListPolicy::default();
op.set_policy(policy);

These new methods should return the CdtOperation instance itself, so they can be chained:

let ctx = ctx_list_index(3);
let op = lists::insert_items("bin", 1, &values)
  .set_policy(ListPolicy::default())
  .set_context(&[ctx]);

We could even do the same for the map return type as well, though removing that argument from the map op functions would of course in itself be a breaking change.

In general, I'm not opposed to introducing backward-incompatible changes at this early stage of the client. (I.e. until we declare the client "good enough" for a 1.0 release.) But I think introducing this kind of builder pattern makes the interface future-proof as well, such that adding additional parameters will not break the API every time.

I have been following this pattern for the Node.js client. E.g. see the ListOperation class for an example.
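A builder-style API along these lines might look like the following rough sketch; `CdtListOp`, `CdtContext`, and the field layout are all hypothetical names for illustration, not the client's actual types:

```rust
// Hypothetical sketch of the proposed builder pattern: the operation type
// carries optional policy and context, and the setters consume and return
// self so calls can be chained.

#[derive(Debug, Clone, PartialEq)]
pub enum CdtContext {
    ListIndex(i64),
    MapKey(String),
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct ListPolicy {
    pub flags: u8,
}

#[derive(Debug)]
pub struct CdtListOp {
    pub bin: String,
    pub policy: Option<ListPolicy>,
    pub ctx: Vec<CdtContext>,
}

impl CdtListOp {
    pub fn new(bin: &str) -> Self {
        CdtListOp { bin: bin.into(), policy: None, ctx: Vec::new() }
    }

    // Returns self so the call can be chained.
    pub fn set_policy(mut self, policy: ListPolicy) -> Self {
        self.policy = Some(policy);
        self
    }

    // Appends context elements and returns self for chaining.
    pub fn set_context(mut self, ctx: &[CdtContext]) -> Self {
        self.ctx.extend_from_slice(ctx);
        self
    }
}
```

With this shape, an operation with no context or policy needs no extra arguments at all, and future parameters can be added as new setters without breaking existing call sites.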

jonas32 commented 4 years ago

Good idea. I also didn't like the idea of passing the empty slice every time. I will try to build it like that and report back if it works. All CDT functions return the Operation struct, so for set_context it is easy to do: the context is set on the Operation. The policy is set in the CdtOperation struct. In addition, the policy is built into the args array together with other parameters that cannot be identified once inside; it also looks like the policy has a required fixed position in this array, so changing the policy with a chained function would mean rebuilding this whole array. To be consistent with the normal read/write functions, I would probably build the policy into the CDT functions, like it's done for the basic read/write functions with WritePolicy and ReadPolicy.
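The approach described above, taking the policy as a parameter of the CDT function itself so it can land in its fixed slot when the args array is first built, could be sketched like this; all types, names, and the one-byte encoding are hypothetical:

```rust
// Illustrative sketch: the policy is passed directly to the CDT function
// (mirroring how WritePolicy/ReadPolicy work for basic operations), so it
// can be encoded into the args array at its fixed position up front,
// avoiding any later rebuild of the array.

#[derive(Debug, Default, Clone)]
pub struct ListPolicy {
    pub flags: u8,
}

#[derive(Debug)]
pub struct CdtArgs(pub Vec<u8>);

pub fn insert_items(policy: &ListPolicy, index: i64, values: &[u8]) -> CdtArgs {
    let mut args = Vec::with_capacity(1 + 8 + values.len());
    args.push(policy.flags); // fixed slot: policy byte comes first in this sketch
    args.extend_from_slice(&index.to_be_bytes());
    args.extend_from_slice(values);
    CdtArgs(args)
}
```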

jonas32 commented 4 years ago

I'm currently trying to implement the HLL operations. While doing that, I found out that HLLs are handled differently than other return types. The Go client, for example, uses an HLLValue datatype. From my understanding, it's just a Vec return and acts the same way as Value::Blob. I have personally never used HLL in any way, so I don't understand how to implement it. Can you explain why it needs to be handled differently and what the reason behind HLLValue is?

jonas32 commented 4 years ago

I'm done implementing everything besides HLL and the Wildcard/Inf stuff; I'll do HLL in a second PR. I also found a caching technique in the Go client for the encoded ops, but that will be part of a second PR as well. Some parts are breaking changes, but I think they should be, or can't be done differently. For example, policies on lists: using a set_policy method would not be consistent with map ops or the normal operations. From my side, this is ready for review/merge.

jhecking commented 4 years ago

The policy is set in the CdtOperation struct. In addition, the policy is built into the args array together with other parameters that cannot be identified once inside; it also looks like the policy has a required fixed position in this array, so changing the policy with a chained function would mean rebuilding this whole array. To be consistent with the normal read/write functions, I would probably build the policy into the CDT functions, like it's done for the basic read/write functions with WritePolicy and ReadPolicy.

I see your point. Would take quite a bit of refactoring to decouple the Operation/CdtOperation structs from the final on-wire protocol layout; might hurt performance as well. I'm fine with your current implementation.

jhecking commented 4 years ago

While doing that, I found out that HLLs are handled differently than other return types. The Go client, for example, uses an HLLValue datatype. From my understanding, it's just a Vec return and acts the same way as Value::Blob. I have personally never used HLL in any way, so I don't understand how to implement it. Can you explain why it needs to be handled differently and what the reason behind HLLValue is?

Will have to look into that, as I'm not familiar with the Go client implementation.

jhecking commented 4 years ago

What I've seen so far looks great! But I need some more time for a full review, as this is a quite large PR (3000+ lines added). Please bear with me.

jonas32 commented 4 years ago

No problem, I already expected this to be a longer review process. Regarding the HLL implementation: you don't have to check the Go client; I just used it as a reference because it's the easiest for me to port to Rust. The Java or C client as a reference is also fine. The easiest for me would be Node.js, but that client seems to use a lot of FFI bindings to the C library.

jhecking commented 4 years ago

The easiest for me would be Node.js, but that client seems to use a lot of FFI bindings to the C library.

Yes, the Node.js client – which I also maintain – is just a wrapper around the C client. And the C client does not have a separate type for HLL values. I've reached out to @BrianNichols, who maintains both the Java and C clients, to hopefully shed some light on why some clients have a separate type and others don't.

jhecking commented 4 years ago

Here's what Brian had to say on the topic of the HLLValue type:

The java client does not include the particle type when returning values to the user. If HLLValue did not exist, a byte[] would have been returned instead. The problem is when the user tries to use the byte[] as input to another command. Value.get(Object value) would not know that the byte[] is really an HLL particle type. HLLValue's real purpose is to identify the bytes as HLL bytes. GeoJSONValue exists for the same reason to distinguish GeoJSON strings from regular strings.

The C client returns as_bytes which includes the particle type, so HLLValue is not needed. If I were able to redesign the java client today, I would return the particle type with each value (or just return Value objects instead of raw objects).

For the Rust client, that leaves us with two options: We could either extend the existing Value::Blob type to include the particle type, e.g. ParticleType::BLOB, ParticleType::HLL, etc. Or we could add a separate Value::HLL type. I think I would opt for the latter, as it's more explicit.
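A minimal sketch of the second option, assuming a dedicated Value::HLL variant alongside Value::Blob, with the particle-type IDs used in the Aerospike wire protocol (BLOB = 4, HLL = 18); the enum here is heavily trimmed for illustration:

```rust
// Sketch of a separate HLL variant: the bytes are the same as a blob, but
// the variant preserves the particle type, so HLL bytes read from the
// server can be fed back into another command without being mistaken for a
// plain blob. Variant set and method name are illustrative only.

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Blob(Vec<u8>),
    HLL(Vec<u8>),
    // ... other variants elided
}

impl Value {
    // Particle type IDs from the Aerospike wire protocol: BLOB = 4, HLL = 18.
    pub fn particle_type(&self) -> u8 {
        match self {
            Value::Blob(_) => 4,
            Value::HLL(_) => 18,
        }
    }
}
```

This mirrors the role HLLValue plays in the Java client: same payload as a byte array, but tagged so the client knows which particle type to send.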

jhecking commented 4 years ago

The context is set on the Operation.

Actually, why is the CDT context part of the Operation struct and not part of the CdtOperation struct? It's the CdtOperation::estimate_size and CdtOperation::write_to functions that actually handle the context.

The policy is set in the CdtOperation struct. In addition, the policy is built into the args array together with other parameters that cannot be identified once inside; it also looks like the policy has a required fixed position in this array, so changing the policy with a chained function would mean rebuilding this whole array.

The "problem" here is that the client's data structures too closely model after the wire protocol. For a more ergonomic client API it would be better if the Operation/CdtOperation structs were modelled at a slightly higher abstraction level, and then converted to wire format at the last moment, when serializing to the buffer. But I'm not sure what the performance impact would be, if any. Also, that would off course be a pretty major rewrite of the entire operations code at this point. :-) Not sure that's worth it. Esp. since close alignment between the different Aerospike client APIs also has some value.

jonas32 commented 4 years ago

Actually, why is the CDT context part of the Operation struct and not part of the CdtOperation struct? It's the CdtOperation::estimate_size and CdtOperation::write_to functions that actually handle the context.

There are two reasons. First, I was not sure whether it only works with CDT operations, or also with, for example, the scalar operations on lists. Second, if I put it in the CdtOperation itself, OperationData and the CdtOperation would always have to be mutable. The set_context function would still have to stay in mod.rs, because it's required for chaining on the operation functions; otherwise you would have to set it on the OperationData by matching for the right CdtType. That's why I set it on the Operation and pass it to the CdtOperation when building the buffers.

The "problem" here is that the client's data structures too closely model after the wire protocol. For a more ergonomic client API it would be better if the Operation/CdtOperation structs were modelled at a slightly higher abstraction level, and then converted to wire format at the last moment, when serializing to the buffer. But I'm not sure what the performance impact would be, if any. Also, that would off course be a pretty major rewrite of the entire operations code at this point. :-) Not sure that's worth it. Esp. since close alignment between the different Aerospike client APIs also has some value.

That's, I guess, what the caching mechanism in the Go client does: it builds the buffer at first execution and then saves it on the Operation for later use. I think rewriting the operations to match the other clients would be nice, but it's definitely a full breaking change. If we put that much effort into improving the operations, we should also improve the other parts that could break stuff (policies, contexts, names of functions, etc.).
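The caching idea described above, encode once on first use and reuse the bytes afterwards, could be sketched with std::cell::OnceCell; CachedOp and the placeholder encoding are illustrative only:

```rust
use std::cell::OnceCell;

// Rough sketch of the Go client's caching idea: the first call to
// wire_bytes() performs the (possibly expensive) encoding and stores the
// result; every later call returns the cached buffer.

pub struct CachedOp {
    pub bin: String,
    encoded: OnceCell<Vec<u8>>,
}

impl CachedOp {
    pub fn new(bin: &str) -> Self {
        CachedOp { bin: bin.into(), encoded: OnceCell::new() }
    }

    pub fn wire_bytes(&self) -> &[u8] {
        self.encoded.get_or_init(|| {
            let mut buf = self.bin.as_bytes().to_vec();
            buf.push(0x00); // placeholder terminator, stands in for real encoding
            buf
        })
    }
}
```

Since OnceCell works through a shared reference, the operation itself can stay immutable at the call site, which sidesteps the mutability concern raised earlier.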

If a full breaking change is fine, I can give it a try. I'm not sure how time-consuming that will be, so it might take a while. When refactoring that much code, we should probably also talk about what else needs to be done for a 1.0 release.

jonas32 commented 4 years ago

For the Rust client, that leaves us with two options: We could either extend the existing Value::Blob type to include the particle type, e.g. ParticleType::BLOB, ParticleType::HLL, etc. Or we could add a separate Value::HLL type. I think I would opt for the latter, as it's more explicit.

I think it would probably be easier and more straightforward to do it with Value::HLL, but I have never used it, so I'm fine with both options.

jonas32 commented 4 years ago

I think it would be better to add support for predicate expressions in a separate PR and to just keep this PR focused on map/list/bitmap operations. As such, I have not yet reviewed the changes you did to e.g. the scan policy, and other changes purely needed to support predicate expressions.

I added it here because I found them missing while changing the encoder. I didn't want to open a whole new PR for these few lines, but I can remove them here and move them to their own PR.

jonas32 commented 4 years ago

PredExp is removed from Policy and Scan again. I will not make a new PR for Scan and Policies, since they would have to be removed with the next version anyway. All questions/changes should be resolved now.

jhecking commented 4 years ago

Do you mind pulling from the aerospike-client-rust:update-operations branch? I've fixed a few minor lint warnings and added/updated the copyright statements in all modified files. But I don't have permission to push to your is-it-fresh:update-operations branch directly.

Once that's done, this is ready to merge, right?

jonas32 commented 4 years ago

Yep, looks like this is done.

rbotzer commented 4 years ago

Thank you @jonas32