aerospike / aerospike-client-rust

Rust client for the Aerospike database
https://www.aerospike.com/

FilterExpression in Query policy #112

Closed bmuddha closed 2 years ago

bmuddha commented 2 years ago

First of all, thanks for the amazing work done on this library.

I came across an optional FilterExpression in QueryPolicy, https://github.com/aerospike/aerospike-client-rust/blob/6ac1a06a899d964bf3564e5b9fad746ba9c22411/src/policy/query_policy.rs#L40

So I was wondering about a few things:

  1. Does the filter expression affect all subsequent queries once set?
  2. Is it a good idea to use it to filter records from a large dataset? Are those filters applied before or after secondary index based filtering?
  3. Is it better to use a UDF if you need to filter on multiple fields, given that currently only one sindex based filter is allowed per query?

Thanks in advance, cheers!

kportertx commented 2 years ago
  1. I'd assume not but I don't know this library well enough to be certain.
  2. These filters are applied after the sindex filtering (when a sindex is used). They should also be fine to use on a large dataset - especially if you are using record metadata in the filter as such filters may avoid the disk access to fetch the record.
  3. I'm not sure how you would use a UDF to filter. You can filter on multiple criteria with a single filter by composing them using the and and or expressions.

More information about expressions, in general, can be found here: https://docs.aerospike.com/guide/expressions
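
To illustrate the point about metadata, here is a toy model in plain Rust. It is only a sketch of the idea, not how the server is implemented: `Meta`, `Stored`, and `scan` are hypothetical names, and "disk reads" are just a counter. A filter that can reject a record from its metadata alone never triggers the read of that record's bins.

```rust
/// Hypothetical per-record metadata, assumed to be available without a disk read.
struct Meta {
    last_update: u64,
}

/// Hypothetical stored record: metadata plus bins that live on "disk".
struct Stored {
    meta: Meta,
    bins: Vec<(&'static str, i64)>,
}

/// Applies a metadata check first and a bin check second.
/// Returns (number of matching records, number of simulated disk reads).
fn scan(
    records: &[Stored],
    meta_ok: impl Fn(&Meta) -> bool,
    bins_ok: impl Fn(&[(&'static str, i64)]) -> bool,
) -> (usize, usize) {
    let mut matches = 0;
    let mut disk_reads = 0;
    for r in records {
        if !meta_ok(&r.meta) {
            continue; // rejected from metadata alone, bins are never fetched
        }
        disk_reads += 1; // bins are fetched only for records that pass the metadata check
        if bins_ok(&r.bins[..]) {
            matches += 1;
        }
    }
    (matches, disk_reads)
}
```

In this model, a restrictive metadata check lowers the read count even when the bin check stays the same.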

jonas32 commented 2 years ago

The filter expression is applied to a specific QueryPolicy instance. It only affects queries that are executed with this exact instance. New QueryPolicy instances will always be created with an empty filter expression.

let mut qpolicy_filtered = QueryPolicy::default();
qpolicy_filtered.filter_expression = Some(YOUR_FILTER);

// `query` takes the Statement by value, so build a fresh one for each call.
let new_statement = || Statement::new("namespace", "set", Bins::All);

let res1 = client.query(&qpolicy_filtered, new_statement()); // will apply the filter
let res2 = client.query(&QueryPolicy::default(), new_statement()); // will not apply the filter
let res3 = client.query(&qpolicy_filtered, new_statement()); // will apply the filter again

bmuddha commented 2 years ago

@kportertx thanks for the reply. As for the third point, I am also not sure how to filter with a UDF, but the docs say it is possible. In any case, Aerospike currently supports only one sindex based filter per query, so I was a bit confused about where the rest of the filters are supposed to go, since this library seemingly allows only a single filter when making a query (you can add more than one, but the server will complain).

https://github.com/aerospike/aerospike-client-rust/blob/6ac1a06a899d964bf3564e5b9fad746ba9c22411/src/query/filter.rs#L20-L44

So now I'm even more confused: can one specify the whole set of filters in QueryPolicy alone (including secondary index based ones), or should one use the Statement::add_filter method to specify sindex based filters? Any further clarification is much appreciated.

jonas32 commented 2 years ago

The UDF support in this client is not complete. Look here: https://github.com/aerospike/aerospike-client-rust/issues/6

UDFs are probably not the right choice for simple filtering. Filter expressions should be able to do what you need: https://github.com/aerospike/aerospike-client-rust/tree/master/src/expressions

They are more modern and able to do a lot more. I think the normal add_filter function is more or less outdated. Here is a simple example of a filter expression with multiple filters (bin1 == 25 && bin2 > 100):

use aerospike::expressions;
let filter = expressions::and(vec![
    expressions::eq(
        expressions::int_bin("bin1".to_string()), 
        expressions::int_val(25)
    ), 
    expressions::gt(
        expressions::int_bin("bin2".to_string()), 
        expressions::int_val(100)
    )
]);

In case you need more examples of the usage of filter expressions, check the test files. I also used the query function there: https://github.com/aerospike/aerospike-client-rust/blob/master/tests/src/exp.rs#L631

You can see the actual filter expressions above that line.

kportertx commented 2 years ago

The UDF docs page that you've linked to discusses filtering with "predicate filters" (a revision of that wording to "filter expressions" has been requested). Otherwise, I don't see anything there that would indicate that UDFs do the filtering.

There are two types of "filters". The first is this Filter, which specifies the set, bin, and bin type and is used to select the appropriate secondary index. Currently, and as you have found, the server only allows you to select one secondary index. The second type of filter is the "filter_expression", which lives in the query policy and which @jonas32 provided an example of above. The "filter_expression" is the type of "filter" I was describing. Sorry for the confusion.
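
The two kinds of filter can be pictured with a small self-contained model in plain Rust. This is a sketch under stated assumptions, not the client's API: `IndexRange`, `Expr`, `record`, and `run_query` are hypothetical stand-ins, with `IndexRange` playing the role of the single secondary-index filter and `Expr` the role of a composable filter expression that is applied afterwards.

```rust
use std::collections::HashMap;

/// Toy record: bin name -> integer value (hypothetical, not the client's Record type).
type Record = HashMap<&'static str, i64>;

/// Stand-in for the one secondary-index filter a query may carry.
struct IndexRange {
    bin: &'static str,
    begin: i64,
    end: i64,
}

/// Stand-in for a filter expression; criteria compose through And / Or.
enum Expr {
    Eq(&'static str, i64),
    Gt(&'static str, i64),
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

impl Expr {
    /// Evaluates the expression against one record; a missing bin never matches.
    fn matches(&self, rec: &Record) -> bool {
        match self {
            Expr::Eq(bin, v) => rec.get(*bin) == Some(v),
            Expr::Gt(bin, v) => rec.get(*bin).map_or(false, |x| x > v),
            Expr::And(parts) => parts.iter().all(|e| e.matches(rec)),
            Expr::Or(parts) => parts.iter().any(|e| e.matches(rec)),
        }
    }
}

/// Builds a toy record from (bin, value) pairs.
fn record(pairs: &[(&'static str, i64)]) -> Record {
    pairs.iter().copied().collect()
}

/// Stage 1: the index filter selects candidates (inclusive range on one bin).
/// Stage 2: the optional expression narrows the candidates further.
fn run_query<'a>(
    data: &'a [Record],
    index: &IndexRange,
    expr: Option<&Expr>,
) -> Vec<&'a Record> {
    data.iter()
        .filter(|r| {
            r.get(index.bin)
                .map_or(false, |v| (index.begin..=index.end).contains(v))
        })
        .filter(|r| expr.map_or(true, |e| e.matches(r)))
        .collect()
}
```

For example, an `IndexRange` on an `age` bin combined with `Expr::And(vec![Expr::Eq("bin1", 25), Expr::Gt("bin2", 100)])` mirrors the `bin1 == 25 && bin2 > 100` expression from earlier in the thread, evaluated only on the records the index filter already selected.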

bmuddha commented 2 years ago

Oh, I see, it's clear now and makes sense. Thank you guys for the quick replies and thorough explanations :+1: