Open gautamworah96 opened 1 month ago
That sounds interesting, I can take a look at DoubleRange
first and do so some benchmarking, shouldn't be too hard to expand it then to the other data types, if it shows promising results.
I can take a look at DoubleRange first and do so some benchmarking
I suspect that existing benchmarks won't show needle moving results if they don't have enough queries that can be optimized through this opto. The change is still a positive one tho. We may need to benchmark it on a query set that specifically has a lot of these "full range from 0->Long.MAX_VALUE" clauses.
I suspect that existing benchmarks won't show needle moving results if they don't have enough queries that can be optimized through this opto.
Yes, I'm gathering some specific numbers for this exact case. I'll also suspect that this rewriting should've a purely positive effect, especially considering that you've a max of 4
dimensions for FloatRange
, DoubleRange
etc., which should make detecting a query containing such a range very fast.
Is it valid to rewrite to match all docs though? Some docs may lack a value, and I didn't think we would match in that case.
I took a first pass at this and also noticed that MatchAllDocsQuery
is semantically wrong as some docs might not have the field (also caught by the existing test cases). What I think could work is to check, whether the valueSource
is of type FieldValuesSource
and rewrite the query to a FieldExistsQuery
if we encounter the maximum possible range (Long.MIN_VALUE
and Long.MAX_VALUE
) at the same time.
WDYT @msokolov ?
+1, that should work.
@msokolov Opened a PR for LongRange
, if you want to take a look (I cannot request a reviewer...): https://github.com/apache/lucene/pull/13383
Description
At Amazon Search, we came across a lot of client queries that were specifying the min value as 0, and the max value as Long.MAX_VALUE.
We don't use the associated ValueSourceQuery query internally, but use a custom logic for filtering. An improvement here would be to rewrite ValueSourceQuery to a MatchAllDocsQuery if we detect that the min and max value are the same as the min and max value of Long.
I think we will have to check for something like (
minVal==0
andmaxVal==2^64-1
) or (minVal==Long.MIN_VALUE
andmaxVal==Long.MAX_VALUE
) since we will have to cover both signed and unsigned cases.We can implement similar logic for DoubleRange as well.