apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.45k stars 973 forks source link

Rewrite LongRange.ValueSourceQuery/MultiValueSourceQuery to FieldExistsQuery on max range #13383

Open timgrein opened 1 month ago

timgrein commented 1 month ago

Related to https://github.com/apache/lucene/issues/13375

Description

With this change ValueSourceQuery and MultiValueSourceQuery will be rewritten to a FieldExistsQuery, if the query range uses the maximum range possible ([Long.MIN_VALUE, Long.MAX_VALUE]) as the exact values of the queried field won't matter anymore as every value lies within the maximum possible range. If the query range uses exclusive boundaries the query won't be rewritten as a field could contain a value being Long.MIN_VALUE or Long.MAX_VALUE which wouldn't match anymore.

I've made LongValuesSource.FieldValuesSource and MultiLongValuesSource.FieldMultiValueSource public and added a corresponding getField accessor which will be needed to check, whether the value source is actually relying on the specific field.

I'll plan to open a separate PR for DoubleRange performing a similar optimization.

msokolov commented 1 month ago

I don't really like that we have to expose these internal classes purely to do class casts. Also, this only covers the LongValuesSource case but wouldn't we want to cover other numeric types as well? I wonder if we could add some kind of check ValuesSource.minValue/maxValue or ValuesSource.isRangeComplete or so that could be used more generically?

msokolov commented 1 month ago

But TBH this is probably not worth doing if it requires any major changes since users can very easily handle this in their query-creation logic

timgrein commented 1 month ago

I don't really like that we have to expose these internal classes purely to do class casts.

I kinda oriented myself at LongValueSource.ConstantLongValuesSource, which is also public and used in a similar way here, but I'll check, if there's a better way without making the classes public.

I wonder if we could add some kind of check ValuesSource.minValue/maxValue or ValuesSource.isRangeComplete or so that could be used more generically

I think ValueSource is extended by LongFieldSource, which seems to have similar semantics as FieldValueSource inside LongValueSource, but it's not used inside LongRange ... 🤔

github-actions[bot] commented 3 weeks ago

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!