apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.45k stars 973 forks source link

Making NO_INTERVALS to be used by clients of Lucene #13385

Closed expani closed 1 month ago

expani commented 1 month ago

Description

A bug was reported in OpenSearch which caused Interval Queries containing sub-query in rules to fail with Sub-iterators of ConjunctionDISI are not on the same document if an no_match query was generated after going through the search analyser.

A similar issue was found in Lucene a few years back and was fixed by this PR

I am proposing to make NO_INTERVALS present in Lucene as public so it can be used by it's clients like OpenSearch rather than creating a clone of the same.

jainankitk commented 1 month ago

Makes sense for IntervalBuilder clients to use the existing IntervalsSource implementation. Also, safe to make public given the variable is static final. Approved!

romseygeek commented 1 month ago

LGTM. Can you add an entry to CHANGES.txt?

expani commented 1 month ago

@romseygeek Should I add only under Lucene 10.0.0 or other versions as well ?

romseygeek commented 1 month ago

We can backport this to 9x so add the entry under the latest 9x version.

expani commented 1 month ago

@jainankitk @romseygeek Requesting an approval.

uschindler commented 1 month ago

Should I open a new issue, or should we revert this one and give a better solution?

romseygeek commented 1 month ago

Let's open a new issue and we can discuss a bit more about when you need such a thing and where it should live.

uschindler commented 1 month ago

OK. Just for brevity: How does the one made public here differs from the already existing class (except that one returns null iterator while the other one returns an empty iterator)?

uschindler commented 1 month ago

See #13388

romseygeek commented 1 month ago

From a quick look we can indeed replace everything with NoMatchIntervalIterator. I think there are some other places that we can make use of the null shortcut as well - for example, DisjunctionIntervalsSource.create() can return null if its input collection is empty.