apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.45k stars 973 forks source link

Add Intervals.noIntervals() method #13389

Closed romseygeek closed 1 month ago

uschindler commented 1 month ago

Just some tidy missing. 😜

expani commented 1 month ago

To support both behaviours, we can create a new constructor forNoMatchIntervalsSource which returns NO_INTERVAL style iterator instead of null to keep this change safe.

The new API can use this constructor instead of existing one.

romseygeek commented 1 month ago

Returning null is always OK here - if you look at ConjunctionIntervalsSource or DisjunctionIntervalsSource, the sub-sources can always return null to indicate that there are no intervals within a given segment.

uschindler commented 1 month ago
  • Lot of interval functions ( around 21 usage ) indirectly can end up using the null iterator which expected NO_INTERVAL before. There are other places as well which don't seem to extensively covered by UTs.
  • Existing users of Intervals.atLeast expect a null iterator to be returned for some cases.

To support both behaviours, we can create a new constructor forNoMatchIntervalsSource which returns NO_INTERVAL style iterator instead of null to keep this change safe.

The new API can use this constructor instead of existing one.

No need for separate variants. The interval query parser just constructs queries, it never executes them.

We only have to check if the interval code works correct with null iterator.

Users of Lucene are common to null iterators. They are used everywhere (scorer, doc id sets,...). Returning null is much easier than crazy logic with -1 and NO_MORE_DOCS.

expani commented 1 month ago

My main concern was the public API Intervals#analyzedText

Clients of lucene were getting a non-null iterator till now and will receive a null iterator after this change.

Maybe not backport to 9.x as there is already an alternative of Intervals.atLeast(1) ?

uschindler commented 1 month ago

Why should anybody call that method. It's internal and not for public use?

Those type of changes are common in Lucene. We change behavior of methods unless it's documented to behave exactly as described.

expani commented 1 month ago

It's internal and not for public use?

It's marked public as of today and accessible the same way as Intervals#atLeast

uschindler commented 1 month ago

It's internal and not for public use?

It's marked public as of today and accessible the same way as Intervals#atLeast

This method does not return null.

That the implementation of the iterator method of IntervalSource returns null is not an issue (see comments above). The method is only intended to be called by Lucene code.

expani commented 1 month ago

The method is only intended to be called by Lucene code.

Then it should not be a public API.

Agree with comment by @romseygeek that since DisjunctionIntervalsSource returns null here when there are no sub-iterators, callers would already be performing null checks.

So, this change is not backward incompatible.

uschindler commented 1 month ago

The method is only intended to be called by Lucene code.

Then it should not be a public API.

Agree with comment by @romseygeek that since DisjunctionIntervalsSource returns null here when there are no sub-iterators, callers would already be performing null checks.

So, this change is not backward incompatible.

Could you please stop arguing about intervals() returning null? It is documented here: https://github.com/apache/lucene/blob/22d50be2eab84c9a75ea65f55fcc4356724faba1/lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalsSource.java#L35-L45

So whenever you call intervals() you must be prepared that it returns null (if no intervals match/found). So there is no backwards break. The class implementing that can return null or an empty iterator as it likes. The caller has to make sure that it can handle it. The class which implements the interface for the specific case you mention is private, so the only spec which is responsible for null/non-null is the above.

So please stop arguing. The change is backwards compatible. If there is code out there that stumbles on this, it was a bug from the beginning. Lucene has a quite relaxed backwards compatibility regulation as very strict ones stop invention of new features.

uschindler commented 1 month ago

Thanks!