Open asfimport opened 6 years ago
Adrien Grand (@jpountz) (migrated from JIRA)
Do you have concrete ideas in mind? I see BitSetIterator as an exception, other Bits-based iterators don't perform well and are better implemented with two-phase iterators?
David Smiley (@dsmiley) (migrated from JIRA)
First a disclaimer: I was merely looking at the code; this wasn't driven by any debugging session or profiling.
I see BitSetIterator as an exception, other Bits-based iterators don't perform well
I'm confused. Why would BitSetIterator specifically be better than some other hypothetical Bits-based iterator? A hypothetical Bits-based iterator of course would be based on some fast bit set, though through API reasons isn't necessarily a BitSetIterator specifically.
Bits-based iterators don't perform well and are better implemented with two-phase iterators
That surprises me; I would imagine it would depend on the cardinality of the bits since a low-cardinality bit set ought to potentially lead iteration? For an extremely dense case; I'm not sure how much it'd matter either way since they are cheap to work with.
Adrien Grand (@jpountz) (migrated from JIRA)
Sorry for introducing some confusion, when I mentioned Bits-based iterators I was thinking of a DocIdSetIterator over a Bits instance that would check bits one by one to find the next match rather than an alternative to BitsetIterator.
David Smiley (@dsmiley) (migrated from JIRA)
I think I see what you're getting at due to my first mention of Bits vs BitSet; maybe we confused each other ;)
I'm pointing that ConjunctionDISI makes a special optimization for any of it's input DISI's of type BitSetIterator, and I think that's a shame since someone might have a similar DISI that is not necessarily a BitSetIterator precisely. From that observation, I hypothesized if a DISI might expose an optional Bits somehow, then ConjunctionDISI could do it's optimization more generically.
I was looking at ConjunctionDISI and noted the special case logic for DISI's of type BitSetIterator. It seems to only need the more minimal Bits interface though it makes references to BitSet specifically. BitSetIterator is a concrete class; it would be nice if a DISI could either implement an optional interface to expose a Bits or perhaps implements Bits directly. This would allow other/custom DISIs that can implement a Bits quickly without being forced to use BitSetIterator specifically. Even DocIdSetIterator.all(...) could implement this.
Migrated from LUCENE-8386 by David Smiley (@dsmiley), updated Jul 09 2018