Open steveyen opened 5 years ago
Latest thinking / notes on this, from digging around...
https://github.com/blevesearch/bleve/blob/master/search/scorer/scorer_conjunction.go#L53
Perhaps the non-scoring conjunction of term-searchers is just an edge case of two more slightly general things...
it's either a constant scoring conjunction, where the constant happens to be 0.
or, it's a (new concept here) "uniform scoring conjunction", where all the children contribute the same exact score to the parent conjunction. So the conjunction's score ends up being a constant factor times the number of contributing children. And, in a conjunction, the number of contributing children is fixed for the lifetime of the search. For the edge case of the non-scoring conjunction, the constant factor or child-multiplier would be 0.
After more thoughts, looks like it (e.g., a "uniform scoring conjunction") is not going to work without some significant API "plumbing" changes or would only work in a narrow edge case of "keyword" type of analyzers.
Imagine a field had text like... "aa aa bb cc"
Then, when there's a conjunction search like, "+aa +bb +cc", the multiple roaring bitmaps for those terms can by AND'ed together (the conjunction optimization).
But, while Next()'ing through iterators, the "aa" term would have frequency 2, and "bb and "cc" would have frequency 1. So, a simple "uniform" scoring factor or multiplier wouldn't work in this case.
And, the existing TermFieldReader interface doesn't have enough way to pass that kind of signal upstairs -- it might have to be some new kind of interface like a MultiTermFieldReader, which feels like going too far.
Based on that, I think I'll push this chain of thought to the backburner.
Tracking an idea here before I forget it. Based on some of the optimization proposals happening over at https://github.com/blevesearch/bleve/pull/1110, I was thinking or wondering if there might be a way to optimize the handling of with-scoring, locations-included conjunctions of scorch term searchers.
Currently, in the classic searcher-conjunction handling codepaths, each of the leaf scorch term searchers have their own roaring iterator instances, which they separately Next() through. That is, if there are N term-searchers in a conjunction, there will be N roaring iterators, with N calls to roaring-iterator.Next() when parent conjunction Next() moves forward one hit.
Instead, we might be able to share a single roaring iterator here for all the N child term-searchers, since these optimized term searcher all will be sharing the same, computed, AND'ed-together "actual" bitmap instance.
Although that'll help for the "actual" side of the house, there would still need to be N roaring-iterators, though, for the "all" bitmaps that are needed to maintain the freq-norm and term-vector readers.
It might also be possible that this idea of a +scoring/+location conjunction optimization might be unify-able with the proposed "unadorned" (-scoring/-location) conjunction codepaths in PR #1110 , but haven't thought that far yet.