blevesearch / bleve

A modern text/numeric/geo-spatial/vector indexing library for go
Apache License 2.0
10k stars 678 forks source link

optimize scorch conjunctions via bitmap OR'ing when scoring & locations aren't needed #1109

Open steveyen opened 5 years ago

steveyen commented 5 years ago

bleve's scorch currently has an optimization when processing disjunction searches (AND's), by leveraging roaring-bitmap's And() API (see: https://github.com/blevesearch/bleve/blob/master/index/scorch/optimize.go#L72)

But, there's no optimization yet for conjunction searches in scorch -- which might be possible by leveraging roaring bitmap's Or() API.

In particular, if a search request doesn't need locations (IncludeTermVectors is false) and doesn't need scoring or freq-norm data (note: bleve doesn't currently have an API to specify that scoring isn't needed -- we would need to add such an API)...

...then, OR'ing together multiple roaring bitmaps might work where the resulting bitmap could be used as a so-called "actual" bitmap (in the parlance of the "actual" bitmap versus "all" bitmap in scorch's zap segment PostingsIterator). Because locations and freq-norm lookups would be skipped, we wouldn't need an "actual" bitmap in this case and would be able to avoid any maintenance of any freq-norm readers and location readers for this edge case.

Additionally, perhaps some heuristic checks might also be needed to check if the resulting OR'ed bitmap goes over some size limit (don't want to blow out memory usage too badly) as part of this?

steveyen commented 5 years ago

related PR... https://github.com/blevesearch/bleve/pull/1110