blugelabs / bluge

indexing library for Go
Apache License 2.0

Performance benchmarking #75

Open prabhatsharma opened 2 years ago

prabhatsharma commented 2 years ago

I am looking to index system logs ranging in multiples of gigabytes. Going by the performance benchmarks comparing bleve to other libraries at https://mosuka.github.io/search-benchmark-game/, can we expect the performance of bluge to be comparable to bleve?
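For reference, a minimal sketch of what batch-indexing log lines might look like with Bluge's public API; the path, field name, and batch size here are illustrative choices, not recommendations:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"

	"github.com/blugelabs/bluge"
)

func main() {
	// Illustrative path; use whatever location suits your deployment.
	config := bluge.DefaultConfig("/tmp/logs.bluge")
	writer, err := bluge.OpenWriter(config)
	if err != nil {
		log.Fatalf("error opening writer: %v", err)
	}
	defer writer.Close()

	f, err := os.Open("system.log") // illustrative input file
	if err != nil {
		log.Fatalf("error opening log file: %v", err)
	}
	defer f.Close()

	batch := bluge.NewBatch()
	scanner := bufio.NewScanner(f)
	n := 0
	for scanner.Scan() {
		// One document per log line, with the line text in a "message" field.
		doc := bluge.NewDocument(fmt.Sprintf("line-%d", n)).
			AddField(bluge.NewTextField("message", scanner.Text()))
		batch.Update(doc.ID(), doc)
		n++
		if n%1000 == 0 { // flush periodically; tune batch size for your workload
			if err := writer.Batch(batch); err != nil {
				log.Fatalf("error executing batch: %v", err)
			}
			batch = bluge.NewBatch()
		}
	}
	if err := writer.Batch(batch); err != nil {
		log.Fatalf("error executing final batch: %v", err)
	}
}
```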

mschoch commented 2 years ago

First, yes, absolutely: Bluge should perform as well as or better than Bleve v2 (upon which it is largely based). There are two reasons why today it may not:

  1. An important optimization for non-scoring queries was removed from Bluge. The reason is that I think Bleve got the design wrong, and I'd like to improve that in Bluge.
  2. Bleve/Bluge search has many tight loops, and it's very likely we introduced some unintentional perf regressions along the way while refactoring the code. These tight loops magnify small changes into large effects (see the benchmark sketch after this list).
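
One way to catch those tight-loop regressions is a focused Go benchmark run against each library version and compared with benchstat. A sketch under assumptions: a pre-built index at /tmp/bench.bluge with a "message" field, neither of which is part of either project:

```go
package blugebench

import (
	"context"
	"testing"

	"github.com/blugelabs/bluge"
)

// BenchmarkTermQuery times one query type in isolation against a
// pre-built index, so results from two library versions can be
// compared with benchstat.
func BenchmarkTermQuery(b *testing.B) {
	config := bluge.DefaultConfig("/tmp/bench.bluge") // hypothetical pre-built index
	reader, err := bluge.OpenReader(config)
	if err != nil {
		b.Fatalf("error opening reader: %v", err)
	}
	defer reader.Close()

	query := bluge.NewTermQuery("error").SetField("message")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		req := bluge.NewTopNSearch(10, query)
		dmi, err := reader.Search(context.Background(), req)
		if err != nil {
			b.Fatalf("search error: %v", err)
		}
		// Drain the iterator so the full match loop is measured.
		match, err := dmi.Next()
		for err == nil && match != nil {
			match, err = dmi.Next()
		}
		if err != nil {
			b.Fatalf("iterator error: %v", err)
		}
	}
}
```

Running `go test -bench=TermQuery -count=10` on each version and feeding both outputs to benchstat gives a delta with a significance test, rather than a single noisy number.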

Regarding search-benchmark-game, I love the idea, and I've spent weeks working with it. Unfortunately, I came to the conclusion that the design of search-benchmark-game is problematic for comparing Bleve and Bluge. The trouble I ran into is that it runs a mixed workload of different query types in the same process. It then attempts to measure them independently and present the results. Unfortunately, because Go is garbage collected, there are often cases where the runtime may be performing work related to previous queries. I was able to modify the way I ran search-benchmark-game to focus on single queries, and while that helped significantly, we were still unable to get consistent, reproducible results. (NOTE: running mixed query loads also makes it harder to interpret the collected pprof output.)
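
To make that isolation concrete, here is a sketch of the kind of harness change I mean, in plain Go; runQuery is a hypothetical stand-in for executing one fixed query against a pre-built index:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// measure times a single query type by itself, forcing a GC between
// samples so collector work triggered by one sample is less likely to
// be billed to the next.
func measure(runQuery func(), samples int) []time.Duration {
	out := make([]time.Duration, 0, samples)
	for i := 0; i < samples; i++ {
		runtime.GC() // settle the heap before timing
		start := time.Now()
		runQuery()
		out = append(out, time.Since(start))
	}
	return out
}

func main() {
	for _, d := range measure(func() {
		time.Sleep(time.Millisecond) // stand-in for one fixed query
	}, 10) {
		fmt.Println(d)
	}
}
```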

As we are going to need some sort of new perf test framework to validate this Bluge vs Bleve comparison, I think it will certainly be inspired by search-benchmark-game (and use the same dataset). But I suspect it will be more specific to the Bluge vs Bleve comparison, to simplify things.

prabhatsharma commented 2 years ago

Any specific reason(s) you think bleve/bluge does not perform as well as lucene (Java, also a garbage-collected language) in the benchmark, considering it follows a similar approach to indexing?

mschoch commented 2 years ago

Yeah you're right, Java is also garbage collected, so perhaps my explanation is wrong or too simplistic. But at a minimum, search-benchmark-game would benefit from some variance/stddev metric to give a sense of whether or not the measurements had converged to something meaningful. When search-benchmark-game first reported that Bleve v2 was slower than v1, I spent a week trying to answer why, and eventually gave up. When I reduced the workload to a single query at a time, I would often see the results reverse, with v2 being faster. But the whole thing was inconsistent, and we were never able to reliably separate the signal from the noise. I documented some of my approach/findings here: https://github.com/blevesearch/bleve/issues/1550
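
On the variance/stddev point, even something minimal would help readers judge convergence; a generic sketch over collected latencies, not tied to either harness:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// meanStddev reports the mean and sample standard deviation of the
// collected latencies, so a reader can judge whether runs converged.
func meanStddev(samples []time.Duration) (time.Duration, time.Duration) {
	if len(samples) < 2 {
		return 0, 0 // stddev is undefined for fewer than two samples
	}
	n := float64(len(samples))
	var sum float64
	for _, s := range samples {
		sum += float64(s)
	}
	mean := sum / n
	var sq float64
	for _, s := range samples {
		d := float64(s) - mean
		sq += d * d
	}
	return time.Duration(mean), time.Duration(math.Sqrt(sq / (n - 1)))
}

func main() {
	samples := []time.Duration{
		10 * time.Millisecond, 12 * time.Millisecond, 11 * time.Millisecond,
	}
	m, sd := meanStddev(samples)
	fmt.Printf("mean=%v stddev=%v\n", m, sd)
}
```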

Basically, up until now, we've been focused on making the indexing time and final index size comparable to Lucene. And while we're still not all the way there, we're close enough now that index size is no longer the only plausible explanation for poor search performance. Obviously, additional research into this is needed.

Just to be clear, for a Bluge v1 release, I'd like Bluge to perform as well as Bleve v2. However, matching Lucene's performance is a long-term goal that we will continue to work on.