filodb / FiloDB

Distributed Prometheus time series database
Apache License 2.0

fix(core): Improve performance for Tantivy indexValues call #1867

Closed: rfairfax closed 1 week ago

rfairfax commented 1 week ago

indexValues was falling well behind Lucene for a few reasons:

  1. We were copying results directly into Java objects, which incurred a lot of back-and-forth JNI overhead
  2. When querying the entire index we were iterating over docs instead of the reverse index, which increased the number of items to process
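To illustrate the second point, here is a minimal sketch in plain Rust. It does not use Tantivy's API: `ToyIndex`, `counts_by_docs`, and `counts_by_terms` are hypothetical names, and the in-memory layout is an assumption made only for the example. It shows why walking distinct terms tends to process fewer items than walking every document.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy index (hypothetical, not Tantivy's real layout).
/// `docs` holds one field value per document; `postings` maps each
/// distinct term to the ids of the documents that contain it.
pub struct ToyIndex {
    docs: Vec<String>,
    postings: BTreeMap<String, Vec<u32>>,
}

impl ToyIndex {
    pub fn new(values: &[&str]) -> Self {
        let mut postings: BTreeMap<String, Vec<u32>> = BTreeMap::new();
        for (doc_id, value) in values.iter().enumerate() {
            postings
                .entry(value.to_string())
                .or_default()
                .push(doc_id as u32);
        }
        ToyIndex {
            docs: values.iter().map(|v| v.to_string()).collect(),
            postings,
        }
    }

    /// Doc walk: visits every document, so the work grows with the doc count.
    pub fn counts_by_docs(&self) -> Vec<(String, u64)> {
        let mut counts: HashMap<&str, u64> = HashMap::new();
        for value in &self.docs {
            *counts.entry(value.as_str()).or_insert(0) += 1;
        }
        let mut out: Vec<(String, u64)> = counts
            .into_iter()
            .map(|(term, count)| (term.to_string(), count))
            .collect();
        out.sort();
        out
    }

    /// Term walk: visits each distinct term once and reads its posting length.
    pub fn counts_by_terms(&self) -> Vec<(String, u64)> {
        self.postings
            .iter()
            .map(|(term, docs)| (term.clone(), docs.len() as u64))
            .collect()
    }
}
```

Both methods return the same term counts. The term walk does one step per distinct value rather than one per document, which matters when many documents share a small set of values.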

This PR does a few things:

  1. Add perf benchmarks for the missing functions
  2. Add a new IndexCollector trait that can be used to walk the index instead of docs
  3. Replace the JNI object usage in indexValues with byte-serialized data
  4. Return encoded string arrays instead of creating JVM strings in native code
  5. Glue all these optimizations together
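Items 3 and 4 above can be sketched roughly as follows. This is an illustration only: the wire format (little-endian `u32` length, UTF-8 bytes, little-endian `u64` count) and the names `encode_values`, `decode_values`, and `split_checked` are assumptions, not the format or functions this PR uses. The idea is that native code fills a single byte buffer and hands it across JNI once, instead of building one Java object per result.

```rust
/// Encode (term, count) pairs into one contiguous buffer.
/// Hypothetical layout per entry: u32 LE length, UTF-8 bytes, u64 LE count.
pub fn encode_values(values: &[(&str, u64)]) -> Vec<u8> {
    let mut buf = Vec::new();
    for (term, count) in values {
        let bytes = term.as_bytes();
        buf.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
        buf.extend_from_slice(bytes);
        buf.extend_from_slice(&count.to_le_bytes());
    }
    buf
}

/// Split off the first `n` bytes, or return None if the buffer is too short.
fn split_checked(buf: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if buf.len() < n {
        None
    } else {
        Some(buf.split_at(n))
    }
}

/// Decode the buffer produced by `encode_values`.
/// Returns None on truncated input or invalid UTF-8.
pub fn decode_values(mut buf: &[u8]) -> Option<Vec<(String, u64)>> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        let (len_bytes, rest) = split_checked(buf, 4)?;
        let mut len_arr = [0u8; 4];
        len_arr.copy_from_slice(len_bytes);
        let len = u32::from_le_bytes(len_arr) as usize;

        let (term_bytes, rest) = split_checked(rest, len)?;
        let term = String::from_utf8(term_bytes.to_vec()).ok()?;

        let (count_bytes, rest) = split_checked(rest, 8)?;
        let mut count_arr = [0u8; 8];
        count_arr.copy_from_slice(count_bytes);
        let count = u64::from_le_bytes(count_arr);

        out.push((term, count));
        buf = rest;
    }
    Some(out)
}
```

In a real setup the decoding would happen on the JVM side. It is written in Rust here only to show that the format round-trips. The design choice is to pay for one buffer copy across the JNI boundary rather than one call per string or object.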

With this, Tantivy is still a bit behind Lucene on this path, but it's almost 100x faster than before.

Pull Request checklist