fix(core): Improve performance for Tantivy indexValues call

filodb / FiloDB

Distributed Prometheus time series database

Apache License 2.0

1.43k stars 225 forks

fix(core): Improve performance for Tantivy indexValues call #1867

Closed rfairfax closed 1 week ago

rfairfax commented 1 week ago

indexValues was falling well behind Lucene for a few reasons:

We were copying results directly into Java objects, which incurred a lot of back-and-forth JNI overhead
When querying the entire index, we were walking docs instead of the reverse index, which increased the number of items to process
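
To illustrate the second point, here is a minimal sketch in plain Rust. It does not use the Tantivy API; `ToyIndex`, `counts_by_docs`, and `counts_by_index` are hypothetical names chosen for illustration only. Counting values by walking docs touches every document, while walking the reverse index touches each distinct term once.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy index (not Tantivy): one field value per document,
/// plus a term -> doc ids postings map standing in for the reverse index.
pub struct ToyIndex {
    docs: Vec<String>,
    postings: BTreeMap<String, Vec<u32>>,
}

impl ToyIndex {
    /// Build both the doc store and the postings map from raw values.
    pub fn new(values: &[&str]) -> Self {
        let mut postings: BTreeMap<String, Vec<u32>> = BTreeMap::new();
        let mut docs = Vec::with_capacity(values.len());
        for (doc_id, value) in values.iter().enumerate() {
            docs.push(value.to_string());
            postings
                .entry(value.to_string())
                .or_default()
                .push(doc_id as u32);
        }
        ToyIndex { docs, postings }
    }

    /// Doc-oriented walk: work grows with the number of documents.
    pub fn counts_by_docs(&self) -> BTreeMap<String, u64> {
        let mut counts = BTreeMap::new();
        for value in &self.docs {
            *counts.entry(value.clone()).or_insert(0u64) += 1;
        }
        counts
    }

    /// Index-oriented walk: work grows with the number of distinct terms.
    pub fn counts_by_index(&self) -> BTreeMap<String, u64> {
        self.postings
            .iter()
            .map(|(term, ids)| (term.clone(), ids.len() as u64))
            .collect()
    }
}
```

Both methods return the same counts; the difference is only in how many items are visited, which matters when a label has few distinct values spread over many documents.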

This PR does a few things:

Add perf benchmarks for the missing functions
Add a new IndexCollector trait that can be used to walk the index instead of docs
Replace the JNI object usage in indexValues with byte-serialized data
Return encoded string arrays instead of creating JVM strings in native code
Glue all these optimizations together.
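
As a rough sketch of the "encoded string array" idea: the native side can pack all strings into one byte buffer and hand it across JNI in a single call, rather than creating one JVM string per value. The wire format below (little-endian `u32` count, then a `u32` length prefix per string) and the function names `encode_strings` and `decode_strings` are assumptions for illustration, not the format this PR actually uses.

```rust
/// Encode strings as: [u32 LE count], then per item [u32 LE len][UTF-8 bytes].
/// This layout is an assumed example format, not FiloDB's real encoding.
pub fn encode_strings(values: &[&str]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for v in values {
        buf.extend_from_slice(&(v.len() as u32).to_le_bytes());
        buf.extend_from_slice(v.as_bytes());
    }
    buf
}

/// Decode the buffer produced by `encode_strings`.
/// Returns None if the buffer is truncated or contains invalid UTF-8.
pub fn decode_strings(buf: &[u8]) -> Option<Vec<String>> {
    let mut pos = 0usize;
    let count = read_u32(buf, &mut pos)? as usize;
    let mut out = Vec::new();
    for _ in 0..count {
        let len = read_u32(buf, &mut pos)? as usize;
        let end = pos.checked_add(len)?;
        let bytes = buf.get(pos..end)?;
        out.push(String::from_utf8(bytes.to_vec()).ok()?);
        pos = end;
    }
    Some(out)
}

/// Read a little-endian u32 at `pos` and advance `pos` past it.
fn read_u32(buf: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = buf.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

In a real setup the decoder would live on the JVM side and read the byte array in one pass; it is written in Rust here only so the round trip can be checked in a single language.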

With this, Tantivy is still a bit behind Lucene on this path, but it's almost 100x faster than before.

Pull Request checklist

[X] The commit(s) message(s) follows the contribution guidelines ?
[X] Tests for the changes have been added (for bug fixes / features) ?