alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
https://alexklibisz.github.io/elastiknn
Apache License 2.0
368 stars 48 forks source link

Performance: simplify and optimize kth-greatest computation (96% recall at 195 qps) #616

Closed alexklibisz closed 9 months ago

alexklibisz commented 9 months ago

Related Issue

611

Changes

This integrates the kth-greatest computation into the ArrayHitCounter, which gets rid of a loop for finding the min and max counts. While we're there, also removing the min count, since that is almost always zero and using an array of shorts for the histograms.

In aggregate, this gets the benchmarks up to 0.96 at 195 qps:

image

Though they did float around a bit (see the Benchmark commits).

Testing and Validation

Added more unit tests for the ArrayHitCounter.