alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
https://alexklibisz.github.io/elastiknn
Apache License 2.0
371 stars, 48 forks

Try using Lucene IntIntHashMap to speed up and reduce memory usage of top-K counting #662

Open · alexklibisz opened 7 months ago

alexklibisz commented 7 months ago

Background

Breaking this out of https://github.com/alexklibisz/elastiknn/issues/160, specifically this comment: https://github.com/alexklibisz/elastiknn/issues/160#issuecomment-1826481362

I'd like to try using the IntIntHashMap from Lucene 9.x to decrease latency and memory usage for counting the top-K hits in LSH queries.
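To illustrate the idea, here is a minimal sketch. It assumes that an LSH query produces a stream of non-negative Lucene doc IDs, one per matching hash. A primitive open-addressing int-to-int map keeps per-document hit counts in two flat `int` arrays instead of boxed `Integer` objects, and a size-k min-heap then selects the top-K documents. The class and method names (`HitCounter`, `increment`, `get`, `topK`) are hypothetical. This is not Lucene's `IntIntHashMap` API or elastiknn's current implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene's IntIntHashMap or elastiknn's code:
// counts LSH hash matches per doc ID in a primitive open-addressing map,
// then selects the k doc IDs with the most matches.
final class HitCounter {
    private static final int EMPTY = -1; // assumes doc IDs are non-negative
    private int[] keys;
    private int[] values;
    private int size;
    private int mask;

    HitCounter(int expectedDocs) {
        int cap = 16;
        while (cap < expectedDocs * 2) cap <<= 1; // power of two, load factor <= 0.5
        keys = new int[cap];
        Arrays.fill(keys, EMPTY);
        values = new int[cap];
        mask = cap - 1;
    }

    // Scramble bits so consecutive doc IDs don't form long probe runs.
    private static int mix(int k) {
        int h = k * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    // Adds one hit for docId, first inserting it with count 0 if absent.
    void increment(int docId) {
        if ((size + 1) * 2 > keys.length) grow();
        int slot = mix(docId) & mask;
        while (keys[slot] != EMPTY && keys[slot] != docId) {
            slot = (slot + 1) & mask; // linear probing
        }
        if (keys[slot] == EMPTY) {
            keys[slot] = docId;
            size++;
        }
        values[slot]++;
    }

    // Returns the hit count for docId, or 0 if it never matched.
    int get(int docId) {
        int slot = mix(docId) & mask;
        while (keys[slot] != EMPTY) {
            if (keys[slot] == docId) return values[slot];
            slot = (slot + 1) & mask;
        }
        return 0;
    }

    // Doubles the table and re-inserts every occupied slot.
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        values = new int[keys.length];
        mask = keys.length - 1;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] == EMPTY) continue;
            int slot = mix(oldKeys[i]) & mask;
            while (keys[slot] != EMPTY) slot = (slot + 1) & mask;
            keys[slot] = oldKeys[i];
            values[slot] = oldValues[i];
        }
    }

    // Up to k doc IDs ordered by descending count; ties go to the smaller doc ID.
    int[] topK(int k) {
        // The head of this min-heap is the worst candidate kept so far.
        Comparator<int[]> worstFirst = (a, b) ->
            a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(b[0], a[0]);
        PriorityQueue<int[]> heap = new PriorityQueue<>(worstFirst);
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == EMPTY) continue;
            heap.offer(new int[] {keys[i], values[i]});
            if (heap.size() > k) heap.poll(); // evict the current worst
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll()[0];
        return result;
    }

    public static void main(String[] args) {
        HitCounter counter = new HitCounter(4);
        int[] hits = {7, 3, 7, 42, 3, 7, 100, 42, 7}; // made-up doc IDs, one per matching hash
        for (int doc : hits) counter.increment(doc);
        System.out.println(Arrays.toString(counter.topK(2))); // prints [7, 3]
    }
}
```

Storing keys and counts in flat primitive arrays avoids allocating one boxed object per entry, which is the kind of memory and latency saving this issue hopes to get from Lucene's map. Keeping the load factor at or below 0.5 keeps the linear-probing runs short.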

Deliverables

Related Issues

So far, the speedup appears to be noticeable only on larger datasets, so this might require expanding the benchmarks to datasets larger than Fashion-MNIST.

alexklibisz commented 2 months ago

There are also some other interesting classes in Lucene that seem related, e.g., RoaringDocIdSets.