alexklibisz / elastiknn

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
https://alexklibisz.github.io/elastiknn
Apache License 2.0

Try using a byte array in ArrayHitCounter instead of a short array #613

Open alexklibisz opened 9 months ago

alexklibisz commented 9 months ago

Background

ArrayHitCounter uses an array of shorts to count hits. It's not a very memory-efficient implementation, as it requires an array entry for every document in the segment. So it uses shorts because a short requires half the memory of an int, and counts should rarely exceed the max value of a short.

I think an array of bytes would also work, and would require half the memory of the short array. This could be implemented as a new implementation of the HitCounter interface: rename the current one to ShortArrayHitCounter and add a new one, ByteArrayHitCounter. A byte holds 256 distinct values, so the max count that fits is 255, and only if the byte is read as unsigned (Java bytes are signed, so the stored value has to be masked with 0xFF when read). So if the number of hashes passed to MatchHashesAndScoreQuery is <= 255, it uses the ByteArrayHitCounter, else it uses the ShortArrayHitCounter.
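A rough sketch of the idea, not the actual elastiknn code. SimpleHitCounter is a hypothetical, cut-down stand-in for the real HitCounter interface, and HitCounters.forQuery is a made-up helper for the selection logic. Since a Java byte is signed, the sketch reads entries with & 0xFF to get counts in 0..255 and switches implementations at 255 hashes.

```java
// Hypothetical, simplified stand-in for elastiknn's HitCounter interface.
interface SimpleHitCounter {
    void increment(int docId);
    int get(int docId);
}

// One byte per document in the segment; each entry is read as unsigned (0..255).
final class ByteArrayHitCounter implements SimpleHitCounter {
    static final int MAX_COUNT = 255;
    private final byte[] counts;

    ByteArrayHitCounter(int numDocs) {
        this.counts = new byte[numDocs];
    }

    @Override
    public void increment(int docId) {
        // Saturate at 255 instead of wrapping around to 0.
        if ((counts[docId] & 0xFF) < MAX_COUNT) {
            counts[docId]++;
        }
    }

    @Override
    public int get(int docId) {
        // Mask to undo sign extension, e.g. (byte) -56 reads back as 200.
        return counts[docId] & 0xFF;
    }
}

// One short per document; a stand-in for the current array-of-shorts counter.
final class ShortArrayHitCounter implements SimpleHitCounter {
    private final short[] counts;

    ShortArrayHitCounter(int numDocs) {
        this.counts = new short[numDocs];
    }

    @Override
    public void increment(int docId) {
        counts[docId]++;
    }

    @Override
    public int get(int docId) {
        return counts[docId];
    }
}

// Hypothetical helper: choose the smaller counter when every count must fit in a byte.
final class HitCounters {
    static SimpleHitCounter forQuery(int numDocs, int numHashes) {
        return numHashes <= ByteArrayHitCounter.MAX_COUNT
            ? new ByteArrayHitCounter(numDocs)
            : new ShortArrayHitCounter(numDocs);
    }
}
```

The saturation check in increment is a defensive choice in this sketch. If a document can never be hit more often than the number of hashes, the check never triggers and could be dropped.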

Bard already wrote most of it for me:

(two images, not reproduced here)

Deliverables

Related Issues

#611