blevesearch / zapx

Zap file format compatible with a future version of Bleve
Apache License 2.0

MB-60943 - Add a coarse quantiser to the IVF indexes #225

Closed metonymic-smokey closed 6 months ago

metonymic-smokey commented 7 months ago

This PR adds an HNSW index as the coarse quantiser for IVF indexes. Instead of a brute-force search for the centroids closest to the query vector, the search uses an HNSW index built over the centroids.

E2E testing showed no significant impact on recall (a drop of roughly 0.001 in both runs):

  1. 10M dataset - 768 dims

     | Case | Recall | Accuracy |
     |---|---|---|
     | without change | 0.85795 | 0.897 |
     | with change | 0.8565999999999999 | 0.895 |

  2. 5M dataset - 1536 dims

     | Case | Recall | Accuracy |
     |---|---|---|
     | without change | 0.9591 | 0.979 |
     | with change | 0.95811 | 0.981 |

metonymic-smokey commented 6 months ago

@abhinavdangeti updated it, pls let me know if you've any further comments, thanks!