erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License
4.84k stars 726 forks source link

ivf-hnsw #109

Open godofdream opened 5 years ago

godofdream commented 5 years ago

I stumbled over https://github.com/dbaranchuk/ivf-hnsw

Did anyone tried to benchmark their implementation? Is there conzeptual difference between ivf-hnsw and hnsw from hnswlib or nswlib?

maumueller commented 5 years ago

I suppose @yurymalkov can say more about it. :-)

yurymalkov commented 5 years ago

@godofdream, Yes, there is a conceptual difference. ivf-hnsw uses hnswlib as a part and it targets billion scale scenario with low memory footprint (e.g. 1B index in 20 Gb RAM). So, it should not be as fast as hnswlib, but it is much more memory efficient. Also, @dbaranchuk is the right person to ask about it. :)

erikbern commented 5 years ago

great if anyone wants to add it!

prateekpatelsc commented 4 years ago

@yurymalkov : any comments on the quality of the results if the index could be possibly stored in hnswlib as well

yurymalkov commented 4 years ago

@prateekpatelsc AFAIK it is using a fork of hnswlib, which should have a similar performance (although I haven't tested).

prateekpatelsc commented 4 years ago

@yurymalkov : sorry may be my question was not clear. I meant to ask that hnsw-ivf uses quantizers (from a quick look at code), i was curious if that causes any lower recall quality for example for a dataset of size 1 million (so could have used either hnswlib or hsnw-ivf index approach). I was more curious about how much perf degrades in terms of quality and latency/throughput if i decide to use hnsw-ivf as opposed to hnswlib .

Also i do see https://arxiv.org/pdf/1812.09162.pdf , which tries to further reduce memory footprints and optimize on performance on top of hnsw-ivf indexes, wondering if you had a chance to look at this as well ?