erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Upgrade elastiknn to 8.12.2.1 #503

Closed by alexklibisz 3 months ago

alexklibisz commented 3 months ago

This upgrades elastiknn to the latest release, which includes some performance improvements. It's fine to keep it disabled, since it's still slower than many alternatives.

I re-ran the containerized benchmark on an r6i.4xlarge and got these results:

| Model | Parameters | Recall | Queries per Second |
| --- | --- | --- | --- |
| eknn-l2lsh | L=100 k=4 w=1024 candidates=500 probes=0 | 0.378 | 314.650 |
| eknn-l2lsh | L=100 k=4 w=1024 candidates=1000 probes=0 | 0.446 | 247.659 |
| eknn-l2lsh | L=100 k=4 w=1024 candidates=500 probes=3 | 0.634 | 258.834 |
| eknn-l2lsh | L=100 k=4 w=1024 candidates=1000 probes=3 | 0.716 | 210.380 |
| eknn-l2lsh | L=100 k=4 w=2048 candidates=500 probes=0 | 0.767 | 271.442 |
| eknn-l2lsh | L=100 k=4 w=2048 candidates=1000 probes=0 | 0.846 | 221.127 |
| eknn-l2lsh | L=100 k=4 w=2048 candidates=500 probes=3 | 0.921 | 199.353 |
| eknn-l2lsh | L=100 k=4 w=2048 candidates=1000 probes=3 | 0.960 | 171.614 |
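
To make the recall/throughput trade-off in these results easier to read, here is a minimal sketch that picks out the Pareto-optimal configurations, i.e. those for which no other run has both higher recall and higher queries per second. The `Result` type and `pareto_frontier` helper are hypothetical names made up for illustration and are not part of ann-benchmarks or elastiknn; the numbers are copied from the results above (L=100 and k=4 are constant, so they are omitted).

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One benchmark row (hypothetical container, not an ann-benchmarks type)."""
    w: int
    candidates: int
    probes: int
    recall: float
    qps: float


# Values copied from the containerized run reported above.
RESULTS = [
    Result(1024, 500, 0, 0.378, 314.650),
    Result(1024, 1000, 0, 0.446, 247.659),
    Result(1024, 500, 3, 0.634, 258.834),
    Result(1024, 1000, 3, 0.716, 210.380),
    Result(2048, 500, 0, 0.767, 271.442),
    Result(2048, 1000, 0, 0.846, 221.127),
    Result(2048, 500, 3, 0.921, 199.353),
    Result(2048, 1000, 3, 0.960, 171.614),
]


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Return the runs not dominated in both recall and QPS, by ascending recall."""
    frontier = []
    best_qps = float("-inf")
    # Walk from highest recall to lowest; a run survives only if it is
    # faster than every run with higher (or equal) recall seen so far.
    for r in sorted(results, key=lambda r: (-r.recall, -r.qps)):
        if r.qps > best_qps:
            frontier.append(r)
            best_qps = r.qps
    return frontier[::-1]


for r in pareto_frontier(RESULTS):
    print(f"w={r.w} candidates={r.candidates} probes={r.probes} "
          f"recall={r.recall:.3f} qps={r.qps:.3f}")
```

On this data, the three w=1024 runs other than the fastest one are dominated by w=2048 candidates=500 probes=0, which has both higher recall and higher throughput.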

This is about 20% worse than the non-containerized benchmarks running on the same instance, reported here: https://github.com/alexklibisz/elastiknn/commit/ddf637ae7053cf8f6dc038b4876520f3e41c0673. I'm not sure what causes the difference, and I don't have time to diagnose it right now. If I had to speculate, it's probably because the containerized version runs both ann-benchmarks and the Elasticsearch server on one CPU, whereas the non-containerized setup runs the elastiknn container on one CPU and ann-benchmarks on the host.