facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Failed to reproduce the result in the wiki #2159

Closed: orrorcol closed this issue 7 months ago

orrorcol commented 2 years ago

Summary

Platform

OS: linux 3.10

Faiss version: 1.7.1.post3

Installed from: pip install faiss-cpu

Faiss compilation options:

Running on: 8 core AMD CPU

Interface:

I ran kmeans using bench_hnsw.py and got the following result: [screenshot: benchmark output]

The result shows that kmeans_hnsw is slower than the baseline kmeans. However, the wiki reports that kmeans_hnsw should be much faster: [screenshot: wiki result] (wiki page: https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors)

mdouze commented 2 years ago

When I redo this benchmark, HNSW assignment is faster than the baseline kmeans, but not by the factor of 8.2 indicated in the wiki.

matthijs@devfair0144:~/faiss_versions/issue_2159$ python bench_hnsw.py 100  kmeans
load data
Performing kmeans on sift1M database vectors (baseline)
Clustering 1000000 points in 128D to 16384 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.09 s
  Iteration 9 (52.87 s, search 52.37 s): objective=3.85085e+10 imbalance=1.227 nsplit=0
matthijs@devfair0144:~/faiss_versions/issue_2159$ python bench_hnsw.py 100  kmeans_hnsw
load data
Performing kmeans on sift1M using HNSW assignment
Clustering 1000000 points in 128D to 16384 clusters, redo 1 times, 10 iterations
  Preprocessing in 0.08 s
  Iteration 9 (40.46 s, search 39.64 s): objective=3.85109e+10 imbalance=1.227 nsplit=2
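The size of the gap can be read directly from the two final log lines above, which report the cumulative time after the 10th iteration. The sketch below parses those lines (copied verbatim) and computes the speedup and the change in the k-means objective; the helper names parse and compare are hypothetical. It gives a total speedup of about 1.31x (1.32x on search time alone), far from the 8.2x in the wiki, with a negligible objective increase.

```python
import re

# Final-iteration lines copied from the two runs above.
BASELINE = "Iteration 9 (52.87 s, search 52.37 s): objective=3.85085e+10 imbalance=1.227 nsplit=0"
HNSW = "Iteration 9 (40.46 s, search 39.64 s): objective=3.85109e+10 imbalance=1.227 nsplit=2"

PATTERN = re.compile(r"\(([\d.]+) s, search ([\d.]+) s\): objective=([\d.e+]+)")


def parse(line):
    """Return (total seconds, search seconds, objective) from a k-means log line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return tuple(float(g) for g in m.groups())


def compare(baseline_line, hnsw_line):
    t0, s0, obj0 = parse(baseline_line)
    t1, s1, obj1 = parse(hnsw_line)
    return {
        "total_speedup": t0 / t1,
        "search_speedup": s0 / s1,
        # relative increase of the objective caused by approximate assignment
        "objective_increase": (obj1 - obj0) / obj0,
    }


r = compare(BASELINE, HNSW)
print(f"total speedup:  {r['total_speedup']:.2f}x")    # total speedup:  1.31x
print(f"search speedup: {r['search_speedup']:.2f}x")   # search speedup: 1.32x
print(f"objective increase: {r['objective_increase']:.4%}")  # objective increase: 0.0062%
```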

Concerning your report, the slowdown may be due to installing via pip; the only supported Faiss installation method is conda.