orrorcol closed this issue 7 months ago
When I rerun this benchmark, k-means with HNSW assignment is faster than the baseline k-means, but not by the factor of 8.2 indicated in the wiki.
matthijs@devfair0144:~/faiss_versions/issue_2159$ python bench_hnsw.py 100 kmeans
load data
Performing kmeans on sift1M database vectors (baseline)
Clustering 1000000 points in 128D to 16384 clusters, redo 1 times, 10 iterations
Preprocessing in 0.09 s
Iteration 9 (52.87 s, search 52.37 s): objective=3.85085e+10 imbalance=1.227 nsplit=0
matthijs@devfair0144:~/faiss_versions/issue_2159$ python bench_hnsw.py 100 kmeans_hnsw
load data
Performing kmeans on sift1M using HNSW assignment
Clustering 1000000 points in 128D to 16384 clusters, redo 1 times, 10 iterations
Preprocessing in 0.08 s
Iteration 9 (40.46 s, search 39.64 s): objective=3.85109e+10 imbalance=1.227 nsplit=2
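For reference, the speedup implied by these two runs can be read off the final "Iteration 9" lines. The following is a minimal sketch, assuming the two numbers in parentheses are cumulative wall-clock seconds (total time, then time spent in the search/assignment step). The helper name `parse_iteration` is hypothetical and is not part of bench_hnsw.py or Faiss.

```python
import re

# Final log lines copied verbatim from the two runs above.
BASELINE = "Iteration 9 (52.87 s, search 52.37 s): objective=3.85085e+10 imbalance=1.227 nsplit=0"
HNSW = "Iteration 9 (40.46 s, search 39.64 s): objective=3.85109e+10 imbalance=1.227 nsplit=2"

# Matches the "(<total> s, search <search> s)" part of a clustering log line.
_PATTERN = re.compile(r"\(([\d.]+) s, search ([\d.]+) s\)")


def parse_iteration(line):
    """Return (total_seconds, search_seconds) parsed from one log line.

    Hypothetical helper for illustration only.
    """
    match = _PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return float(match.group(1)), float(match.group(2))


base_total, base_search = parse_iteration(BASELINE)
hnsw_total, hnsw_search = parse_iteration(HNSW)

# Ratio > 1 means the HNSW-assisted run finished faster than the baseline.
total_speedup = base_total / hnsw_total
search_speedup = base_search / hnsw_search

print(f"total speedup:  {total_speedup:.2f}x")
print(f"search speedup: {search_speedup:.2f}x")
```

Under that assumption, both ratios come out at roughly 1.3x, well short of the 8.2x quoted in the wiki.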
Concerning your report, the difference may be due to installing via pip; the only supported Faiss installation method is conda.
Summary
Platform
OS: linux 3.10
Faiss version: 1.7.1.post3
Installed from: pip install faiss-cpu
Faiss compilation options:
Running on: 8 core AMD CPU
Interface:
I ran k-means using bench_hnsw.py and got the following result:
The result shows that kmeans_hnsw is slower than the baseline kmeans. However, the results in the wiki indicate that kmeans_hnsw should be much faster. Wiki address: https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors