facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

how to build hnsw faster #3316

Open HiCheems opened 8 months ago

HiCheems commented 8 months ago

Summary

Recently, I built an index with index type "IDMap,HNSW32,Flat". The dataset has 8M vectors of dimension 200. The build took a very long time, more than 48 hours. Is one of my parameter settings wrong?

Platform

OS: Docker with 40 cores

Installed from: pip install faiss-cpu

Running on:

Interface:

Reproduction instructions

import os

# Set the OpenMP variables before importing faiss so the runtime sees them.
os.environ["OMP_NUM_THREADS"] = "20"
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

import faiss
import numpy as np

dimension = 200

index_type = "IDMap,HNSW16,Flat"
metric_type = faiss.METRIC_INNER_PRODUCT
index = faiss.index_factory(dimension, index_type, metric_type)

# Add 8M random unit vectors, one vector per add_with_ids call.
for i in range(8000000):
    embedding = np.random.rand(dimension).astype('float32')
    l2_norm = np.linalg.norm(embedding)
    normalized_embedding = embedding / l2_norm
    normalized_embedding = normalized_embedding.reshape(1, -1)
    index.add_with_ids(normalized_embedding, np.array([i]))

mdouze commented 8 months ago

Building an HNSW index is indeed slow, but 48h seems excessive. Could you try installing Faiss with conda?

HiCheems commented 8 months ago

> Building an HNSW index is indeed slow, but 48h seems excessive. Could you try installing Faiss with conda?

It got worse. I think the problem may be that the HNSW index is being built on a single core, even though 40 cores are available.
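One possible explanation for the single-core behavior, offered as an assumption that this thread does not confirm: Faiss appears to parallelize HNSW construction across the vectors passed to a single add call, so a loop that adds one vector per call leaves nothing to run in parallel. The sketch below adds the same kind of data in large batches instead. The helper names (`batch_ranges`, `random_unit_vectors`, `build_index`) and the batch size of 100,000 are illustrative choices, not part of the original report.

```python
import numpy as np


def batch_ranges(n_total, batch_size):
    """Yield (start, stop) pairs that cover range(n_total) in chunks."""
    for start in range(0, n_total, batch_size):
        yield start, min(start + batch_size, n_total)


def random_unit_vectors(n, dim, seed=0):
    """Return n random float32 vectors, L2-normalized row by row."""
    rng = np.random.default_rng(seed)
    x = rng.random((n, dim), dtype=np.float32)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


def build_index(n_total, dim, batch_size=100_000):
    """Build an IDMap,HNSW16,Flat index, adding vectors batch by batch."""
    # Imported here so that any OMP_* environment variables set by the
    # caller are already in place when Faiss is first loaded.
    import faiss

    index = faiss.index_factory(
        dim, "IDMap,HNSW16,Flat", faiss.METRIC_INNER_PRODUCT
    )
    for start, stop in batch_ranges(n_total, batch_size):
        x = random_unit_vectors(stop - start, dim, seed=start)
        ids = np.arange(start, stop, dtype=np.int64)
        # One call per batch rather than one call per vector.
        index.add_with_ids(x, ids)
    return index


if __name__ == "__main__":
    # Small demo size; the report above used 8M vectors of dimension 200.
    index = build_index(n_total=20_000, dim=200, batch_size=5_000)
    print(index.ntotal)
```

If batching is indeed the issue, CPU usage during the build should rise well above one core. If it does not, the OpenMP thread settings or the installed build would be the next things to check.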