Open HiCheems opened 8 months ago
Building an HNSW index is indeed slow, but 48h seems excessive. Could you try installing Faiss with conda?
It gets worse. I think the problem may be that it builds the HNSW index using a single core, even though there are 40 cores available.
Summary
Recently, I built an index with index type "IDMap,HNSW32,Flat". The dataset has 8M vectors of dimension 200. It took a very long time to build, more than 48h. Is some parameter set wrong?
Platform
OS: Docker with 40 cores
Installed from: pip install faiss-cpu
Running on:
Interface:
Reproduction instructions
import os

# Set the OpenMP environment before faiss is imported
os.environ["OMP_NUM_THREADS"] = "20"
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

import faiss
import numpy as np

dimension = 200
index_type = "IDMap,HNSW16,Flat"
metric_type = faiss.METRIC_INNER_PRODUCT
index = faiss.index_factory(dimension, index_type, metric_type)

# Add 8M random unit-norm vectors, one vector per call
for i in range(8000000):
    embedding = np.random.rand(dimension).astype('float32')
    l2_norm = np.linalg.norm(embedding)
    normalized_embedding = embedding / l2_norm
    normalized_embedding = normalized_embedding.reshape(1, -1)
    index.add_with_ids(normalized_embedding, np.array([i]))
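One thing that may be worth trying, given the single-core observation: the loop above calls add_with_ids once per vector. As far as I know, Faiss parallelizes HNSW construction across the vectors passed in a single add call, so one-vector calls leave the other threads with nothing to do. Below is a minimal sketch of adding in larger batches instead. The helper names (normalized_batch, add_in_batches, build_example_index) and the batch size are hypothetical, not part of Faiss, and the speedup has not been measured on this setup.

```python
import numpy as np


def normalized_batch(rng, n, dimension):
    """Return n random float32 vectors, each scaled to unit L2 norm."""
    x = rng.random((n, dimension), dtype=np.float32)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


def add_in_batches(index, total, dimension, batch_size=100_000, seed=0):
    """Add `total` vectors to `index` in chunks of `batch_size`.

    `index` is any object exposing add_with_ids(vectors, ids), such as the
    IDMap-wrapped index from the reproduction script. The batch size is an
    arbitrary illustrative value, not a tuned recommendation.
    """
    rng = np.random.default_rng(seed)
    for start in range(0, total, batch_size):
        n = min(batch_size, total - start)
        # Faiss expects int64 ids, one per row of the batch
        ids = np.arange(start, start + n, dtype=np.int64)
        index.add_with_ids(normalized_batch(rng, n, dimension), ids)


def build_example_index(total=100_000, dimension=200):
    """Hypothetical driver; requires faiss to be installed."""
    import faiss

    index = faiss.index_factory(
        dimension, "IDMap,HNSW16,Flat", faiss.METRIC_INNER_PRODUCT
    )
    add_in_batches(index, total, dimension, batch_size=10_000)
    return index
```

Calling build_example_index() on a small total first and watching CPU usage should show whether more than one core is busy before committing to the full 8M vectors.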