facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

`index_factory` is much slower than the same index built without the factory #3814

Closed Dr-Left closed 1 week ago

Dr-Left commented 2 weeks ago

Summary

When I build an index with faiss.index_factory(n, "IVF400,PQ8") and train it with index.train(xb), training is much slower than for the same index built directly:

quantizer = faiss.IndexFlat(n, faiss.METRIC_INNER_PRODUCT)
index = faiss.IndexIVFPQ(quantizer, n, 400, 8, 8)

n is 128 in my case.

Results:

max threads = 10                                                                                          
Training time: 0.17231035232543945                                                                        
Index: IVFFlat                                                                                            
Search Time: 0.11691093444824219 ms                                                                       
Recall: 0.0                                                                                               

Training time: 0.2723979949951172                                                                         
Index: IVFPQ                                                                                              
Search Time: 0.11080026626586914 ms                                                                       
Recall: 0.0                                                                                               

Training time: 3.416989326477051                                                                          
Index: IndexFactory,IVFPQ                                                                                 
Search Time: 0.11123418807983398 ms                                                                       
Recall: 0.0                                                                                               

Training time: 0.16253113746643066                                                                        
Index: IndexFactory,IVFFlat                                                                               
Search Time: 0.11323690414428711 ms                                                                       
Recall: 0.0          

Platform

OS: Linux CPU: AMD EPYC 7742 64-Core Processor x86_64

Faiss version: 1.8.0.post1

Installed from: pip install

Faiss compilation options: Default

Running on:

Interface:

Reproduction instructions

Python 3.9.19 https://gist.github.com/Dr-Left/cb38037ef4784764b600ff669672f4ca

Dr-Left commented 2 weeks ago

Any insights? Why would this happen? Did I set some parameters incorrectly?

mdouze commented 1 week ago

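The index_factory enables polysemous code training by default, while a directly constructed IndexIVFPQ leaves it off, which accounts for the difference in training time. You can disable it in the factory string with "IVF400,PQ8np".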

Dr-Left commented 1 week ago

Thank you. I get it.

Btw, what are the advantages of using polysemous code? Will the searching process be accelerated?