facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

'ProductQuantizer is implemented for L2 only' #3714

Closed: alexanderguzhva closed this issue 2 months ago

alexanderguzhva commented 2 months ago

There is a line in ProductQuantizer.h that states "ProductQuantizer is implemented for L2 only": https://github.com/facebookresearch/faiss/blob/b670cb1cc6f07fc2547a92dcbc01b6d64440a53b/faiss/impl/ProductQuantizer.h#L24

@mdouze Would you please explain what exactly this comment means? It is not clear at all :). Thanks!

jiajieyao commented 2 months ago

PQDistanceComputer::symmetric_dis(idx_t i, idx_t j) only supports L2, because compute_sdc_table only initializes an L2 SDC table. Inner product (IP) is not supported.
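To make the point about the SDC table concrete, here is a minimal pure-Python sketch of symmetric distance computation for L2. It is not the faiss implementation. It assumes a toy PQ where `codebooks[m][k]` is centroid `k` of sub-quantizer `m`. The function names (`build_l2_sdc_table`, `symmetric_dis_l2`) are hypothetical.

Each per-sub-quantizer table stores the squared L2 distances between pairs of centroids. The distance between two encoded vectors is then one table lookup per sub-quantizer, summed.

```python
# Hypothetical sketch, not faiss's implementation: an L2 SDC table for a PQ
# with M sub-quantizers. codebooks[m][k] is centroid k of sub-quantizer m.

def sq_l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_l2_sdc_table(codebooks):
    """Return table[m][i][j] = ||c_{m,i} - c_{m,j}||^2 for every sub-quantizer m."""
    return [[[sq_l2(ci, cj) for cj in cents] for ci in cents] for cents in codebooks]


def symmetric_dis_l2(table, code_i, code_j):
    """Approximate squared L2 distance between two encoded vectors:
    one table lookup per sub-quantizer, summed. Neither vector is decoded."""
    return sum(table[m][code_i[m]][code_j[m]] for m in range(len(table)))


# Toy PQ: M = 2 sub-quantizers, 2 centroids each, sub-vector dimension 2.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [2.0, 1.0]],
]
table = build_l2_sdc_table(codebooks)
print(symmetric_dis_l2(table, [0, 0], [1, 1]))  # 1.0 + 4.0 -> 5.0
```

Because the table is filled with L2 distances, it can only answer L2 queries. An IP variant would need a differently filled table.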

mdouze commented 2 months ago

The comment is outdated. PQ does support IP search. The limit is that the k-means training minimizes the L2 distance to centroids, therefore the quantization error is biased towards L2 distance. It would be possible to implement SDC (symmetric distance computation) for IP distances, but we have not seen a use case for it yet.
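As a rough illustration of why SDC for IP is possible in principle, here is a minimal sketch. It is hypothetical and not faiss code, and it uses the same toy codebook layout assumed above.

The inner product decomposes over the disjoint sub-vector blocks: <x, y> = sum over m of <x_m, y_m>. As a result, a per-sub-quantizer table of centroid dot products reproduces exactly the inner product between the two reconstructed vectors. The names `build_ip_sdc_table`, `symmetric_sim_ip` and `reconstruct` are illustrative only.

```python
# Hypothetical sketch (not part of faiss): an SDC table for inner product.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_ip_sdc_table(codebooks):
    """table[m][i][j] = <c_{m,i}, c_{m,j}>; symmetric, like the L2 table."""
    return [[[dot(ci, cj) for cj in cents] for ci in cents] for cents in codebooks]


def symmetric_sim_ip(table, code_i, code_j):
    """Inner product between the two reconstructions; larger means more similar."""
    return sum(table[m][code_i[m]][code_j[m]] for m in range(len(table)))


def reconstruct(codebooks, code):
    """Concatenate the selected centroid of each sub-quantizer."""
    out = []
    for m, k in enumerate(code):
        out.extend(codebooks[m][k])
    return out


# Toy PQ: M = 2 sub-quantizers, 2 centroids each, sub-vector dimension 2.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [2.0, 1.0]],
]
table = build_ip_sdc_table(codebooks)
a, b = [1, 1], [1, 0]
print(symmetric_sim_ip(table, a, b))                              # 2.0
print(dot(reconstruct(codebooks, a), reconstruct(codebooks, b)))  # 2.0
```

The only change from an L2 table is how the cells are filled, and the fact that results must be ranked by largest score rather than smallest distance. The accuracy is still bounded by how well the L2-trained centroids approximate the data.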

jiajieyao commented 2 months ago

I reviewed the PQ code: both training and encoding use L2, which explains the low IP recall on SIFT1M. Why is IP distance not supported? Is it a mathematical limitation, or have you just not seen a use case for it yet? Thanks @mdouze
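To connect the two points above (L2-based encoding, yet IP search still works), here is a minimal sketch. It is hypothetical and not faiss's code, and it assumes the same toy codebook layout.

Each database vector is encoded by picking the L2-nearest centroid per sub-quantizer. An IP query is then scored asymmetrically (ADC): the raw query is compared against the centroids through per-query lookup tables. The function names (`encode_l2`, `ip_lookup_tables`, `adc_ip`) are illustrative only.

```python
# Hypothetical sketch: PQ encoded for L2, queried with inner product (ADC).

def sq_l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def encode_l2(x, codebooks):
    """Pick, per sub-quantizer, the centroid nearest to the sub-vector in L2.
    The metric used later for search plays no role here."""
    dsub = len(codebooks[0][0])
    code = []
    for m, cents in enumerate(codebooks):
        sub = x[m * dsub:(m + 1) * dsub]
        code.append(min(range(len(cents)), key=lambda k: sq_l2(sub, cents[k])))
    return code


def ip_lookup_tables(q, codebooks):
    """ADC tables for the raw (unquantized) query: lut[m][k] = <q_m, c_{m,k}>."""
    dsub = len(codebooks[0][0])
    return [[dot(q[m * dsub:(m + 1) * dsub], c) for c in cents]
            for m, cents in enumerate(codebooks)]


def adc_ip(lut, code):
    """Estimated <q, x>: one lookup per sub-quantizer, summed."""
    return sum(lut[m][k] for m, k in enumerate(code))


codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [2.0, 1.0]],
]
q = [1.0, 0.0, 1.0, 0.0]
x = [0.9, 0.1, 1.8, 1.2]

code = encode_l2(x, codebooks)                           # [1, 1]
estimate = adc_ip(ip_lookup_tables(q, codebooks), code)  # 3.0
exact = dot(q, x)
print(code, estimate, exact)
```

In this toy case the IP estimate (3.0) differs from the exact inner product (about 2.7). The codes were chosen to minimize L2 reconstruction error, not IP error. This is the kind of L2-biased quantization error described in the reply above.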

mnorris11 commented 2 months ago

Amir has updated the code comment, so I will set this issue to autoclose unless there are concerns.