facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License
30.47k stars 3.56k forks source link

new parallel mode for IVFFLAT #2306

Open phosphorylation opened 2 years ago

phosphorylation commented 2 years ago

Summary

This is a petition for a pull request. There are some very good blas distance functions in FLAT, but they aren't used at all in IVFFLAT. This is because all current parallel mode of IVFFLAT process queries one by one. To boost performance and utilize these blas implementations I propose a new pmode=4 route for IVFFLAT index: we parallelize over each list, group together all the queries that need to scan the current list, and batch scan the current list using all these affiliated queries. This way, we might get nq for each list to be larger than distance_compute_blas_threshold so we might use knn_{L2sqr/inner_product}_blas. I already implemented this and did a crude benchmark on SIFT1M:

running on Intel(R) Xeon(R) CPU E5-2683 v4. SIFT1M nlist=1024 k=50 single thread case

pmode=0 pmode=4
nprobe nq
16 500 1.290s 0.776s
16 1000 2.579s 0.729s
16 10000 25.667s 3.896s
32 500 2.432s 0.767s
32 1000 5.062s 0.864s
32 10000 49.453s 7.454s
omp_num_threads=8 case pmode=0 pmode=4
nprobe nq
16 500 0.157s 0.096s
16 1000 0.303s 0.101s
16 10000 2.805s 0.491s
32 500 0.289s 0.095s
32 1000 0.576s 0.122s
32 10000 5.495s 0.853s

Overall, this pmode performes better as your (nprobe*nq/nlist) increase. as I tested it on SIFT10M, it could provide 10X performance compares to pmode=0 under certain conditions. If you guys are interested I'll make a pull request.

Platform

OS: Ubuntu-16.04.1

Faiss version: 1.7.1

Installed from: compiled myself

Faiss compilation options: -DBLA_VENDOR=Intel10_64_dyn -DCMAKE_BUILD_TYPE=Release -DFAISS_OPT_LEVEL=avx2

Running on:

Interface:

Reproduction instructions

mdouze commented 2 years ago

Yes this sounds good, and the remark about for what parameters it would be interesting is relevant. Remark: grouping by inverted list is implemented for the PQ "fast-scan" but not for other IVF classes (including IVFFlat). If you make a PR, it would probably be best to override the search function and use a special codepath when pmode=4.