Open phosphorylation opened 2 years ago
Yes this sounds good, and the remark about for what parameters it would be interesting is relevant. Remark: grouping by inverted list is implemented for the PQ "fast-scan" but not for other IVF classes (including IVFFlat). If you make a PR, it would probably be best to override the search function and use a special codepath when pmode=4.
Summary
This is a petition for a pull request. There are some very good blas distance functions in FLAT, but they aren't used at all in IVFFLAT. This is because all current parallel mode of IVFFLAT process queries one by one. To boost performance and utilize these blas implementations I propose a new pmode=4 route for IVFFLAT index: we parallelize over each list, group together all the queries that need to scan the current list, and batch scan the current list using all these affiliated queries. This way, we might get nq for each list to be larger than distance_compute_blas_threshold so we might use knn_{L2sqr/inner_product}_blas. I already implemented this and did a crude benchmark on SIFT1M:
running on Intel(R) Xeon(R) CPU E5-2683 v4. SIFT1M nlist=1024 k=50 single thread case
Overall, this pmode performes better as your (nprobe*nq/nlist) increase. as I tested it on SIFT10M, it could provide 10X performance compares to pmode=0 under certain conditions. If you guys are interested I'll make a pull request.
Platform
OS: Ubuntu-16.04.1
Faiss version: 1.7.1
Installed from: compiled myself
Faiss compilation options: -DBLA_VENDOR=Intel10_64_dyn -DCMAKE_BUILD_TYPE=Release -DFAISS_OPT_LEVEL=avx2
Running on:
Interface:
Reproduction instructions