billkle1n closed this issue 7 years ago
I tried disabling the GPU code (by renaming swigfaiss_gpu.py in site-packages) on the EC2 machine, but that made no noticeable difference in speed (which I expected, since I'm not explicitly using a GPU index).
$ python tests/slow_test.py
Failed to load GPU Faiss: No module named 'swigfaiss_gpu'
Faiss falling back to CPU-only.
WARNING clustering 266 points to 8 centroids: please provide at least 312 training points
WARNING clustering 266 points to 256 centroids: please provide at least 9984 training points
[... removed a bunch of warnings ...]
Index: N5faiss10IndexIVFPQE -> 0 elements
list size in < 1: 8 instances
add_core times: 4.160 4.532 0.003
index.ntotal = 3
done in 26.228580951690674s
I recompiled faiss with the Intel MKL library on the AWS EC2 machine and it's a lot faster:
$ python tests/slow_test.py
WARNING clustering 266 points to 8 centroids: please provide at least 312 training points
[... removed a bunch of warnings ...]
Index: N5faiss10IndexIVFPQE -> 0 elements
list size in < 1: 8 instances
add_core times: 4.157 1.822 0.004
index.ntotal = 3
done in 0.2357316017150879s
Is OpenBLAS really that much slower than Intel MKL and Apple's Accelerate framework?
Hi
The BLAS implementation matters a lot. See the comments in the install file about our findings on the relative speed of MKL, OpenBLAS and Accelerate. Also note that OpenBLAS has an interaction problem with OpenMP:
https://github.com/facebookresearch/faiss/wiki/Troubleshooting#slow-brute-force-search-with-openblas
Thanks, it looks like export OMP_WAIT_POLICY=PASSIVE significantly sped up the test with OpenBLAS as well.
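For anyone who would rather set this from the test script than from the shell, here is a minimal sketch. It assumes that setting the variable in-process before the first import of faiss has the same effect as the shell export. That should hold only if no other library has already initialized the OpenMP runtime by then, so treat it as something to verify on your own setup.

```python
import os

# Assumption: the OpenMP runtime reads OMP_WAIT_POLICY when it initializes,
# so this must run before faiss (or anything else that loads OpenMP) is imported.
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

try:
    import faiss  # imported only after the environment is configured
except ImportError:
    faiss = None  # faiss is not installed here; the variable is still set

print("OMP_WAIT_POLICY =", os.environ["OMP_WAIT_POLICY"])
```

Exporting the variable in the shell before running python remains the safer option, since it does not depend on import order.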
@billkle1n could you provide your setup script for MKL and Python? I can compile faiss with MKL but get a segfault at runtime.
@gf0507033 I believe the segfault is unrelated to MKL (I used the MKL installation script from Intel and uncommented relevant lines in the Faiss makefile). It's a known issue with the way the Faiss Python bindings are implemented and I believe the cause is that the Python runtime sometimes deletes (garbage collects) objects that other Faiss structures are still referencing from the C++ code. In general the solution is to use the index_factory function.
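A rough sketch of what I mean, assuming an IVF+PQ index like the one in the logs above. The dimension and the number of PQ sub-quantizers are made-up values, and ivfpq_factory_string is just a hypothetical helper for building the description string. The point is that index_factory returns a single object that owns its coarse quantizer, so there is no separate Python-side quantizer that could be garbage collected.

```python
def ivfpq_factory_string(nlist, m):
    """Hypothetical helper: build a factory description such as "IVF8,PQ8"."""
    return f"IVF{nlist},PQ{m}"


d = 64  # vector dimension (assumed value; must be divisible by m)
spec = ivfpq_factory_string(nlist=8, m=8)  # 8 inverted lists, 8 PQ sub-quantizers

try:
    import faiss

    # The returned index owns its coarse quantizer internally.
    index = faiss.index_factory(d, spec)
    print(type(index).__name__, "trained:", index.is_trained)
except ImportError:
    index = None
    print("faiss not installed; factory string would be", spec)
```

If you do construct the index by hand instead, keeping a Python reference to the quantizer for as long as the index lives should avoid the same problem.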
I came across the same problem on Linux. Indexing 1 million vectors took 120 minutes on Linux, while on a 2020 Intel Mac Pro it took 2.5 minutes, so Linux was about 50x slower.
After rebuilding faiss with Intel MKL support, it took only 2 minutes to index 1 million vectors.
My distribution is CentOS 7, and MKL was installed with yum:
yum-config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo
yum install -y intel-mkl
I ran the following code on my laptop (a MacBook Pro Retina, 15-inch, Mid 2015, 2.2 GHz Intel Core i7) and on a p2.xlarge AWS EC2 instance (Ubuntu 16.04 AMI). It is almost 40x slower on the AWS machine and I can't figure out why.
One noticeable difference is that my MacBook does not have an NVIDIA GPU, so I didn't compile GPU faiss, whereas it is compiled on the AWS machine. That said, I'm not explicitly using the GPU in the test, so it shouldn't really matter, right? And even if it were silently switching to a GPU-based k-means on the EC2 instance, wouldn't that be faster, not slower?
The only thing I can think of that could make a difference is the libraries/flags that were used when compiling (here's part of the script I used on AWS; I can clean it up and share the full version if it helps).
Here's the python code:
Logs on my Macbook Pro:
Logs on AWS:
Edit: here's a log of how I compiled faiss, just re-compiled again on EC2 machine: https://gist.github.com/anonymous/6cf10f15d1d6b9b45f63dad6a0b89873