facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

`faiss.IndexFlatIP` is 18x slower than `torch.mm` on CPU (conda-forge installation) #2499

Open xhluca opened 2 years ago

xhluca commented 2 years ago

Summary

It seems that on CPU, faiss.IndexFlatIP is ~18x slower than the equivalent PyTorch operations (torch.mm followed by torch.topk) when searching an index of 2M documents of dimension 768. The output results are exactly the same.

Kaggle Notebook for reproducing the issue.

Platform

OS: Ubuntu

Faiss version: faiss-cpu-1.7.2

Installed from: conda-forge

Faiss compilation options: N/A

Running on: CPU

Interface: Python

Reproduction instructions

After installing torch 1.11, I installed FAISS with:

conda install -c conda-forge faiss-cpu -y

Run the following code inside a jupyter notebook:


import faiss
import torch

D = 2000000 # Index size
E = 768 # Embedding size
B = 32 # Batch size

torch.manual_seed(42)

passages = torch.randn(D, E)
query = torch.randn(B, E)

index = faiss.IndexFlatIP(E)  # exact (brute-force) inner-product index
index.add(passages.numpy())   # torch.randn yields float32, which faiss expects

%time results_pt = torch.topk(torch.mm(query, passages.T), k=100)
# CPU times: user 3.38 s, sys: 346 ms, total: 3.73 s
# Wall time: 1.91 s

%time faiss_values, faiss_indices = index.search(query.numpy(), k=100)
# CPU times: user 1min 45s, sys: 35.1 s, total: 2min 20s
# Wall time: 35.9 s

print("Values all close:", torch.allclose(results_pt.values, torch.tensor(faiss_values)))
print("Indices all same:", torch.all(results_pt.indices == torch.tensor(faiss_indices)))
# Values all close: True
# Indices all same: tensor(True)

This was run on a Kaggle Notebook with 4 CPUs and 16 GB RAM. I did not use Colab since I could not install faiss there without conda.

Notes
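
For reference, here is a small sketch (not part of the timings above) that prints the thread configuration these numbers depend on; faiss.omp_get_max_threads(), torch.get_num_threads() and os.cpu_count() are the calls used:

import os
import faiss
import torch

print("CPUs visible to the OS:", os.cpu_count())             # machine/container cores
print("torch intra-op threads:", torch.get_num_threads())    # threads used by torch.mm
print("faiss OpenMP threads:", faiss.omp_get_max_threads())  # threads used by index.search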

mdouze commented 2 years ago

I think this is an installation issue; the runtime is slow for both of your results. I get almost exactly the same perf: issue_2499.ipynb

xhluca commented 2 years ago

I've tried installing it with conda install -c pytorch faiss-cpu -y instead (see notebook v4), and now index.search gives me an error:

INTEL MKL ERROR: /opt/conda/lib/python3.7/site-packages/faiss/../../.././libmkl_avx2.so.2: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so.2 or libmkl_def.so.2.

I've also noticed the issue not just on Kaggle but in other setups as well (e.g. a Dockerized environment on a server), likewise installed from conda-forge.

If it's an installation issue in both cases, is there any test I can run to check? E.g. a CLI command like:

faiss check-installation
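
As far as I can tell no such command ships with faiss, but a rough Python-level equivalent would be something like the sketch below. get_compile_options() may not be exposed in every build (hence the try/except), and the mini-benchmark numbers are only indicative:

import time
import numpy as np
import faiss

print("faiss version:", faiss.__version__)
print("OpenMP threads:", faiss.omp_get_max_threads())
try:
    # reports e.g. whether the AVX2 kernels were compiled in; not exposed in every build
    print("compile options:", faiss.get_compile_options())
except AttributeError:
    print("compile options: not available in this build")

# tiny brute-force inner-product benchmark: 100k vectors of dimension 128
d = 128
xb = np.random.rand(100_000, d).astype("float32")
xq = np.random.rand(32, d).astype("float32")
index = faiss.IndexFlatIP(d)
index.add(xb)
t0 = time.perf_counter()
index.search(xq, 100)
print("IndexFlatIP search: %.3f s" % (time.perf_counter() - t0))
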
xhluca commented 2 years ago

So I tried the unofficial pip installation method (pip install faiss-cpu), and it significantly reduced the search time (but it is still ~2x slower than torch):

%time results_pt = torch.topk(torch.mm(query, passages.T), k=100)
%time results_pt = torch.topk(torch.mm(query, passages.T), k=100)

CPU times: user 2.95 s, sys: 326 ms, total: 3.27 s
Wall time: 1.66 s
CPU times: user 2.91 s, sys: 167 ms, total: 3.08 s
Wall time: 1.54 s

%time faiss_values, faiss_indices = index.search(query.numpy(), k=100)
%time faiss_values, faiss_indices = index.search(query.numpy(), k=100)

CPU times: user 15.1 s, sys: 3.9 ms, total: 15.1 s
Wall time: 3.8 s
CPU times: user 15.2 s, sys: 7.85 ms, total: 15.2 s
Wall time: 3.86 s
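
The gap between user and wall time (~15 s vs ~3.8 s) suggests the search is already spread over the 4 cores but does more total work than the torch path. Two knobs worth checking before re-running the %time search cell above (a sketch; the BLAS-threshold variable may not be exposed in every build):

import faiss

# pin faiss's OpenMP pool explicitly to the 4 available cores
faiss.omp_set_num_threads(4)
print("faiss OpenMP threads:", faiss.omp_get_max_threads())

# IndexFlat switches to a BLAS (sgemm) kernel once the query batch exceeds this
# threshold (default 20), so with B=32 queries the BLAS path should already be taken
print("BLAS threshold:", getattr(faiss.cvar, "distance_compute_blas_threshold", "n/a"))
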
mdouze commented 2 years ago

Here's the install that I use if it helps:

conda create -n faiss_1.7.2 python=3.8
conda activate faiss_1.7.2
conda install -c pytorch faiss-gpu cudatoolkit=10.2
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
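
For a CPU-only setup, the analogous recipe (adapted from the commands above, not something tested here) would presumably be the following; installing faiss and pytorch from the same channel into a fresh environment should also avoid the MKL clash reported earlier:

conda create -n faiss_1.7.2 python=3.8
conda activate faiss_1.7.2
conda install -c pytorch faiss-cpu=1.7.2
conda install pytorch torchvision cpuonly -c pytorch
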
xhluca commented 2 years ago

My experiments were with faiss-cpu, since I'm planning to use data too large to fit on a GPU. I haven't tried GPU yet, but will try when I have the chance.