facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

CUDA error when searching a Faiss index whose vectors have too few dimensions #3062

Closed hayj closed 4 months ago

hayj commented 1 year ago

Summary

A CUDA error is raised when searching a Faiss index whose vectors have too few dimensions:

Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::ivfInterleavedScanImpl_32_(faiss::gpu::Tensor<float, 2, true>&, faiss::gpu::Tensor<long int, 2, true>&, faiss::gpu::DeviceVector<void*>&, faiss::gpu::DeviceVector<void*>&, faiss::gpu::IndicesOptions, faiss::gpu::DeviceVector<int>&, int, faiss::MetricType, bool, faiss::gpu::Tensor<float, 3, true>&, faiss::gpu::GpuScalarQuantizer*, faiss::gpu::Tensor<float, 2, true>&, faiss::gpu::Tensor<long int, 2, true>&, faiss::gpu::GpuResources*) at /project/faiss/faiss/gpu/impl/scan/IVFInterleaved32.cu:13; details: CUDA error 9 invalid configuration argument
        Aborted (core dumped)

or

Faiss assertion 'err__ == cudaSuccess' failed in int faiss::gpu::getNumDevices() at /project/faiss/faiss/gpu/utils/DeviceUtils.cu:36; details: CUDA error 401 the operation cannot be performed in the present state

Platform

Installed from: https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=1a9755132bb81dc3daecd16e0b5471ddf0246e555a889ea311813f867fdcca88

Running on: GPU

Interface: Python

Reproduction instructions

import faiss
import numpy as np

nb_vectors = 80_000
dim = 128
k = 5
nlist = 10
nprobe = 2

# Faiss expects contiguous float32 data; np.random.rand returns float64.
vectors = np.random.rand(nb_vectors, dim).astype(np.float32)

index = faiss.index_factory(dim, "IVF16384,Flat")
# Note: the quantizer was sized for 16384 lists at construction; overwriting
# nlist afterwards likely only changes the attribute, not the index structure.
index.nlist = nlist
index.nprobe = nprobe

options = faiss.GpuMultipleClonerOptions()
options.shard = True
options.common_ivf_quantizer = True
index = faiss.index_cpu_to_all_gpus(index, options)

index.train(vectors)
index.add(vectors)
results = index.search(vectors, k)

This code works with dim = 1024.

I also tried different index types and different parameters (nlist, etc.), but it always fails below a certain vector dimension, and never fails once the dimension is increased.

When I try to install different versions of Faiss (nightly and older versions), I run into incompatibility issues such as:

AttributeError: module 'faiss._swigfaiss' has no attribute 'delete_ParameterRangeVector'

or

TypeError: in method 'GpuIndexIVFFlat_train', argument 3 of type 'float const *'
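This last TypeError is the classic symptom of passing float64 data to Faiss, whose Python bindings expect contiguous float32 arrays (np.random.rand returns float64). A minimal sketch of the conversion:

```python
import numpy as np

# np.random.rand returns float64; Faiss's SWIG bindings want float32.
vectors = np.random.rand(1000, 128)
vectors = np.ascontiguousarray(vectors, dtype=np.float32)

# vectors can now be passed to index.train / index.add / index.search
# without tripping the 'float const *' type check.
print(vectors.dtype, vectors.flags["C_CONTIGUOUS"])
```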
hayj commented 1 year ago

Strangely, I made a local environment in JupyterLab that is able to run the script, but I can't reproduce the installation commands to make it work in our Docker image. pip freeze gives this:

faiss==1.7.4
faiss-cpu==1.7.4
faiss-gpu==1.7.2
hayj commented 1 year ago

When I execute this demo script, it works: https://github.com/facebookresearch/faiss/blob/main/tutorial/python/4-GPU.py But when I set nq = 100000, it fails.

So it seems to fail with certain combinations of dim + dataset size + batch size under Faiss 1.7.3 (not 1.7.4). Hence the workaround is to reduce the batch size, i.e. to split the query vectors into chunks when searching (as in my main message above).
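The batch-splitting workaround can be sketched as a generic chunked-search helper; `batched_search` and `batch_size` are hypothetical names, and the helper works with any callable that follows Faiss's `search(queries, k)` signature:

```python
import numpy as np

def batched_search(search_fn, queries, k, batch_size=4096):
    """Run search in fixed-size chunks to avoid oversized GPU kernel launches."""
    D_parts, I_parts = [], []
    for start in range(0, len(queries), batch_size):
        D, I = search_fn(queries[start:start + batch_size], k)
        D_parts.append(D)
        I_parts.append(I)
    return np.vstack(D_parts), np.vstack(I_parts)
```

With a Faiss index this would be called as `D, I = batched_search(index.search, vectors, k)` instead of `index.search(vectors, k)`.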

Note that in some cases it also fails inside processes started with multiprocessing, so in my code I replaced multiprocessing with threading.

hayj commented 1 year ago

I re-opened this issue in case it still needs to be fixed on newer versions.

mdouze commented 1 year ago

Please install with anaconda.
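Faiss's official builds are distributed via conda rather than pip; a typical install line from the project's install instructions, assuming the pytorch channel (exact version and CUDA pins depend on your setup):

```shell
# Install the GPU build from the pytorch channel (pin versions as needed).
conda install -c pytorch faiss-gpu
```

Mixing this with the pip wheels (faiss, faiss-cpu, faiss-gpu) in one environment can reproduce the attribute/type errors shown above, so remove those first.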