facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

OnDisk IVF and GPU Search memory issue #3493

Open rationalga opened 3 months ago

rationalga commented 3 months ago

Summary

Hi, I constructed a Faiss index that stores the inverted file data on disk as demonstrated in demo_ondisk_ivf.py. To speed up the search, I wanted to use GPUs. For demonstration purposes, below is how I added the GPU search as an additional stage to the above code.


if stage == 7:
    index = faiss.read_index(tmpdir + "populated.index")

    gpu_index = faiss.index_cpu_to_all_gpus(index)  # This becomes faiss.swigfaiss.IndexReplicas
    print("gpu_index.count():", gpu_index.count())

    if isinstance(gpu_index, faiss.IndexReplicas):
        for i in range(gpu_index.count()):
            sub_index = faiss.downcast_index(gpu_index.at(i))
            sub_index.nprobe = 16
            print(f"Sub-index {i} nprobe set to {sub_index.nprobe}.")
    else:
        gpu_index.nprobe = 16

    # Load query vectors and ground-truth
    xq = fvecs_read("sift1M/sift_query.fvecs")
    gt = ivecs_read("sift1M/sift_groundtruth.ivecs")

    try:
        D, I = gpu_index.search(xq, 5)
        recall_at_1 = (I[:, :1] == gt[:, :1]).sum() / float(xq.shape[0])
        print("recall@1: %.3f" % recall_at_1)
    except Exception as e:
        print("Error during GPU search:", e)
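The snippet relies on the fvecs_read / ivecs_read helpers from the Faiss benchmark scripts. For reference, a minimal numpy version (a sketch, assuming the standard .fvecs/.ivecs layout where each record is an int32 dimension followed by that many components) could look like:

```python
import numpy as np

def ivecs_read(fname):
    # Each record: int32 dimension d, followed by d int32 components.
    a = np.fromfile(fname, dtype="int32")
    d = a[0]
    return a.reshape(-1, d + 1)[:, 1:].copy()

def fvecs_read(fname):
    # Same layout, but the components are float32; reinterpret the bytes.
    return ivecs_read(fname).view("float32")
```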

I am encountering a memory usage issue with GPU-based search. The search operation works correctly, and I achieve the same recall as with the CPU implementation. However, I have observed that when I transfer the index to the GPU using faiss.index_cpu_to_all_gpus(index), the entire index (including the quantizer and all inverted lists) is loaded into GPU memory. In contrast, the CPU implementation only loads the necessary inverted lists, depending on the nprobe value, during the search (which is the desired behaviour here).

Could someone confirm if this is a current limitation of the OnDiskInvertedLists implementation for IVF indices in Faiss, or if there is an error in my approach? Any insights or solutions would be greatly appreciated. Thank you!

Question / Problem

The GPU search seems to work fine and I get the same recall as well. The problem is that when we do index = faiss.index_cpu_to_all_gpus(index), the whole index is moved to the GPU, i.e. the quantizer plus all of the inverted lists, whereas the CPU-only implementation loads only the selected inverted lists during search, depending on nprobe. Can anyone confirm whether this is a current limitation of the OnDiskInvertedLists IVF implementation, or am I doing something wrong?

Thank you.

mdouze commented 3 months ago

index_cpu_to_all_gpus moves the data to GPU memory. To reduce GPU memory utilization, you can shard the dataset over several GPUs by setting shard = True in the GpuMultipleClonerOptions object.
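For reference, the sharding option mentioned here would look roughly like this (a sketch, not run against a GPU machine; tmpdir is the directory from the earlier stages):

```python
import faiss

index = faiss.read_index(tmpdir + "populated.index")

co = faiss.GpuMultipleClonerOptions()
co.shard = True  # split the inverted lists across GPUs instead of replicating them
gpu_index = faiss.index_cpu_to_all_gpus(index, co=co)
```

With shard = False (the default) each GPU holds a full replica of the index, so per-GPU memory is unchanged; with shard = True each GPU holds only its slice of the inverted lists.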

rationalga commented 3 months ago

Thank you for your reply.

Here is what I have understood so far:

When using the CPU-only implementation, the smaller populated.index is loaded up front, and only the nprobe inverted lists needed for each query are read from disk during the search:

index = faiss.read_index(tmpdir + "populated.index")
D, I = index.search(xq, 5) 

However, in the case of the GPU implementation, the smaller index, i.e., populated.index, is loaded, but when we move this index to the GPU to perform the search, the entire index, including all inverted lists, is loaded into GPU memory.

index = faiss.read_index(tmpdir + "populated.index")
gpu_index = faiss.index_cpu_to_all_gpus(index)

Is there a way to move the index to the GPU for the search while loading only the nprobe inverted lists into GPU memory? I think sharding won't solve the out-of-memory issue if the inverted lists are too large to fit in the combined GPU memory. Thank you for your time, and apologies if these are obvious questions.
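As a back-of-the-envelope check (my own rough estimate, not from the Faiss docs): for an IVFFlat index the inverted lists store every vector's full float32 code plus a per-vector id, so resident memory scales with the dataset size, not with nprobe. Sharding divides that by the number of GPUs, but the total still has to fit:

```python
def ivfflat_list_bytes(n, d, id_bytes=8):
    # Rough size of IVFFlat inverted lists: float32 codes plus 64-bit ids.
    # Ignores per-list overhead and the coarse quantizer.
    return n * (d * 4 + id_bytes)

total = ivfflat_list_bytes(1_000_000, 128)  # SIFT1M
print(total / 1e9)        # ~0.52 GB in total
print(total / 1e9 / 4)    # ~0.13 GB per GPU when sharded over 4 GPUs
```

SIFT1M fits comfortably, but a billion-scale dataset would not, which is exactly the case where the on-disk lists matter on CPU.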