facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Cannot search sharded index on GPU using GPU tensor | AssertionError: GPU tensor on CPU index not allowed #2074

Open SirRob1997 opened 2 years ago

SirRob1997 commented 2 years ago

Summary

I'm using a sharded index (IndexShards) on multiple GPUs and want to search it using tensors that are already on the GPU.

I've imported faiss.contrib.torch_utils, which should patch the index methods so that they accept Torch GPU tensors. This works in the single-GPU setup but not in the sharded one. See the reproduction instructions below.

Stack trace:

Traceback (most recent call last):
  File "test.py", line 76, in <module>
    dists, knns = index.search(search_this, 64)
  File "/miniconda/envs/pytorch_env/lib/python3.7/site-packages/faiss/contrib/torch_utils.py", line 229, in torch_replacement_search
    assert hasattr(self, 'getDevice'), 'GPU tensor on CPU index not allowed'
AssertionError: GPU tensor on CPU index not allowed

When I enable the commented-out single-GPU lines instead, everything works fine!

Platform

OS: Ubuntu 18.04.4 LTS

Faiss version: 1.7.0 // 1.7.1 // 1.7.1.post2

Installed from: pip

Running on:

Interface:

Reproduction instructions

import faiss
import torch
import time
import os
import numpy as np
import faiss.contrib.torch_utils

np.random.seed(42)

index_path = 'test_index.trained'
dstore_size = 1000000
dimension = 1024
device = 'cuda'

keys = torch.rand(dstore_size, dimension, dtype=torch.float32)
vals = torch.randint(300, (dstore_size, 1))

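# Multi-GPU sharded setup (this is the configuration that fails)
co = faiss.GpuMultipleClonerOptions()
co.shard = True
co.useFloat16 = True

# Single-GPU setup (works): uncomment these lines and the index_cpu_to_gpu calls below
#co = faiss.GpuClonerOptions()
#co.useFloat16 = True
#res = faiss.StandardGpuResources()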

if not os.path.exists(index_path):
    quantizer = faiss.IndexFlatL2(dimension)
    index = faiss.IndexIVFPQ(quantizer, dimension, 4096, 64, 8)
    index.nprobe = 32

    gpu_index = faiss.index_cpu_to_all_gpus(index, co)
    print("Training Index")
    random_sample = np.random.choice(
            np.arange(vals.shape[0]),
            size=[min(3000000, vals.shape[0])],
            replace=False,
            )
    start = time.time()
    gpu_index.train(keys[random_sample])
    print("Training took {} s".format(time.time() - start))
    faiss.write_index(faiss.index_gpu_to_cpu(gpu_index), index_path)

    print("Adding Keys")
    index = faiss.read_index(index_path)
    #gpu_index = faiss.index_cpu_to_gpu(res, 0, index, co)
    gpu_index = faiss.index_cpu_to_all_gpus(index, co)
    start = 0
    num_keys_to_add_at_a_time = 500000
    start_time = time.time()
    while start < dstore_size:
        end = min(dstore_size, start + num_keys_to_add_at_a_time)
        to_add = keys[start:end]
        gpu_index.add_with_ids(to_add, torch.arange(start, end))
        start += num_keys_to_add_at_a_time
        print("Added %d tokens so far" % start)

        if (start % 1000000) == 0:
            print(f"Writing Index {start}")
            faiss.write_index(faiss.index_gpu_to_cpu(gpu_index), index_path)

    print("Adding total %d keys" % end)
    print("Adding took {} s".format(time.time() - start_time))
    print("Writing Index")
    start = time.time()
    faiss.write_index(faiss.index_gpu_to_cpu(gpu_index), index_path)
    print("Writing index took {} s".format(time.time() - start))

index = faiss.read_index(index_path, faiss.IO_FLAG_ONDISK_SAME_DIR)
print(f'LOADED INDEX {type(index)}')
index = faiss.index_cpu_to_all_gpus(index, co)
#index = faiss.index_cpu_to_gpu(res, 0, index, co)

vals = vals.cuda()

search_this = torch.rand(100, 1024, device=device)
dists, knns = index.search(search_this, 64)
print(dists, knns)

mdouze commented 2 years ago

There is a good reason for that: the IndexShards object that distributes search/add calls to the GPU sub-indexes is itself a CPU index. However, since its search and add functions rely only on pointer arithmetic and don't access the data itself, it may be possible to relax this constraint. Let's talk about it with @wickedfoo
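To make the point above concrete, here is a minimal sketch of the check that the stack trace shows in torch_utils.py (`hasattr(self, 'getDevice')`). The helper name `index_accepts_gpu_tensors` is hypothetical and not part of Faiss; it only mirrors that one attribute test.

```python
def index_accepts_gpu_tensors(index):
    """Return True if `index` would pass the torch_utils GPU-tensor check.

    Hypothetical helper: it repeats the attribute test from the assertion in
    the stack trace, which treats an index as a GPU index only if it exposes
    a getDevice() method.
    """
    return hasattr(index, "getDevice")
```

Assuming the script above and more than one visible GPU, this would presumably return False for the sharded index from `index_cpu_to_all_gpus(index, co)` and True for the single-GPU index from `index_cpu_to_gpu(res, 0, index, co)`, which matches the behavior reported in this issue.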

SirRob1997 commented 2 years ago

I see. Relaxing this would make a lot of sense from a usage perspective. I guess that, for now, the correct usage is to transfer the tensors to the CPU, and the IndexShards object will then move them to the GPU again?
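A possible stopgap along those lines, as a sketch only: copy GPU queries to the CPU when the index fails the `getDevice` check, then move the results back. The function name `search_with_cpu_fallback` is hypothetical, and the sketch assumes `faiss.contrib.torch_utils` has been imported so that `index.search` accepts and returns Torch tensors.

```python
def search_with_cpu_fallback(index, queries, k):
    """Search `index` with a Torch tensor, copying to the CPU if required.

    Hypothetical helper, not part of Faiss. Assumes faiss.contrib.torch_utils
    is imported, so index.search(tensor, k) returns (distances, labels) as
    Torch tensors.
    """
    # Same test as the assertion in the stack trace: only indexes that
    # expose getDevice() are allowed to receive GPU tensors directly.
    index_is_gpu = hasattr(index, "getDevice")

    if queries.is_cuda and not index_is_gpu:
        # CPU wrapper index (e.g. the sharded one): search with a CPU copy,
        # then put the results back on the device the queries came from.
        distances, labels = index.search(queries.cpu(), k)
        return distances.to(queries.device), labels.to(queries.device)

    # GPU index, or queries already on the CPU: search directly.
    return index.search(queries, k)
```

In the script above, this would replace the failing call with `dists, knns = search_with_cpu_fallback(index, search_this, 64)`. It costs an extra device-to-host copy of the queries and a host-to-device copy of the results on every call.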