facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

gpu sharding add_ids twice not working #2371

Open AviadHAv opened 2 years ago

AviadHAv commented 2 years ago

Summary

Platform

OS: linux ubuntu 18.04

Faiss version: https://github.com/facebookresearch/faiss/commit/c08cbff1a4d6c9afb6b8f69004c5530aaf80237a v1.7.2

Installed from: compiled from source

Faiss compilation options: -DBUILD_TESTING=OFF -DFAISS_ENABLE_GPU=ON -DCMAKE_CXX_COMPILER=g++-6 -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF -DFAISS_OPT_LEVEL=avx2

Running on:

Interface:

Reproduction instructions

import faiss
import numpy as np

ngpus = faiss.get_num_gpus()
print("number of GPUs:", ngpus)

gpu_resources = []

for i in range(ngpus):
    res = faiss.StandardGpuResources()
    gpu_resources.append(res)

def make_vres_vdev(i0=0, i1=-1):
    """Return vectors of device ids and resources, useful for gpu_multiple."""
    vres = faiss.GpuResourcesVector()
    vdev = faiss.Int32Vector()
    if i1 == -1:
        i1 = ngpus
    for i in range(i0, i1):
        vdev.push_back(i)
        vres.push_back(gpu_resources[i])
    return vres, vdev

N = 50000
d = 256

xb = np.random.random((N, d)).astype('float32')
d = xb.shape[1]
ids = range(0, N)
n_ids = np.asarray(ids)
metric = faiss.METRIC_INNER_PRODUCT
index = faiss.index_factory(d, 'IDMap,Flat', metric)
co = faiss.GpuMultipleClonerOptions()
co.shard = True
vres, vdev = make_vres_vdev()
gpu_index = faiss.index_cpu_to_gpu_multiple(vres, vdev, index, co)
gpu_index.add_with_ids(xb, n_ids)
xbb = np.random.random((1000, d)).astype('float32')
idss = range(N, N+1000)
n_idss = np.asarray(idss)
gpu_index.add_with_ids(xbb, n_idss)

output

number of GPUs: 2
Traceback (most recent call last):
  File "test2.py", line 43, in <module>
    gpu_index.add_with_ids(xbb, n_idss)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/faiss/__init__.py", line 235, in replacement_add_with_ids
    self.add_with_ids_c(n, swig_ptr(x), swig_ptr(ids))
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 8397, in add_with_ids
    return _swigfaiss_avx2.IndexIDMap_add_with_ids(self, n, x, xids)
RuntimeError: Error in void faiss::IndexShardsTemplate<IndexT>::add_with_ids(faiss::IndexShardsTemplate<IndexT>::idx_t, const component_t*, const idx_t*) [with IndexT = faiss::Index; faiss::IndexShardsTemplate<IndexT>::idx_t = long int; faiss::IndexShardsTemplate<IndexT>::component_t = float] at /home/conda/feedstock_root/build_artifacts/faiss-split_1644327811086/work/faiss/IndexShards.cpp:237: Error: 'this->ntotal == 0' failed: when adding to IndexShards with sucessive_ids, only add() in a single pass is supported

thank you guys !!! :smile:

mdouze commented 2 years ago

Hmm, this is weird. Normally the IndexShards should not set successive_ids; see https://github.com/facebookresearch/faiss/blob/main/faiss/gpu/GpuCloner.cpp#L337. Could you post a small repro script?

AviadHAv commented 2 years ago

@mdouze I've added a code example. By the way, when I hardcode successive_ids to false, I get the following error on the first add:

Error in virtual void faiss::gpu::GpuIndexFlat::addImpl_(int, const float*, const idx_t*) at /build/aviad/faiss/faiss/gpu/GpuIndexFlat.cu:204: Error: '!ids' failed: add_with_ids not supported

AviadHAv commented 2 years ago

By the way, when I remove the following check from the add_with_ids function in IndexShards.cpp:

            !(successive_ids && xids),
            "It makes no sense to pass in ids and "
            "request them to be shifted");

everything works well except for one thing: I can add many vectors one at a time, but I can't do a bulk add of several vectors in one call, which means adding 1,000,000 vectors will take more than an hour. See https://github.com/facebookresearch/faiss/blob/main/faiss/IndexShards.cpp#L228

AviadHAv commented 2 years ago

Replacing the factory string with IDMap,IVF1,Flat solves all the problems! Thanks.

mdouze commented 2 years ago

If you use IVF1,Flat then IDMap becomes unnecessary.

mdouze commented 2 years ago

Hmm right, I understand the problem: when you move IDMap,Flat to GPU it builds an IDMap + an IndexShards of Flats, instead of an IndexShards of IDMap,Flat. I'd have to think a bit about how to fix that in a robust way. Marking as enhancement.
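The two layouts can be compared with a toy model in plain Python. This is not Faiss code: ToyFlat, ToyIDMap, ToyShards, and try_adds are hypothetical stand-ins that only mirror the checks quoted in this thread (the ntotal == 0 check, the "no sense to pass in ids" check, and the flat GPU index rejecting explicit IDs). It is a sketch of the reasoning, assuming the ID map forwards vectors to its inner index without IDs, as the traceback suggests:

```python
class ToyFlat:
    """Stand-in for a flat GPU shard: stores vectors, rejects explicit IDs."""
    def __init__(self):
        self.vectors = []

    def add(self, xs, ids=None):
        if ids is not None:
            raise RuntimeError("add_with_ids not supported")
        self.vectors.extend(xs)


class ToyIDMap:
    """Stand-in for an ID map: keeps caller IDs, forwards vectors without IDs."""
    def __init__(self, inner):
        self.inner = inner
        self.id_map = []

    def add(self, xs, ids):
        self.inner.add(xs)
        self.id_map.extend(ids)


class ToyShards:
    """Stand-in for a sharded index with the two checks quoted in the thread."""
    def __init__(self, shards, successive_ids):
        self.shards = shards
        self.successive_ids = successive_ids
        self.ntotal = 0

    def add(self, xs, ids=None):
        if self.successive_ids:
            if ids is not None:
                raise RuntimeError("makes no sense to pass in ids")
            if self.ntotal != 0:
                raise RuntimeError("only add() in a single pass is supported")
        elif ids is None:
            # Without successive_ids, sequential IDs are generated and passed on.
            ids = list(range(self.ntotal, self.ntotal + len(xs)))
        n, k = len(xs), len(self.shards)
        for i, shard in enumerate(self.shards):
            lo, hi = i * n // k, (i + 1) * n // k
            shard.add(xs[lo:hi], None if ids is None else ids[lo:hi])
        self.ntotal += n


def try_adds(index):
    """Add two batches with IDs; return "ok" or the first error message."""
    try:
        index.add([[0.0], [1.0]], [0, 1])
        index.add([[2.0]], [2])
        return "ok"
    except RuntimeError as err:
        return str(err)


# Layout described as the current behaviour: IDMap over shards of flats.
idmap_over_shards = try_adds(
    ToyIDMap(ToyShards([ToyFlat(), ToyFlat()], successive_ids=True)))
# Same layout with successive_ids forced to False, as tried earlier in the thread.
forced_false = try_adds(
    ToyIDMap(ToyShards([ToyFlat(), ToyFlat()], successive_ids=False)))
# Layout described as the desired one: shards of (IDMap over flat).
shards_of_idmap = try_adds(
    ToyShards([ToyIDMap(ToyFlat()), ToyIDMap(ToyFlat())], successive_ids=False))
```

In this model the first layout fails on the second add, the forced-false variant fails on the first add, and only the shards-of-IDMap layout accepts both batches, which matches the sequence of errors reported above.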