facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Training multiple indexes on a GPU, in-parallel #2734

Closed Prabhat1808 closed 2 months ago

Prabhat1808 commented 1 year ago

Summary

Using the Python multiprocessing library to train multiple indexes on one GPU in parallel throws the following error ->

RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /root/miniconda3/conda-bld/faiss-pkg_1623030479928/work/faiss/gpu/StandardGpuResources.cpp:283: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 3 initialization error)

Platform

OS: Ubuntu 18.04.6 LTS

Faiss version: 1.7.1

Installed from: anaconda

Faiss compilation options:

Running on:

Interface:

Reproduction instructions

Index Creation and Training Function ->


from time import time

import faiss
import numpy as np

def build_index(xb):
    nprobe, nlist, M, efsearch, efc = 1, 1024, 8, 2, 40
    dims = 72 

    t = time()  
    index_factory_string = 'BIVF{}_HNSW{}'.format(nlist, M)
    index = faiss.index_binary_factory(dims * 8, index_factory_string)

    res = faiss.StandardGpuResources()

    print ('Index Created')
    index.nprobe = nprobe
    quantizer = faiss.downcast_IndexBinary(index.quantizer)
    quantizer.hnsw.efSearch = efsearch
    quantizer.hnsw.efConstruction = efc

    print ('Converting Index CPU to GPU')
    clustering_index_cpu = faiss.IndexFlatL2(dims * 8)
    print ('\t Step 1 Done')
    clustering_index = faiss.index_cpu_to_gpu(res, 1, clustering_index_cpu)
    print ('\t Step 2 Done')
    index.clustering_index = clustering_index
    print ('\t Step 3 Done')
    init_t = time() - t

    print ('Training Index')
    t = time()
    index.train(xb)
    train_t = time() - t

    print ('Adding vectors to index')
    t = time()
    index.add(xb)
    add_t = time() - t
    print(f'Time taken to add {index.ntotal} hashes to IVFHNSW_nlist={nlist}_nprobe={nprobe}_m={M}_efs={efsearch}_efc={efc} index: ({init_t:.3f}, {train_t:.3f}, {add_t:.3f}) s')

    return np.round([init_t, train_t, add_t], 3)

Creating random data, for ease of issue reproduction ->

data = np.random.randint(1, 200, (99529, 72), dtype='uint8')


Running on single process ->

build_index(data)

Output on single process ->

Index Created
Converting Index CPU to GPU
     Step 1 Done
     Step 2 Done
     Step 3 Done
Training Index
Adding vectors to index
Time taken to add 99529 hashes to IVFHNSW_nlist=1024_nprobe=1_m=8_efs=2_efc=40 index: (0.058, 1.312, 0.017) s
array([0.058, 1.312, 0.017])

Running on multiple processes (using a Python multiprocessing Pool) ->

from multiprocessing import Pool

num_proc = 4
proc_pool = Pool(num_proc)
res = proc_pool.map(build_index, [data]*num_proc)

Throws the following error ->

Index Created
Converting Index CPU to GPU
     Step 1 Done
Index Created
Converting Index CPU to GPU
     Step 1 Done
Index Created
Converting Index CPU to GPU
     Step 1 Done
Index Created
Converting Index CPU to GPU
     Step 1 Done
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/envs/fp/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/envs/fp/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/tmp/ipykernel_8991/2789175581.py", line 22, in build_index
    clustering_index = faiss.index_cpu_to_gpu(res, 1, clustering_index_cpu)
  File "/opt/conda/envs/fp/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 6513, in index_cpu_to_gpu
    return _swigfaiss_avx2.index_cpu_to_gpu(provider, device, index, options)
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /root/miniconda3/conda-bld/faiss-pkg_1623030479928/work/faiss/gpu/StandardGpuResources.cpp:283: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 3 initialization error)
"""

mdouze commented 1 year ago

It is not safe to use one GPU on multiple processes.
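One way to stay within that constraint on a multi-GPU machine is to give each worker process its own device, so that no two processes touch the same GPU. The following is a minimal sketch under stated assumptions, not a confirmed fix for this issue: `assign_jobs`, `gpu_worker`, and `train_all` are hypothetical helper names, the factory string is taken from the report above, and the `spawn` start method is chosen on the assumption that a forked child inheriting CUDA state from the parent may contribute to the "initialization error".

```python
import multiprocessing as mp


def assign_jobs(num_jobs, gpu_ids):
    """Round-robin job indices over GPU ids (hypothetical helper)."""
    plan = {gpu: [] for gpu in gpu_ids}
    for job in range(num_jobs):
        plan[gpu_ids[job % len(gpu_ids)]].append(job)
    return plan


def gpu_worker(task):
    """Train every dataset assigned to one GPU, one after another."""
    gpu_id, datasets = task
    # Imported here so that CUDA is first initialized inside this process.
    import faiss

    res = faiss.StandardGpuResources()
    totals = []
    for xb in datasets:
        d = xb.shape[1] * 8  # binary dimension in bits
        index = faiss.index_binary_factory(d, "BIVF1024_HNSW8")
        # Keep a Python reference to the GPU index while it is in use.
        clustering_index = faiss.index_cpu_to_gpu(res, gpu_id, faiss.IndexFlatL2(d))
        index.clustering_index = clustering_index
        index.train(xb)
        index.add(xb)
        totals.append(index.ntotal)
    return totals


def train_all(datasets, gpu_ids):
    """Start one spawned process per GPU; no two processes share a device."""
    plan = assign_jobs(len(datasets), gpu_ids)
    tasks = [(gpu, [datasets[j] for j in jobs]) for gpu, jobs in plan.items() if jobs]
    ctx = mp.get_context("spawn")  # fresh interpreters, no inherited CUDA state
    with ctx.Pool(len(tasks)) as pool:
        return pool.map(gpu_worker, tasks)
```

With three GPUs and four datasets, `train_all([data] * 4, [0, 1, 2])` would run two jobs back to back on GPU 0 and one each on GPUs 1 and 2. Because `spawn` re-imports the main module in every child, the call should live in a `.py` file under an `if __name__ == "__main__":` guard rather than in a notebook cell.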

Prabhat1808 commented 1 year ago

@mdouze I see. Is the above issue caused by FAISS not supporting multiple processes using the same GPU (as it's unsafe)? Or is it something else?

P.S. I would like to understand why it is unsafe, to see whether it could still work for my use case even if it is not recommended in general. Can you point me to the relevant resources?

Thanks

matrixji commented 1 year ago

cudaHostAlloc returned error 3, which possibly means the GPU is not available.

@Prabhat1808 maybe you could check the code below:

clustering_index = faiss.index_cpu_to_gpu(res, 1, clustering_index_cpu)

Here device=1, which means the 2nd card on your system. Do you really have at least 2 GPUs? The device parameter starts from 0.
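A quick way to rule this out is to compare the device index against the number of GPUs Faiss reports. This is a small sketch: `is_valid_device` and `visible_gpu_count` are hypothetical helper names, and `faiss.get_num_gpus()` is assumed to be available, which requires a GPU build of Faiss.

```python
def is_valid_device(device, num_gpus):
    """Return True if `device` is a valid zero-based GPU index."""
    return 0 <= device < num_gpus


def visible_gpu_count():
    """Number of GPUs visible to Faiss (assumes a GPU build of faiss)."""
    import faiss

    return faiss.get_num_gpus()
```

On a machine with 3 GPUs, `is_valid_device(1, visible_gpu_count())` is true, while device 3 would be out of range.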

Prabhat1808 commented 1 year ago

@matrixji the machine has 3 GPUs, so the above is not the cause of the error. Moreover, the 2nd card is available and the index.train() step works normally in a single process.

The issue occurs only when I create multiple processes and use them to train multiple indexes in parallel, as mentioned above.

matrixji commented 1 year ago

num_proc = 4
proc_pool = Pool(num_proc)
res = proc_pool.map(build_index, [data]*num_proc)

Got it. I've tried running your code on faiss (compiled from master), and it succeeds. Which GPU card are you using, and how much host memory do you have? I noticed that your code may require about 10 GB of GPU memory and about 8 GB of host memory. Since the failure happens in cudaHostAlloc, the host memory probably does not meet the requirement.
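For scale, the size of the buffer named in the error message can be worked out directly. The sketch below assumes that each worker creates its own `StandardGpuResources` and therefore requests its own pinned buffer; the byte count and the process count are taken from the report above. Note also that the error code reported is 3 (initialization error), not 2 (memory allocation), so host memory size alone may not explain the failure.

```python
PINNED_BYTES = 268435456  # from the cudaHostAlloc error message
NUM_PROC = 4              # pool size used in the report

MIB = 1024 * 1024

# Pinned buffer requested by one StandardGpuResources instance.
per_process_mib = PINNED_BYTES / MIB

# Assumed total if every worker requests its own buffer.
total_mib = per_process_mib * NUM_PROC

print(f"per process: {per_process_mib:.0f} MiB, all workers: {total_mib:.0f} MiB")
```

Under that assumption, the four workers together would pin 1 GiB of host memory for copy buffers, in addition to whatever the indexes and training data need.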