How much data are you adding in reference_vecs (# of vecs and dimension), and also what's the size of the query_vecs?
Thanks for the reply! reference_vecs = (1067000, 2048) and query_vecs = (113000, 2048).
I have a suspicion that memory is not being released after an index is no longer needed; however, my understanding is that reset() should take care of this. Watching the memory usage in nvidia-smi, I can definitely see the memory drop every time reset() is called, but maybe not down to the level required for the next operation.
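For completeness, here is a minimal sketch of what a fuller teardown might look like, assuming reset() only drops the stored vectors and the rest is freed once the Python wrapper object is collected:

```python
import gc
import faiss
import numpy as np

index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatIP(2048))
index.add(np.random.rand(1000, 2048).astype(np.float32))

index.reset()  # drops the stored vectors, but the wrapper object stays alive
del index      # release the index object (and its per-GPU replicas)
gc.collect()   # encourage immediate collection of the SWIG wrapper
```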
My pipeline calls this function first, then nearest_neighbours:
```python
def average_query_expansion(query_vecs, reference_vecs, resources, top_k=5):
    """
    Average Query Expansion (AQE)
    Ondrej Chum, et al. "Total Recall: Automatic Query Expansion with a
    Generative Feature Model for Object Retrieval,"
    International Conference on Computer Vision, 2007.
    https://www.robots.ox.ac.uk/~vgg/publications/papers/chum07b.pdf
    https://github.com/leeesangwon/PyTorch-Image-Retrieval/blob/public/inference.py
    """
    query_vecs = query_vecs.astype(np.float32)
    reference_vecs = reference_vecs.astype(np.float32)

    # Query augmentation: find the top_k nearest references for each query
    query_aug = faiss.IndexFlatIP(reference_vecs.shape[1])
    query_aug = faiss.index_cpu_to_all_gpus(query_aug)
    query_aug.add(reference_vecs)
    distances, indexes = query_aug.search(x=query_vecs, k=top_k)
    query_aug.reset()

    # Append the mean of the top_k neighbours to each query vector
    top_k_ref_mean = np.mean(reference_vecs[indexes], axis=1, dtype=np.float32)
    query_vecs = np.concatenate([query_vecs, top_k_ref_mean], axis=1)

    # Reference augmentation: each reference's nearest neighbour is itself,
    # hence top_k + 1
    ref_aug = faiss.IndexFlatIP(reference_vecs.shape[1])
    ref_aug = faiss.index_cpu_to_all_gpus(ref_aug)
    ref_aug.add(reference_vecs)
    distances, indexes = ref_aug.search(x=reference_vecs, k=top_k + 1)
    ref_aug.reset()

    top_k_ref_mean = np.mean(reference_vecs[indexes], axis=1, dtype=np.float32)
    reference_vecs = np.concatenate([reference_vecs, top_k_ref_mean], axis=1)

    return query_vecs, reference_vecs
```
average_query_expansion expands the embeddings to (N, 4096) before passing them to nearest_neighbours.
However, I sometimes get the error in step 1, but usually in step 2. With CPU only it works fine (albeit slowly).
Things that I have tried that didn't work:

- GpuIndexFlatIP with a single GPU
- StandardGpuResources with a single GPU

After playing with my toy dataset, I think the issue might lie somewhere else in my code. I'll close this for now and reopen if I'm still facing issues.
OK, I'm still getting MemoryErrors with the following toy example, which is based on an example from the docs:
```python
import numpy as np
import faiss
import torch


def swig_ptr_from_FloatTensor(x):
    assert x.is_contiguous()
    assert x.dtype == torch.float32
    return faiss.cast_integer_to_float_ptr(
        x.storage().data_ptr() + x.storage_offset() * 4)


def swig_ptr_from_LongTensor(x):
    assert x.is_contiguous()
    assert x.dtype == torch.int64, 'dtype=%s' % x.dtype
    return faiss.cast_integer_to_long_ptr(
        x.storage().data_ptr() + x.storage_offset() * 8)


def search_knn(res, xb, xq, k, D=None, I=None, metric=faiss.METRIC_INNER_PRODUCT):
    assert xb.device == xq.device

    xq_ptr = swig_ptr_from_FloatTensor(xq)
    nq, d = xq.size()
    xb_ptr = swig_ptr_from_FloatTensor(xb)
    nb, d2 = xb.size()
    assert d2 == d

    if D is None:
        D = torch.empty(nq, k, device=xb.device, dtype=torch.float32)
    else:
        assert D.shape == (nq, k)
        assert D.device == xb.device

    if I is None:
        I = torch.empty(nq, k, device=xb.device, dtype=torch.int64)
    else:
        assert I.shape == (nq, k)
        assert I.device == xb.device

    D_ptr = swig_ptr_from_FloatTensor(D)
    I_ptr = swig_ptr_from_LongTensor(I)

    faiss.bruteForceKnn(res, metric,
                        xb_ptr, nb,
                        xq_ptr, nq,
                        d, k, D_ptr, I_ptr)
    return D, I


index_embeddings = np.random.normal(size=(1000000, 2048)).astype(np.float32)
test_embeddings = np.random.normal(size=(100000, 2048)).astype(np.float32)

# move to pytorch & GPU
index_embeddings = torch.from_numpy(index_embeddings).cuda()
test_embeddings = torch.from_numpy(test_embeddings).cuda()

# resource object, can be re-used over calls
res = faiss.StandardGpuResources()
# put on same stream as pytorch to avoid synchronizing streams
res.setDefaultNullStreamAllDevices()

distances, indexes = search_knn(res, index_embeddings, test_embeddings, k=100)
```
Output:
```
Traceback (most recent call last):
  File "/home/anjum/PycharmProjects/retrieval/faiss error.py", line 66, in <module>
    distances, indexes = search_knn(res, index_embeddings, test_embeddings, k=100)
  File "/home/anjum/PycharmProjects/retrieval/faiss error.py", line 48, in search_knn
    d, k, D_ptr, I_ptr)
RuntimeError: Error in void faiss::gpu::allocMemorySpaceV(faiss::gpu::MemorySpace, void**, size_t) at gpu/utils/MemorySpace.cpp:27: Error: 'err == cudaSuccess' failed: failed to cudaMalloc 1610612736 bytes (error 2 out of memory)
```
Any ideas on how to get things moving? I think I need to somehow batch the data to the GPU, but the example given isn't entirely obvious to me.
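For what it's worth, a minimal sketch of the batching idea, assuming the index itself fits on the GPU and only the queries need to be chunked (batch_size is a made-up tuning knob, not from the docs):

```python
import numpy as np

def search_in_batches(index, query_vecs, k, batch_size=10000):
    # Query a faiss index in chunks to bound the peak memory of each search call
    all_distances, all_indexes = [], []
    for start in range(0, len(query_vecs), batch_size):
        d, i = index.search(query_vecs[start:start + batch_size], k)
        all_distances.append(d)
        all_indexes.append(i)
    return np.concatenate(all_distances), np.concatenate(all_indexes)
```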
OK, so it turns out quantization is my friend: IndexIVFPQ solved my memory issues.
The wiki says "The index types IndexFlat, IndexIVFFlat and IndexIVFPQ are implemented on the GPU, as GpuIndexFlat, GpuIndexIVFFlat and GpuIndexIVFPQ. In addition to their normal arguments, they take a resource object as input, along with index storage configuration options and float16/float32 configuration parameters."
It would be great to have an example somewhere that shows how to set the float16 option.
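A rough sketch of what I had in mind, based on the cloner/config options (exact field names may differ between versions):

```python
import faiss

# Multi-GPU: pass cloner options when replicating across all GPUs
co = faiss.GpuMultipleClonerOptions()
co.useFloat16 = True  # store the vectors as float16, halving GPU memory

cpu_index = faiss.IndexFlatIP(2048)
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index, co=co)

# Single GPU: the equivalent via the index config
res = faiss.StandardGpuResources()
cfg = faiss.GpuIndexFlatConfig()
cfg.useFloat16 = True
gpu_index_single = faiss.GpuIndexFlatIP(res, 2048, cfg)
```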
Hi, I have the same problem. Have you had any success? Could you please help me? Thanks.
I have the same problem too. Have you had any success?
Hello, do you have any solution? I met the same problem.
For those who know Chinese, this blog clearly explains how to solve the "out of memory" problem. For English users, please check the following code:

```python
nlist = 100
m = 8                             # number of bytes per vector
k = 4
quantizer = faiss.IndexFlatL2(d)  # this remains the same
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # 8 specifies that each sub-vector is encoded as 8 bits

index.train(xb)
index.add(xb)

D, I = index.search(xb[:5], k)  # sanity check
print(I)
print(D)

index.nprobe = 10           # make comparable with experiment above
D, I = index.search(xq, k)  # search
print(I[-5:])
```
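To see why this helps (my own back-of-the-envelope numbers, not from the blog): a flat float32 index over the reference set above needs roughly 1,067,000 × 2048 × 4 B ≈ 8.7 GB, the same order as the ~9 GB cudaMalloc that fails in the original report, whereas IndexIVFPQ with m = 8 stores only about 1,067,000 × 8 B ≈ 8.5 MB of codes, plus the coarse quantizer and some overhead.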
https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU
All GPU indexes are built with a StandardGpuResources object (an implementation of the abstract class GpuResources). The resource object contains the needed resources for each GPU in use, including an allocation of temporary scratch space (by default, about 2 GB on a 12 GB GPU), cuBLAS handles and CUDA streams.

Scratch memory: the temporary scratch space managed via the GpuResources object is important for speed and for avoiding unnecessary GPU/GPU and GPU/CPU synchronizations via cudaFree. All faiss GPU code strives to be allocation-free on the GPU, assuming temporary state (intermediate results of calculations and the like) can fit into the scratch space. The reservation can be adjusted to any amount of GPU memory, even 0 bytes, via the setTempMemory method.

There are broadly two classes of memory allocations in GPU Faiss: permanent and temporary. Permanent allocations are retained for the lifetime of the index and are ultimately owned by the index. Temporary allocations are made out of a memory stack that GpuResources allocates up front, falling back to the heap (cudaMalloc) when the stack is exhausted. These allocations do not live beyond the lifetime of a top-level call to a Faiss index (or at least, on the GPU they are ordered with respect to the ordering stream; once all kernels on that stream are done, the temporary allocation is no longer needed and can be reused or freed). Generally about 1 GB or so should be reserved in this stack to avoid cudaMalloc/cudaFree calls during many search operations.

If the scratch memory is too small, you may notice slowdowns due to cudaMalloc and cudaFree. The high-water mark of scratch space actually used can be queried from the resources object, so the reservation can be adjusted to suit actual needs.
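A minimal sketch of adjusting that reservation, using the setTempMemory method mentioned above (the 512 MB figure is an arbitrary example):

```python
import faiss

res = faiss.StandardGpuResources()
res.setTempMemory(512 * 1024 * 1024)  # shrink the scratch reservation to 512 MB
# res.noTempMemory()                  # or skip the up-front reservation entirely
```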
The blog referenced is just a translation of the official document.
Summary

Platform

OS: Ubuntu 19.04
Faiss version: 1.5.1 (conda)
Faiss compilation options:
Running on: GPU
Interface: Python

Reproduction instructions
I'm getting repeatable memory errors using GPUs with 2x RTX 2080 Tis. Is there an option I can set to get things moving?

```python
def nearest_neighbours(query_vecs, reference_vecs, resources, top_k=100):
    nn_index = faiss.IndexFlatIP(reference_vecs.shape[1])
    nn_index = faiss.index_cpu_to_all_gpus(nn_index)
    nn_index.add(reference_vecs)
    distances, indexes = nn_index.search(x=query_vecs, k=top_k)
    nn_index.reset()
    return distances, indexes
```

```
terminate called after throwing an instance of 'faiss::FaissException'
  what():  Error in void faiss::gpu::allocMemorySpaceV(faiss::gpu::MemorySpace, void**, size_t) at gpu/utils/MemorySpace.cpp:27: Error: 'err == cudaSuccess' failed: failed to cudaMalloc 8998592512 bytes (error 2 out of memory)
```