facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Out of Memory Error when running on GPU #3049

Open meghbhalerao opened 11 months ago

meghbhalerao commented 11 months ago

Summary

Hi, I want to run an exact nearest-neighbor (NN) search over 11M samples with 512 features each, i.e. on a feature matrix of shape 11M x 512.

Platform

   Static hostname: u124281
         Icon name: computer-server
           Chassis: server
        Machine ID: 1a347b1c907c42bb81d003b8876d5b8b
           Boot ID: eb89be2b096e4d43a95e77b6b9bc735d
  Operating System: Ubuntu 20.04.4 LTS
            Kernel: Linux 5.4.0-117-generic
      Architecture: x86-64

The GPU specifications, as reported by nvidia-smi, are as follows:

Mon Sep 11 00:39:34 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   27C    P0    60W / 300W |   6381MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   67C    P0   295W / 300W |  30068MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100 80G...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   27C    P0    59W / 300W |      7MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100 80G...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   27C    P0    44W / 300W |      7MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A   1410357      C   python                            411MiB |
|    0   N/A  N/A   1865828      C   ...a/envs/pytorch/bin/python     5963MiB |
|    1   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A   1410357      C   python                          30061MiB |
|    2   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      1547      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Faiss version:

faiss-gpu                 1.7.3           py3.9_h28a55e0_0_cuda11.3    pytorch
libfaiss                  1.7.3           hfc2d529_0_cuda11.3    pytorch

Installed from:  conda 23.1.0

Faiss compilation options:

Running on: GPU
Interface: Python

Reproduction instructions

The following minimal working example reproduces the issue. It uses a random matrix; the real codebase of course uses a matrix of features.

import logging

import faiss
import numpy as np

logger = logging.getLogger(__name__)
logging.basicConfig(format="%(asctime)s %(levelname)-8s %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S")

# Random stand-in for the real feature matrix, which has 11060223 rows.
data = np.random.rand(1000, 512).astype("float32")
logger.info(f"Data loaded. Shape = {data.shape}")
num_columns = data.shape[1]

# Leave one thread free. Compute the count once so the log reports what was actually set.
num_threads = max(1, faiss.omp_get_max_threads() - 1)
faiss.omp_set_num_threads(num_threads)
logger.info(f"Using {num_threads} threads.")

# Exact (brute-force) L2 index, copied to all available GPUs.
cpu_index = faiss.IndexFlatL2(num_columns)
k_nearest_neighbors = 2047
number_of_gpus = faiss.get_num_gpus()
logger.info(f"Running on {number_of_gpus} GPUs.")
index = faiss.index_cpu_to_all_gpus(cpu_index)
index.add(data)
logger.info("Finding nearest neighbors.")
# All queries are searched in a single call; this is the call that fails on the full data.
similarities, indices = index.search(data, k_nearest_neighbors)

Logs (from the real run on the full 11060223 x 512 matrix, not the random example above) -

[2023-09-11 00:33:33,277][faiss.loader][INFO] - Loading faiss with AVX2 support.
[2023-09-11 00:33:33,296][faiss.loader][INFO] - Successfully loaded faiss with AVX2 support.
Faiss sparse similarity/distance matrix does not exist - hence computing it!
[2023-09-11 00:33:33,300][utils.faiss_mat][INFO] - Loading data.
[2023-09-11 00:34:06,923][utils.faiss_mat][INFO] - Data loaded. Shape = (11060223, 512)
[2023-09-11 00:34:06,924][utils.faiss_mat][INFO] - Loading landmarks.
[2023-09-11 00:34:49,933][utils.faiss_mat][INFO] - Landmarks loaded. Shape = (11060223, 512)
[2023-09-11 00:34:49,934][utils.faiss_mat][INFO] - Using 18 threads.
[2023-09-11 00:34:49,934][utils.faiss_mat][INFO] - Running on 4 GPUs.
[2023-09-11 00:35:50,087][utils.faiss_mat][INFO] - Not normalizing matrix as metric being used is simeuclid
[2023-09-11 00:35:59,110][utils.faiss_mat][INFO] - Finding nearest neighbors.
Error executing job with overrides: ['seed=5', 'use_ffcv=false', 'dataset=imagenet21k', 'batch_size=256', 'num_classes=10450', 'phase=calibration', 'summary_parameters.fraction=0.001', '+use_gpu_faiss=true', '+faiss_knn=2047', 'summary_parameters.feat_type_list=[clip_vit_b_32]', 'summary_parameters.feat_mode_list=[activation]', 'summary_parameters.sparse_type=zcopblock_precalc_sparse', 'recalculate_params.sparsification_clustering=true', 'summary_parameters.smraiz_constrains=[partition_matroid]', 'summary_parameters.fn_type=smraiz', 'summary_parameters.sim_type=simeuclid', 'summary_parameters.use_sparse_representation=true', 'summary_parameters.feat_responsibilities=[10]', 'submod_max_algo.type=stochastic_greedy', 'submod_max_algo.eps=1e-10', 'submod_max_algo.log_iter=1000', 'eval_mode=semantic_softmax', 'num_eval=1', 'use_saved_data=false', 'summarization_strategy=whole', 'summary_parameters.knn=1000', 'root_data_dir=/data/megh98/projects/datasets/imagenet21k/imagenet21k_resized/']
Traceback (most recent call last):
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/main.py", line 286, in main
    images, subset_labels, summary_elements = summarize.get_summary()
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/summary.py", line 95, in get_summary
    summary_elements = smraiz_obj.get_smraiz_summary()
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_sum.py", line 324, in get_smraiz_summary
    self.similarity_matrix, self.sim_filename = self.get_similarity_mat(feat_type, feat_mode, feat_basename)
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_sum.py", line 179, in get_similarity_mat
    self.sim_filename = make_sim_or_dist_dist_file(filename=self.sim_filename, sim_type=self.sim_type, dname=self.dname, config_dict=self.config_dict, knn_k = self.knn_k, use_gpu_faiss=self.use_gpu_faiss, feat_filename=self.feat_filename)
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_utils.py", line 108, in make_sim_or_dist_dist_file
    filename = construct_simmat_faiss(filename, faiss_knn, feat_filename, sim_type, use_gpu_faiss)
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/summarizers/smraiz_utils.py", line 134, in construct_simmat_faiss
    sim_filename_faiss = construct_sparse_similarity_matrix(data_file=feat_filename, landmark_file=feat_filename, output_file=sim_filename_faiss, k_nearest_neighbors=faiss_knn, metric=sim_type, use_gpu=use_gpu_faiss)
  File "/data/megh98/projects/dev_folder/smrai-container-documentation/src/utils/faiss_mat.py", line 166, in construct_sparse_similarity_matrix
    similarities, indices = index.search(data, k_nearest_neighbors)
  File "/home/megh98/anaconda3/envs/imagenet/lib/python3.9/site-packages/faiss/class_wrappers.py", line 343, in replacement_search
    self.search_c(n, swig_ptr(x), k, swig_ptr(D), swig_ptr(I), params)
  File "/home/megh98/anaconda3/envs/imagenet/lib/python3.9/site-packages/faiss/swigfaiss_avx2.py", line 9400, in search
    return _swigfaiss_avx2.IndexReplicas_search(self, n, x, k, distances, labels, params)
RuntimeError: Exception thrown from index 0: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 0 space Device stream 0xb782520 size 45280557056 bytes (cudaMalloc error out of memory [2])

Exception thrown from index 1: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 1 space Device stream 0x83679830 size 45280557056 bytes (cudaMalloc error out of memory [2])

Exception thrown from index 2: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 2 space Device stream 0xa01bef50 size 45280557056 bytes (cudaMalloc error out of memory [2])

Exception thrown from index 3: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/StandardGpuResources.cpp:452: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 3 space Device stream 0xbcb1b640 size 45280540928 bytes (cudaMalloc error out of memory [2])

Is there a way to do exact NN search, i.e. still using IndexFlatL2, but computing the distances in batches so that it does not run out of memory (assuming that is the cause of the error in the first place)? I am a little unsure what is causing the error.

The data is about 22 GB and each GPU has 80 GB of memory, so I expected it to fit. I am not sure what the problem is.
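A back-of-the-envelope check, using only the shapes and sizes reported in the logs above, suggests that the failing allocation is sized by the search results rather than by the index. The assumption here (not confirmed in this thread) is that the queries are split evenly across the 4 GPUs and that each (query, neighbor) pair needs 8 bytes on the device, for example one 64-bit label. Under that assumption, the arithmetic reproduces the 45280557056 bytes from the error message:

```python
import math

# Shapes taken from the logs above.
n, d, k, n_gpus = 11_060_223, 512, 2047, 4

# Size of the float32 feature matrix (what the flat index stores).
data_bytes = n * d * 4
print(f"feature matrix: {data_bytes / 1e9:.2f} GB")

# Assumption: queries are split evenly across the GPUs and each result
# entry needs 8 bytes on the device (e.g. a 64-bit label).
queries_per_gpu = math.ceil(n / n_gpus)
result_bytes_per_gpu = queries_per_gpu * k * 8
print(f"per-GPU result buffer: {result_bytes_per_gpu} bytes")
```

If this reading is right, the 22 GB matrix fitting in 80 GB is not the whole story: with k = 2047 and roughly 2.77M queries per GPU, a single search call asks for about 45 GB of temporary space on top of whatever the index and the queries already occupy.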

Please do let me know if I am missing anything and thank you very much for your time!

meghbhalerao commented 11 months ago

Actually, I think I might have figured out how to do it: I can split my queries into chunks, loop over the chunks, and call index.search(chunk, kneighbors) on each one, something like this -

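import numpy as np

index.add(whole_data)
# np.array_split yields num_chunks roughly equal batches of query rows
chunks = np.array_split(whole_data, num_chunks)
for chunk in chunks:
    distances, indices = index.search(chunk, kneighbors)

mdouze commented 11 months ago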

Yes, please do. We recently introduced batching for large GPU queries, but it is not available everywhere.
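For readers who want the chunked search to also collect its results, here is a minimal sketch of one way to do it. The helper name `search_in_batches` and its `batch_size` parameter are hypothetical, not part of Faiss; the only thing assumed about `index` is a Faiss-style `search(x, k)` method returning `(distances, labels)`.

```python
import numpy as np

def search_in_batches(index, queries, k, batch_size):
    """Run index.search over `queries` in slices of at most `batch_size` rows.

    `index` is any object with a Faiss-style `search(x, k)` method that
    returns (distances, labels) arrays of shape (len(x), k).
    """
    n = queries.shape[0]
    distances = np.empty((n, k), dtype=np.float32)
    labels = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Only this slice of queries is passed to the index at a time.
        d, i = index.search(queries[start:stop], k)
        distances[start:stop] = d
        labels[start:stop] = i
    return distances, labels
```

With the names from the original example, a call could look like `similarities, indices = search_in_batches(index, data, k_nearest_neighbors, batch_size=65536)`, where 65536 is an arbitrary starting value to tune. Note that the full result arrays still live in host RAM: for 11060223 queries and k = 2047, at 12 bytes per entry, that is roughly 272 GB, so writing each batch to disk instead of preallocating may be the more practical variant.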

gajghatenv commented 7 months ago

@meghbhalerao where did you find docs for chunking?

meghbhalerao commented 7 months ago

I don't think I used any docs. The code I posted above simply does the matrix-matrix multiply in chunks rather than all at once.

gajghatenv commented 7 months ago

I am using the chunk of code from your response and I still get the error from your original post.

meghbhalerao commented 7 months ago

You might have to reduce the chunk size so that it fits in the memory of your GPU. Or it might just be that your index is too large.
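As a rough way to pick a chunk size, one can bound the number of queries per batch by the memory set aside for results. The helper below is a hypothetical sketch, not a Faiss function. It assumes 12 bytes per (query, neighbor) pair (one float32 distance plus one int64 label) and ignores the queries themselves and any internal scratch space, so the value it returns is an upper bound to shrink from, not a guarantee.

```python
def max_batch_size(budget_bytes, k, bytes_per_result=12):
    """Largest number of queries whose k results fit in `budget_bytes`.

    bytes_per_result=12 assumes one float32 distance plus one int64 label
    per (query, neighbor) pair; everything else is ignored, so treat the
    answer as an upper bound.
    """
    if budget_bytes <= 0 or k <= 0:
        raise ValueError("budget_bytes and k must be positive")
    # Never return 0: a batch must contain at least one query.
    return max(1, budget_bytes // (k * bytes_per_result))

# Example: a hypothetical 4 GiB budget for result buffers with k = 2047.
print(max_batch_size(4 * 1024**3, 2047))
```

If the error persists even with a very small batch, the index itself (here about 22 GB of float32 vectors, copied to each GPU in the setup above) is the more likely culprit.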
