facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

GPU memory usage even after deleting the index #3784

Closed JoaoGuibs closed 2 months ago

JoaoGuibs commented 3 months ago

Summary

When using Faiss on the GPU (a flat index in this example), if we hit a GPU OOM error, then even after resetting and deleting the index some GPU memory remains in use (nvidia-smi screenshot below). Is this expected? If so, what causes it?

Thanks in advance.

Platform

OS:

Faiss version: faiss-gpu, version 1.7.2

Installed from: pip

Faiss compilation options:

Running on:

Interface:

Reproduction instructions

While running the following code, the memory usage on the breakpoint is shown below:

import torch
import faiss
import faiss.contrib.torch_utils  # required so the GPU index accepts torch tensors

def create_index(data):
    dimension = data.shape[1]
    index_flat = faiss.IndexFlatL2(dimension)
    gpu_resources = faiss.StandardGpuResources()
    gpu_options = faiss.GpuClonerOptions()
    device = torch.device("cuda")
    device_index = device.index
    gpu_options.device = device_index if device_index is not None else 0

    index = faiss.index_cpu_to_gpu(
        gpu_resources, gpu_options.device, index_flat, gpu_options
    )

    return index

data = torch.zeros((2950000, 1000))
index = create_index(data)

try:
    index.add(data)
except Exception as e:
    print(e)
    index.reset()
    del index
    breakpoint()

(nvidia-smi screenshot)
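For scale (my own back-of-the-envelope arithmetic, not stated in the issue): an `IndexFlatL2` stores the raw float32 vectors, so the tensor in the reproduction needs roughly 2,950,000 × 1000 × 4 bytes of device memory on its own, which exceeds the capacity of most single GPUs and makes the OOM unsurprising:

```python
# Back-of-the-envelope device-memory estimate for the reproduction above.
num_vectors = 2_950_000
dimension = 1_000
bytes_per_float32 = 4

total_bytes = num_vectors * dimension * bytes_per_float32
total_gib = total_bytes / 2**30
print(f"{total_gib:.1f} GiB")  # ~11.0 GiB just for the raw vectors
```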

junjieqi commented 3 months ago

Hi @JoaoGuibs, thank you for reaching out. I think this is expected: deleting the index by itself can't reclaim the resources. You should also clean up through torch.cuda. Here is an example I ran, and I was able to clean up the memory usage:

import torch
import gc

gc.collect()              # drop unreachable Python objects first
torch.cuda.empty_cache()  # then return cached, unused blocks to the driver
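One detail worth noting about the snippet above (a generic Python observation, not faiss-specific): `torch.cuda.empty_cache()` can only return blocks whose tensors are already garbage, and an object's cleanup code runs only once the last reference to it is gone, which is why `del index` has to happen before collecting. A stdlib illustration, with `weakref.finalize` standing in for the native deallocation:

```python
import gc
import weakref

class FakeGpuIndex:
    """Stand-in for a GPU-backed object; freeing happens in a finalizer."""
    pass

freed = []
index = FakeGpuIndex()
# weakref.finalize runs the callback when the object is collected,
# much like a native destructor releasing device memory.
weakref.finalize(index, lambda: freed.append("gpu memory released"))

gc.collect()
# freed is still [] here: the name `index` keeps the object alive,
# so no collection pass can free it.

del index     # drop the last reference...
gc.collect()  # ...then collect; now the finalizer has run
print(freed)
```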

Before

(nvidia-smi screenshot, 2024-08-22)

After

(nvidia-smi screenshot, 2024-08-22)
JoaoGuibs commented 3 months ago

Thanks for the reply @junjieqi. I realised I had not added the line import faiss.contrib.torch_utils; without it the code fails with a different exception. Nonetheless, I cannot reproduce your results: even after adding the garbage collection and the torch cache clearing, some memory stays allocated (it is only cleared if I kill the Python process).

Also, I note from your second screenshot that you no longer have a Python process running on the GPU. Are you able to get zero memory usage while the Python process is still running?

junjieqi commented 3 months ago

@JoaoGuibs I think this is related to Python process lifecycle management rather than to Faiss itself; as you point out, the memory is cleared once you kill the Python process. I'm not sure whether the Python process will keep showing up in the nvidia-smi output or not. I thought it only showed while the process was ongoing; after my run finished, it no longer showed.
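This is consistent with how GPU memory is accounted: device allocations, including the CUDA context that a process creates just by touching the GPU (itself a few hundred MB), belong to the process and are reclaimed by the driver and OS when it exits, regardless of what Python-level references still exist. A minimal stdlib sketch of the general principle, using ordinary host memory as a stand-in for device memory:

```python
import subprocess
import sys
import textwrap

# A child process that "leaks" a large allocation on purpose: it never
# frees `blob`, yet all of its memory is reclaimed by the OS at exit.
child = textwrap.dedent("""
    blob = bytearray(50 * 1024 * 1024)  # 50 MiB, never freed explicitly
""")
result = subprocess.run([sys.executable, "-c", child])
print(result.returncode)  # 0 - process exited; the OS reclaimed everything
```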

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.