facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Memory leak: IDSelectorBatch + SearchParameters #2996

Open vaaliferov opened 1 year ago

vaaliferov commented 1 year ago

Summary

I'm trying to use IDSelectorBatch + SearchParameters, but it seems like there is a memory leak somewhere.

Platform

OS: Ubuntu 20.04.1
Faiss version: faiss-cpu 1.7.4

Reproduction instructions

import gc
import faiss
import numpy as np

for _ in range(10):
    subset = np.arange(0, 5000000)
    sel = faiss.IDSelectorBatch(subset)
    params = faiss.SearchParameters(sel=sel)
    mem_usage = faiss.get_mem_usage_kb() / 1024 ** 2
    print(round(mem_usage, 2), end=' ')
    gc.collect()

Output

0.34 0.57 0.79 1.02 1.24 1.47 1.69 1.92 2.14 2.37

mdouze commented 1 year ago

Thanks for the clean bug report...

mdouze commented 1 year ago

I can repro

mdouze commented 1 year ago

The test at https://github.com/facebookresearch/faiss/blob/main/faiss/python/class_wrappers.py#L1084C28-L1084C28 is wrong, since the values are instances, not classes. However, that does not explain the memory leak.
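
To make the class-versus-instance point concrete, here is a minimal sketch. It assumes the check in question is a class test along the lines of `inspect.isclass`; the actual code at the linked line is not quoted in this thread. `Selector`, `keep_alive`, and `keep_alive_fixed` are hypothetical names used only for illustration and are not part of faiss.

```python
import inspect


class Selector:
    """Hypothetical stand-in for a wrapped object such as an ID selector."""


def keep_alive(value, referenced):
    # Assumed form of the faulty check: true only when `value` is itself a class,
    # so an instance is never added to the list of kept references.
    if inspect.isclass(value):
        referenced.append(value)
    return referenced


def keep_alive_fixed(value, referenced):
    # One possible correction (an assumption, not the actual patch):
    # keep a reference to anything that is not a plain scalar or string.
    if not isinstance(value, (int, float, bool, str)):
        referenced.append(value)
    return referenced


sel = Selector()
print(len(keep_alive(sel, [])))        # 0: the instance fails the class test
print(len(keep_alive_fixed(sel, [])))  # 1: the instance is retained
```

As noted above, a skipped reference of this kind does not by itself account for the growing memory usage.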

mdouze commented 1 year ago

subset = np.arange(0, 5000000)
sel = faiss.IDSelectorBatch(subset)
sel.this.own()    # True: correct
params = faiss.SearchParameters(sel=sel)
sel.this.own()    # False: why???

mdouze commented 1 year ago

see https://github.com/facebookresearch/faiss/pull/3007

vaaliferov commented 1 year ago

@mdouze, thank you!

dbalabka commented 1 year ago

@mdouze, thanks a lot! Approximately when will you be able to release this fix?

dshkliarenko commented 1 year ago

@mdouze I have installed the nightly version with the fix, and yes, the scenario from the repro instructions is fixed. However, if I create the selector and the params in a different order, I get the same issue:

import gc
import faiss
import numpy as np

for _ in range(10):
    params = faiss.SearchParameters()
    subset = np.arange(0, 5000000)
    params.sel = faiss.IDSelectorBatch(subset)
    mem_usage = faiss.get_mem_usage_kb() / 1024 ** 2
    print(round(mem_usage, 2), end=' ')
    gc.collect()

0.37 0.59 0.82 1.04 1.27 1.49 1.72 1.94 2.17 2.39

algoriddle commented 1 year ago

We're looking into this. As a temporary workaround, try calling

params.sel.this.own(True)

after the creation of IDSelectorBatch.
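
For reference, a sketch of how this workaround might be folded into the reproduction loop. It is a variation rather than the literal suggestion: it keeps a named reference `sel` and restores ownership on that reference instead of on `params.sel`, on the assumption that both refer to the same underlying selector. `build_params` and `run` are hypothetical helper names, the guarded import is only there so the snippet loads without faiss installed, and none of this has been verified against a specific faiss build.

```python
import gc

try:
    import faiss
    import numpy as np
except ImportError:  # faiss or numpy missing: the helpers below cannot be called
    faiss = None


def build_params(n_ids):
    """Create SearchParameters with an IDSelectorBatch and re-take ownership."""
    params = faiss.SearchParameters()
    sel = faiss.IDSelectorBatch(np.arange(0, n_ids, dtype=np.int64))
    params.sel = sel      # on affected builds this may clear sel's ownership flag
    sel.this.own(True)    # workaround from this thread, applied to the named reference
    return params, sel    # assumption: sel must stay alive while params is in use


def run(iterations=10, n_ids=5000000):
    """Repeat the allocation and record memory usage in GB after each round."""
    usage = []
    for _ in range(iterations):
        params, sel = build_params(n_ids)
        usage.append(round(faiss.get_mem_usage_kb() / 1024 ** 2, 2))
        del params, sel   # drop both references before collecting
        gc.collect()
    return usage
```

Calling `print(run())` mirrors the loop from the report. If the workaround takes effect, the values should stay roughly flat instead of growing by about 0.22 per iteration.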

mdouze commented 1 year ago

Related to this SWIG issue: https://github.com/swig/swig/issues/2709