facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Memory consumption increased when searching with multiple threads #1108

Closed Melon-Water closed 4 years ago

Melon-Water commented 4 years ago

Summary

I use an "IDMap,Flat" index in a multi-threaded program that uses locking. It runs well with a single thread. When searching from multiple threads, however, memory consumption increases significantly under high QPS and eventually stabilizes. After testing, the search call "index->search" seems to be the cause. Searching an empty index shows the same problem. The memory increase always happens right when I start the stress test; it may not change during the test, nor during the next test. My multi-threaded program ran well before I added the Faiss index. I have checked my code but found no clues. What is the possible cause?

| Status | VIRT | RES | TIME |
| --- | --- | --- | --- |
| initialization | 5G | 2.1G | 0 |
| increasing | 76.3G | 9.1G | 251min |
| stable | 77.1G | 9.2G | 1016min |
| stable | 77.1G | 9.2G | 1643min |
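
For anyone who wants to collect the same two numbers from inside a process, here is a minimal sketch. It assumes a Linux system where `/proc/self/status` is available; the helper name `read_vm_kib` is hypothetical and not part of Faiss.

```python
def read_vm_kib(status_path="/proc/self/status"):
    """Return (VmSize, VmRSS) in KiB, or None if they cannot be read.

    VmSize corresponds to VIRT and VmRSS to RES as shown by top.
    Assumes the Linux /proc status format.
    """
    fields = {}
    try:
        with open(status_path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                if key in ("VmSize", "VmRSS"):
                    # Lines look like "VmRSS:    123456 kB"
                    fields[key] = int(rest.split()[0])
    except OSError:
        return None
    if len(fields) != 2:
        return None
    return fields["VmSize"], fields["VmRSS"]


usage = read_vm_kib()
if usage is not None:
    virt_kib, res_kib = usage
    print(f"VIRT={virt_kib / 1024:.1f} MiB  RES={res_kib / 1024:.1f} MiB")
```

Calling this before and after the first burst of multi-threaded searches should show whether the jump lands in VIRT, RES, or both.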

Platform

OS: Red Hat 6.4.0-1

sgjurano commented 4 years ago

Observed similar behavior with index IVF262144_HNSW32,PQ64 after switching to ThreadPoolExecutor (python3.7, faiss 1.5.3).

mdouze commented 4 years ago

Faiss' internal multi-threading is performed with OpenMP. When a new thread is started outside of Faiss, OpenMP initializes some data structures. It was reported previously that this initialization has a non-negligible performance impact when the threads are short-lived (e.g. used for a single search). The memory impact may be due to OpenMP as well. Therefore, it may be useful to compile Faiss without OpenMP support (remove the -fopenmp compile flag).

sgjurano commented 4 years ago

I can't disable OpenMP completely because I still use it for insert/remove operations parallelization :/

With IVF262144_HNSW32,PQ64 I measure 80 bytes per vector at start and about 182 bytes per vector in the stable state. I assume the difference comes from two factors: 1) dynamic memory allocation for vectors in the inverted file (growth from 80 to 135 bytes per vector); 2) the OpenMP overhead discussed in this issue (growth from 135 to 182 bytes per vector).

Can you confirm or deny these assumptions?

mdouze commented 4 years ago

Both are possible. It would still be useful to try with a non-openmp build to see if openmp introduces a significant overhead.

I am not sure it is meaningful to express the overhead as a size per vector. The stored size per vector is 64 + 8 = 72 bytes (PQ code plus id). The rest is overhead, and it is not obvious that this overhead is proportional to the number of stored vectors.
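
To make the arithmetic concrete, here is a small sketch that separates the 72-byte payload from the measured per-vector figures quoted above. It assumes PQ64 uses 8 bits per sub-quantizer (hence 64 bytes of code) and an 8-byte id; the labels are only illustrative.

```python
PQ_CODE_BYTES = 64  # PQ64, assuming 8 bits per sub-quantizer
ID_BYTES = 8        # one 64-bit id per stored vector
payload = PQ_CODE_BYTES + ID_BYTES  # 72 bytes

# Measured bytes per vector reported earlier in this thread.
measured = {
    "start": 80,
    "after inverted-list growth": 135,
    "stable": 182,
}

# Whatever exceeds the payload is overhead, not vector data.
overheads = {label: size - payload for label, size in measured.items()}

for label, extra in overheads.items():
    print(f"{label}: {extra} bytes of overhead per vector")
```

Dividing by the vector count is only a convenience here: as noted, nothing guarantees that the overhead scales with the number of stored vectors.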

Melon-Water commented 4 years ago

> Faiss' internal multi-threading is performed with OpenMP. When a new thread is started outside of Faiss, OpenMP initializes some data structures. It was reported previously that this initialization has a non-negligible performance impact when the threads are short-lived (e.g. used for a single search). The memory impact may be due to OpenMP as well. Therefore, it may be useful to compile Faiss without OpenMP support (remove the -fopenmp compile flag).

I ran some further experiments and found that the internal multi-threading with OpenMP is to blame for the increased memory consumption. In my setup, up to N threads outside Faiss can be searching at once. When one of them is newly started, the OpenMP initialization creates M-1 internal threads for its searches. As mentioned, each of those internal threads allocates some data structures, whose size depends on the maximum stack size for a thread on the machine (check with ulimit -s; 8192KB on mine). This explains the change in VIRT.

I have N=M=96. M can be set through OMP_NUM_THREADS, whose default value is typically related to the number of CPU cores. For my search stage, OMP_NUM_THREADS=1 is enough, and with that setting the memory consumption is stable from the beginning.
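
For Python users (such as the ThreadPoolExecutor case mentioned earlier), a rough sketch of the same workaround follows. It sets OMP_NUM_THREADS before Faiss is imported and also calls `faiss.omp_set_num_threads(1)`. The function name `run_queries`, the index size, and the random data are made up for illustration, and the Faiss part is skipped if the package is not installed.

```python
import os

# Set this before the OpenMP runtime is loaded, i.e. before importing faiss.
os.environ["OMP_NUM_THREADS"] = "1"

from concurrent.futures import ThreadPoolExecutor

try:
    import numpy as np
    import faiss
except ImportError:  # keep the sketch importable without faiss
    faiss = None


def run_queries(n_workers=4, n_queries=16, dim=32):
    """Search a small IDMap,Flat index from a caller-managed thread pool."""
    if faiss is None:
        return None
    # Same effect as the environment variable, applied at runtime.
    faiss.omp_set_num_threads(1)
    index = faiss.index_factory(dim, "IDMap,Flat")
    rng = np.random.default_rng(0)
    xb = rng.random((1000, dim), dtype=np.float32)
    index.add_with_ids(xb, np.arange(1000, dtype=np.int64))
    # One single-vector query of shape (1, dim) per task.
    queries = rng.random((n_queries, 1, dim), dtype=np.float32)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda q: index.search(q, 5), queries))


results = run_queries()
```

With one OpenMP thread per search, parallelism comes only from the external pool, which is the configuration described above as stable from the beginning.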

In summary, the VIRT consumption can be roughly explained and estimated from the observations above, but I haven't figured out the change in RES. The total number of internal threads created does not seem to determine RES, since repeated tests lead to the same VIRT but different RES. Can you provide more details about the RES side?
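
The VIRT estimate can be written out as a back-of-the-envelope calculation. This sketch assumes that every external thread ends up with its own pool of M-1 OpenMP worker threads and that each worker reserves a full stack of the `ulimit -s` size; the function name is hypothetical.

```python
GIB = 1024 ** 3


def estimate_virt_growth_bytes(n_external, m_omp, stack_kib):
    """Virtual memory reserved for the stacks of OpenMP worker threads."""
    internal_threads = n_external * (m_omp - 1)
    return internal_threads * stack_kib * 1024


# N = M = 96 and an 8192 KB stack, as reported above.
growth = estimate_virt_growth_bytes(96, 96, 8192)
print(f"{growth / GIB:.2f} GiB")  # prints "71.25 GiB"
```

That is close to the observed VIRT growth from 5G to 76.3G–77.1G in the table at the top of the issue. It says nothing about RES, since reserved stack pages only become resident once they are touched.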

mdouze commented 4 years ago

You are using a flat index, right? In that case, the cause may be blocking. The blocks are 16MB (see https://github.com/facebookresearch/faiss/blob/master/utils/distances.cpp#L287). Blocking should only be enabled when the number of queries is > 20, though.

mdouze commented 4 years ago

No activity, closing.