First, it seems suspicious that the memory for the training vectors is the limiting factor: if you maintain a reasonable ratio of # training vectors / # centroids (normally between 50 and 1000), then the CPU cost will almost certainly dominate.
Memory mapping would work, even from Python (load the vectors with np.memmap).
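For illustration, a minimal sketch combining both suggestions: memory-map the raw vectors and train on a subsample sized by the ratio rule (the file name, counts, and on-disk layout here are assumptions, not from the thread):

```python
import numpy as np
import faiss

d = 512                     # embedding dimension (assumed)
n_total = 400_000_000       # number of vectors in the file (assumed)
n_centroids = 131072
n_train = 50 * n_centroids  # ~50 training vectors per centroid

# Memory-map the raw float32 matrix; nothing is read into RAM yet.
x = np.memmap("embeddings.f32", dtype="float32", mode="r",
              shape=(n_total, d))

# Sample training rows (with replacement -- duplicates are harmless for
# k-means at this scale) and materialize only those rows in RAM.
rng = np.random.default_rng(0)
idx = np.sort(rng.integers(0, n_total, size=n_train))
x_train = np.ascontiguousarray(x[idx])

index = faiss.index_factory(d, "IVF131072,Flat")
index.train(x_train)  # only the subsample (~13 GB here) is in memory
```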
Thanks for your answer and advice!
About the CPU/memory cost: if using for example 1M centroids with a ratio of 50 training vectors per centroid, the memory needed would be 50 * 10^6 * 512 * 4 / 10^9 ≈ 100 GB for embeddings of size 512.
The CPU cost of training with 1M centroids would definitely be high, but that's only a time constraint, whereas the memory constraint can be a blocker. GPUs do not have 100 GB of VRAM, so it would definitely be a blocker there too. (I saw in your benchmarks that you trained with up to 4M centroids in the 1G-embeddings setup, https://github.com/facebookresearch/faiss/wiki/Indexing-1G-vectors#1b-datasets, so I guess you found a solution for this case?)
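For concreteness, that back-of-the-envelope estimate as a small helper (it just restates the arithmetic above; the default values are the numbers from this thread):

```python
def train_set_size_gb(n_centroids, ratio=50, dim=512, bytes_per_value=4):
    """Approximate training-set size: ratio * n_centroids float32 vectors."""
    return n_centroids * ratio * dim * bytes_per_value / 1e9

print(train_set_size_gb(1_000_000))  # ~102 GB: the 1M-centroid case above
print(train_set_size_gb(131072))     # ~13 GB: the current 131072-centroid setup
```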
I'm working with 400M embeddings (and soon more), so increasing the number of centroids would, I think, help. (I'm only using 131072 centroids for now.)
I will check if the memory mapping can work efficiently for this.
At clustering time, the GPU does not need to store the training set, only the centroids, so that is not a blocker.
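For illustration (a minimal sketch, not from the thread): with the Python faiss.Kmeans helper, the training vectors are passed as a regular CPU numpy array, and per the comment above only the centroids need to fit on the GPU:

```python
import numpy as np
import faiss

d, k = 512, 4096
x_train = np.random.rand(50 * k, d).astype("float32")  # stays in host RAM

# gpu=True runs the assignment step on the GPU; the training set itself
# is not resident in GPU memory.
km = faiss.Kmeans(d, k, niter=20, verbose=True, gpu=True)
km.train(x_train)
centroids = km.centroids  # (k, d) float32 centroids, back on the CPU
```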
> At clustering time, the GPU does not need to store the training set, only the centroids, so that is not a blocker.
Hi @mdouze, may I know why the training set is not required? From https://github.com/facebookresearch/faiss/wiki/Faiss-building-blocks:-clustering,-PCA,-quantization#clustering, the training set is the parameter of kmeans.train(x). Thanks.
I would like to train an index with a large number of embeddings in a memory-constrained environment, and I'm wondering what the best ways to do it are.
Currently I am using index.train(embeddings), which requires the embeddings to be fully in memory. When training a large index, that can mean tens of GBs in memory, which eventually reaches the machine's limits.
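For reference, the pattern in question looks roughly like this (a minimal sketch; the index type and file name are assumptions):

```python
import numpy as np
import faiss

d, nlist = 512, 131072

# Load *all* embeddings into RAM at once -- this is the problematic step.
embeddings = np.load("embeddings.npy").astype("float32")

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(embeddings)  # training needs the whole array resident in memory
```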
Are there ways to avoid loading all the training embeddings in memory?
Those are the ideas I have at the moment:
- memory mapping the embeddings file (e.g. with np.memmap) instead of loading it fully
I would be interested to know if you have any advice on the topic, thanks!