Closed liamvdv closed 4 months ago
Is there something significantly wrong in just using IndexIVF with a single list? With nlist=1=nprobe we perform an exhaustive search. Does this hinder parallelisation or anything else?
import faiss
import numpy as np


def create_index(dimension: int, n: int) -> faiss.Index:
    # A single inverted list (nlist=1), so every search scans all vectors.
    index = faiss.index_factory(dimension, "IVF1,Flat")
    # Only one centroid is needed, so train on a single zero vector.
    zero = np.zeros((1, dimension), dtype=np.float32)
    index.train(zero)
    vecs = np.random.rand(n, dimension).astype(np.float32)
    index.add(vecs)
    return index


dimension = 1536
n = 10_000
index = create_index(dimension, n)
faiss.write_index(index, "ivf_index.bin")
del index

# now use mmap
index = faiss.read_index("ivf_index.bin", faiss.IO_FLAG_MMAP)
# index.search(...)
# index.reconstruct_n(0, index.ntotal) ...
How about not constructing an IndexFlat, but using faiss.knn() directly on the memory-mapped data?
https://github.com/facebookresearch/faiss/blob/main/faiss/python/extra_wrappers.py#L333
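A minimal sketch of that suggestion, assuming the vectors are stored as a headerless, row-major float32 file. The file name and the helper names save_raw_vectors and knn_on_mmap are hypothetical and not part of Faiss. The NumPy fallback is only there so the sketch still runs when Faiss is not installed.

```python
import os
import tempfile

import numpy as np

try:
    import faiss  # only faiss.knn is used below
except ImportError:
    faiss = None


def save_raw_vectors(path, vecs):
    """Write vectors as a headerless, row-major float32 file (hypothetical helper)."""
    np.ascontiguousarray(vecs, dtype=np.float32).tofile(path)


def knn_on_mmap(path, n, d, xq, k):
    """Brute-force L2 search over a read-only memory-mapped file (hypothetical helper)."""
    # The OS pages the data in on demand; nothing is copied into the process up front.
    xb = np.memmap(path, dtype=np.float32, mode="r", shape=(n, d))
    xq = np.ascontiguousarray(xq, dtype=np.float32)
    if faiss is not None:
        # Returns (squared L2 distances, indices), each of shape (nq, k).
        return faiss.knn(xq, xb, k)
    # Fallback, an assumption for illustration only: plain NumPy brute force.
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx


rng = np.random.default_rng(0)
n, d, k = 1000, 64, 5
vecs = rng.random((n, d), dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
save_raw_vectors(path, vecs)

# Query with the first three stored vectors; each should find itself first.
D, I = knn_on_mmap(path, n, d, vecs[:3], k)
```

Every worker process that maps the same file this way reads from the same pages in the OS page cache, so no per-process copy of the vectors is made.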
Thanks for your help. That does work ;) I've now chosen to use an IVF index with nlist=1. It is memory-mappable as well and keeps the remaining code the same. Are there any drawbacks to that? Thanks🙏🏻
Indeed it would be useful to support mmapped IndexFlatCodes instances, also for use with random accesses in IndexRefine. In that case the codes array would be a pointer into the mmapped data.
The mmapped data could be a raw file (like the ivfdata or OnDiskInvertedLists) or the data from a regular Faiss index that would not be loaded completely in RAM.
Summary
Hey, I'd love to see IO_FLAG_MMAP for faiss.IndexFlat. Memory mapping would enable multiple processes to share the same memory. Specifically, I have in mind the use case where N worker processes on a single host all serve read-only, relatively small (< 100k embeddings) IndexFlat instances. In my tests, brute-force search for dimensions up to 1024 is very acceptable, but the multiplicative memory usage across processes is not feasible. Do you think this is within the scope of this project? If so, how can I help you make time for this? Thanks, Liam
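To put rough numbers on the multiplicative memory usage, here is a back-of-the-envelope sketch. It assumes float32 vectors and ignores index overhead. The worker count of 8 is a hypothetical value chosen for illustration, and flat_index_bytes is not a Faiss function.

```python
def flat_index_bytes(n_vectors, dim, bytes_per_component=4):
    """Raw storage for a flat float32 index, ignoring any overhead."""
    return n_vectors * dim * bytes_per_component


n_vectors, dim = 100_000, 1024
n_workers = 8  # hypothetical worker count

per_copy = flat_index_bytes(n_vectors, dim)
private_total = n_workers * per_copy  # every worker loads its own copy
shared_total = per_copy               # one memory-mapped copy shared by all workers

print(f"per copy:      {per_copy / 2**20:.1f} MiB")
print(f"private total: {private_total / 2**20:.1f} MiB")
print(f"shared total:  {shared_total / 2**20:.1f} MiB")
```

Under these assumptions, each private copy costs about 391 MiB, so eight workers need roughly 3 GiB, while a shared mapping stays near the size of a single copy.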