DaehanKim closed this issue 3 months ago.
`x` in `HNSWx,Flat` is actually `M` (the maximum degree for each layer, except the lowest layer, where it is `2M`).
```python
>>> import faiss
>>> index = faiss.index_factory(128, "HNSW32,Flat")
>>> index.hnsw.nb_neighbors(1)
32
>>> index.hnsw.nb_neighbors(0)
64
```
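The same relationship can be written out without faiss. This minimal pure-Python sketch reads `M` from the factory string and returns the maximum neighbor count per layer, reproducing the values printed by the REPL session. The helpers `parse_hnsw_m` and `max_neighbors` are hypothetical names, and the "2M at layer 0, M above" rule is taken from the reply, not copied from faiss source.

```python
import re


def parse_hnsw_m(factory_string: str) -> int:
    """Extract M from a factory string such as "HNSW32,Flat" (hypothetical helper)."""
    match = re.match(r"HNSW(\d+)", factory_string)
    if match is None:
        raise ValueError(f"not an HNSW factory string: {factory_string!r}")
    return int(match.group(1))


def max_neighbors(level: int, m: int) -> int:
    """Maximum out-degree of a node on `level`: 2*M on the bottom layer, M above it."""
    return 2 * m if level == 0 else m


m = parse_hnsw_m("HNSW32,Flat")
print(m, max_neighbors(0, m), max_neighbors(1, m))  # 32 64 32
```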
@KinglittleQ Thank you for the reply.
Then does the space footprint become `4*d + x * M * 2 * 4 = 4*d + (M^2) * 2 * 4`?
This does not seem to be correct. With M=32, one vector would use `4*d + 32 * 32 * 2 * 4 = 4*d + 8192` bytes. If d is small enough (less than 200), the `4*d` term can be ignored in this equation. Then 1M vectors would cost `8192 * 1e6` bytes. So more than 8 GB?
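The arithmetic above can be checked with a short sketch. It plugs `x = M` into `4*d + x * M * 2 * 4`, which is the reading being questioned here rather than a confirmed faiss formula. `footprint_if_x_is_m` is a hypothetical name, and `d = 128` is an assumed example value matching the `index_factory(128, ...)` call in the reply.

```python
def footprint_if_x_is_m(d: int, m: int) -> int:
    """Bytes per vector if x were equal to M: 4*d + M*M*2*4."""
    return 4 * d + m * m * 2 * 4


M = 32
link_bytes = footprint_if_x_is_m(0, M)  # graph links alone, no vector data
total_1m = footprint_if_x_is_m(128, M) * 1_000_000  # 1M vectors with d = 128
print(link_bytes, total_1m)  # 8192 8704000000
```

Under this reading, 1M vectors with d = 128 would need about 8.7 GB, which is why the result looks suspicious.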
OK, I read the paper "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs". Section 4.2.3 gives the memory cost of the index as $(M_{max0} + m_L \cdot M_{max}) \cdot$ `bytes_per_link` per element.
Here `bytes_per_link` is 4, $M_{max}$ is `M`, $M_{max0}$ is the maximum number of neighbors in the bottom layer, i.e. `2M`, and $m_L$ is $1/\ln M$. So the index should cost $(2M + M/\ln M) \cdot 4 = 4(2 + 1/\ln M)\,M$ bytes per vector. Adding $4d$ bytes for the data, the final equation is $4d + 4(2 + 1/\ln M)\,M$ bytes/vector.
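A minimal sketch of this estimate, following the paper's formula with `bytes_per_link = 4`, $M_{max} = M$, $M_{max0} = 2M$ and $m_L = 1/\ln M$, and assuming 4-byte floats for the stored vector. It is a model-based average, not a measurement of what faiss actually allocates; `paper_bytes_per_vector` is a hypothetical name and `d = 128` is only an example value.

```python
import math


def paper_bytes_per_vector(d: int, m: int, bytes_per_link: int = 4) -> float:
    """Average bytes per vector: 4*d for float32 data plus (Mmax0 + mL*Mmax) links."""
    m_max = m                # max neighbors on the upper layers
    m_max0 = 2 * m           # max neighbors on the bottom layer
    m_l = 1.0 / math.log(m)  # level-generation normalization factor
    links = (m_max0 + m_l * m_max) * bytes_per_link
    return 4 * d + links


est = paper_bytes_per_vector(d=128, m=32)
print(round(est, 1))  # 804.9
```

For d = 128 and M = 32 this comes to roughly 805 bytes per vector, about 0.8 GB per million vectors, an order of magnitude below the 8 GB obtained with `x = M`.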
If we ignore the term $1/\ln M$ ("only" 0.29 for M=32), we get `4 * d + 4 * 2 * M`, so `x` would be 1. Otherwise, to be precise, `x` would be $(2 + 1/\ln M)/2$.
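The precise `x` follows from setting `x * M * 2 * 4` equal to the paper's link cost $4(2 + 1/\ln M)\,M$. This short sketch (hypothetical helper `implied_x`) evaluates it for a few values of `M` to show how close it stays to 1.

```python
import math


def implied_x(m: int) -> float:
    """x such that x*M*2*4 == 4*(2 + 1/ln M)*M, i.e. (2 + 1/ln M) / 2."""
    return (2 + 1 / math.log(m)) / 2


for m in (16, 32, 64):
    print(m, round(implied_x(m), 3))
```

For M = 32 this gives `x` of about 1.144, and `x` approaches 1 slowly as `M` grows.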
Though I am not sure how closely the implementation follows the paper.
Summary
What is the meaning of `x` in `HNSWx,Flat`? I'm confused by its space footprint, `4*d + x * M * 2 * 4`. I know `M` is the maximum degree on each layer and `d` is the dimensionality of the vectors to index, but I could not figure out the meaning of `x`. Is it the number of layers that the algorithm produces?