Closed lynnliu030 closed 3 years ago
That's an interesting question.
To measure the amount of data read, you can use the `ndis` stats field (https://github.com/facebookresearch/faiss/wiki/Implementation-notes#statistics-for-non-exhaustive-search). The amount read is ndis * (8 + code_size) bytes, where code_size is the size in bytes of one IndexIVF entry and the 8 accounts for the 64-bit id stored alongside each code.
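As a rough illustration of the estimate above, here is a minimal Python sketch. The helper names `estimate_bytes_read` and `measure_search_bytes` are made up for this example; `faiss.cvar.indexIVF_stats`, its `reset()` method and the `code_size` attribute are the parts taken from Faiss. It assumes `index` is an IndexIVF (as in ondisk_ivf.py), not wrapped in another index type, and it counts scanned entries only, so it ignores any OS page-cache effects.

```python
def estimate_bytes_read(ndis: int, code_size: int, id_size: int = 8) -> int:
    """Estimate bytes scanned from the inverted lists.

    Each visited entry contributes one id (8 bytes by default) plus one code.
    """
    return ndis * (id_size + code_size)


def measure_search_bytes(index, queries, k=10):
    """Run one search and return (ndis, estimated bytes read).

    Hypothetical helper: assumes `index` is an IndexIVF-based index and that
    no other thread is searching at the same time (the stats are global).
    """
    import faiss  # imported lazily so estimate_bytes_read works without faiss

    stats = faiss.cvar.indexIVF_stats
    stats.reset()                      # clear counters from earlier searches
    index.search(queries, k)
    ndis = stats.ndis                  # number of distances computed
    return ndis, estimate_bytes_read(ndis, index.code_size)
```

For example, with an IVF Flat index over 128-dimensional float vectors, code_size is 512 bytes, so 1000 scanned entries correspond to `estimate_bytes_read(1000, 512)`, i.e. 520000 bytes.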
The batch search policy depends on the index's parallel mode (the `parallel_mode` field of IndexIVF), see https://github.com/facebookresearch/faiss/blob/master/faiss/IndexIVF.h#L105
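To experiment with the different policies, the mode can be changed from Python before searching. The sketch below is only an illustration: `set_parallel_mode` and the `PARALLEL_MODES` table are hypothetical names, and the mode descriptions are paraphrased from the comment in IndexIVF.h as I remember it, so please check them against the header linked above.

```python
# Mode descriptions paraphrased from the IndexIVF.h comment (assumption:
# verify against the header for the Faiss version in use).
PARALLEL_MODES = {
    0: "split over queries (default)",
    1: "parallelize over inverted lists",
    2: "parallelize over both",
    3: "split over queries with a finer granularity",
}


def set_parallel_mode(index, mode: int) -> str:
    """Set `index.parallel_mode` and return a short description of the mode.

    Hypothetical helper; `index` is assumed to expose a writable
    `parallel_mode` attribute, as IndexIVF does.
    """
    if mode not in PARALLEL_MODES:
        raise ValueError(f"unknown parallel mode: {mode}")
    index.parallel_mode = mode
    return PARALLEL_MODES[mode]
```

Comparing `ndis` and the search time for the same query batch under different modes, and against a loop that searches one query at a time, is one way to see how the policies differ in practice.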
That's very helpful, thanks!
Summary
Hi, I was trying out the demo script ondisk_ivf.py, and I want to understand how on-disk search works for a batch of query vectors. First, I want to measure the amount of data transferred from disk to memory to perform a search. Second, I want to understand the difference between searching sequentially (one vector after another) and searching in a batch (i.e. working out all the inverted lists that need to be accessed, then reading all the data from disk once). Which of these methods does Faiss use to search a batch of query vectors? Are there any tools you can point me to for measuring the size of the data being transferred? And do you have any pointers on how batch search works in Faiss?