castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

dense retrieval from multiple index shard #798

Open MXueguang opened 3 years ago

MXueguang commented 3 years ago

It would be good to support dense retrieval from multiple faiss index shards. This would be friendlier for machines with limited RAM (and could further support GPU retrieval).

MXueguang commented 3 years ago

This also means it would no longer be mandatory to merge faiss indexes after encoding.
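The basic idea can be sketched in a few lines. Plain Python stands in for faiss here: `search_shard` mimics what `faiss.IndexFlatIP.search` returns for one shard (top-k scores and ids), and the per-shard results are merged afterwards. The function names and data layout are hypothetical, just to show the shape of the approach:

```python
# Sketch: brute-force search over independent shards, then merge the
# per-shard top-k lists. Each shard could live on a different machine
# or be loaded into memory one at a time.
import heapq

def search_shard(shard_vectors, shard_ids, query, k):
    """Brute-force inner-product search over a single shard."""
    scored = [(sum(q * x for q, x in zip(query, vec)), doc_id)
              for vec, doc_id in zip(shard_vectors, shard_ids)]
    return heapq.nlargest(k, scored)

def search_sharded(shards, query, k):
    """Search each shard independently, then merge the per-shard top-k."""
    hits = []
    for vectors, ids in shards:
        hits.extend(search_shard(vectors, ids, query, k))
    return heapq.nlargest(k, hits)
```

Because inner-product scores are comparable across shards, merging the per-shard top-k lists gives exactly the same results as searching one monolithic flat index.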

lintool commented 3 years ago

Agreed. Seems like a good URA task.

MXueguang commented 2 years ago

@nimasadri11 will work on this. Start with reproducing the TCT_Colbert results here, then split the index into shards and implement retrieval over multiple sub-indexes.

w32zhong commented 2 years ago

@nimasadri11 FYI.

I saw a wiki page with suggestions for handling large indexes. It mentions that we first need to perform training on all samples without adding any actual data, then distribute these "empty" index files across machines or processes, and then add data independently:

The general approach is to:

  1. train the index (typically an IVF index), without adding data to it. Store the trained empty index.
  2. split the database into parts. Load the trained index and add the parts independently. This can be done independently on several machines.
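The two steps above can be illustrated with a toy stand-in: a tiny k-means "training" pass replaces real faiss IVF training, and `add_part` shows that once the quantizer is fixed, each machine can assign its own part to the shared lists independently. Everything here (function names, the k-means loop) is a hypothetical sketch, not faiss's actual implementation:

```python
# Sketch of "train once, add parts independently" for an IVF-style index.
import random

def nearest(centroids, vec):
    """Index of the centroid closest to vec (squared L2 distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centroids[i], vec)))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_quantizer(sample, n_centroids, iters=10):
    """Toy k-means, standing in for faiss IVF training (step 1)."""
    centroids = random.sample(sample, n_centroids)
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for vec in sample:
            buckets[nearest(centroids, vec)].append(vec)
        centroids = [mean(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids

def add_part(centroids, part):
    """Step 2: a machine adds its own part against the trained quantizer."""
    inverted_lists = {}
    for doc_id, vec in part:
        inverted_lists.setdefault(nearest(centroids, vec), []).append(doc_id)
    return inverted_lists
```

The key property is that `add_part` never touches the training state, so the parts can be built on separate machines and their inverted lists concatenated afterwards.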

I've also found a reference implementation for this in the original ColBERT repo (though it indexes into one faiss index), see:

MXueguang commented 2 years ago

@t-k- thanks for the info. In this PR we are trying to deal with brute-force search (a flat index, and a flat index on GPU) first, i.e. the "search" stage. The "general approach" above is regarding the "index" stage? I'll create a separate issue for the index part and link the above info to it.

Do we really need multiple faiss indexes here if the index is not flat? A flat index is still a common use case, I believe, as it gives the upper bound on the effectiveness of a dense retriever model.

Compression also belongs to the "index" stage; I'll put it in the other issue too.

@nimasadri11 for now, focusing on "search" with a prepared brute-force index is enough. The goal is to let people with limited RAM run brute-force searches (on CPU/GPU). We will deal with the index stage (training, compression, etc.) after that.

The above info is very helpful; I'll create a separate issue to track it.

w32zhong commented 2 years ago

Thanks, @MXueguang. I was just adding what I learned recently; I hope it helps in some way. I am not intending to steer the direction.

If the whole point here is to cope with machine memory limits, then the bottleneck is usually loading the entire embedding data into memory; in that case, we can just split the embedding data and add it separately. For a flat faiss index, it is indeed useful to split the index files.
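To make the memory argument concrete: with split embedding files, only one shard needs to be resident at a time, plus a size-k heap of running best hits. A minimal sketch, where `load_shard` and the shard paths are hypothetical placeholders for however the embeddings are actually stored:

```python
# Sketch: stream shards one at a time, keeping peak memory at roughly
# one shard plus a k-element min-heap of the best (score, doc_id) pairs.
import heapq

def search_streaming(load_shard, shard_paths, query, k):
    heap = []  # min-heap; heap[0] holds the weakest score kept so far
    for path in shard_paths:
        for doc_id, vec in load_shard(path):  # load_shard is hypothetical
            score = sum(q * x for q, x in zip(query, vec))
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heappushpop(heap, (score, doc_id))
    return sorted(heap, reverse=True)
```

Unlike merging per-shard top-k lists afterwards, this variant keeps a single global heap while shards are visited sequentially, which is the pattern that matters when one machine simply cannot hold all embeddings at once.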