AnswerDotAI / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Apache License 2.0
3.08k stars · 210 forks

Add support for 'quantize_embeddings' #181

Open paulthemagno opened 8 months ago

paulthemagno commented 8 months ago

I'm looking to quantize the embeddings to speed up the process of indexing / searching / etc.

For example in sentence_transformers there is quantize_embeddings:

from sentence_transformers.quantization import quantize_embeddings

binary_embeddings = quantize_embeddings(embeddings, precision="binary")  # or precision="int8"

https://sbert.net/examples/applications/embedding-quantization/README.html

Is anything similar already available, or could it be added for ColBERT usage in RAGatouille?
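For context, binary quantisation of the kind sentence_transformers offers boils down to thresholding each embedding dimension at zero and packing the resulting sign bits. A minimal pure-Python sketch of that idea (my own illustration, not the library's implementation):

```python
def binary_quantize(embedding: list[float]) -> bytes:
    """Map each dimension to one bit (1 if > 0 else 0), packed 8 per byte.

    A 768-dim float32 vector (3072 bytes) shrinks to 96 bytes this way,
    which is where the large speed/memory wins come from.
    """
    bits = [1 if x > 0 else 0 for x in embedding]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b  # shift in the next sign bit
        out.append(byte)
    return bytes(out)
```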

bclavie commented 7 months ago

Hey!

Thank you for opening the issue, there's definitely been a lot of interest in quantised embeddings.

There may be work on future ColBERT models for better compression, or even on better indexing methods like the very recently released EMVB paper (which would be really nice to get into RAGatouille!), whose memory footprint is 1.8x smaller than the current PLAID indexing method's.

However, for the general idea of quantisation -- it's actually already used by the PLAID index! Right now, we default to this compression logic to choose how aggressive the quantisation will be:

# Fewer bits = stronger compression; smaller collections keep more bits.
nbits = 2
if len(collection) < 5000:
    nbits = 8
elif len(collection) < 10000:
    nbits = 4

but I think it'd make sense to expose this parameter to users, so you could even choose nbits=1 to quantise down as much as possible!