paulthemagno opened 8 months ago
Hey!
Thank you for opening the issue, there's definitely been a lot of interest in quantised embeddings.
There's potentially future work on ColBERT models with better compression, or even better indexing methods like the very recently released EMVB paper (which would be really nice to get into RAGatouille!), whose memory footprint is 1.8x smaller than the current PLAID indexing method's.
However, for the general idea of quantisation -- it's actually already used by the PLAID index! Right now, we default to this compression logic to choose how aggressive the quantisation will be:
```python
nbits = 2
if len(collection) < 5000:
    nbits = 8
elif len(collection) < 10000:
    nbits = 4
```
but I think it'd make sense to expose this parameter to users, so you could even choose `nbits=1` to quantise down as much as possible!
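The size-based defaults above could be exposed as an explicit parameter; here's a minimal sketch of what that might look like (the `choose_nbits` helper and its signature are hypothetical, not RAGatouille's actual API):

```python
def choose_nbits(collection, nbits=None):
    """Pick the quantisation width for the PLAID index.

    If the user passes nbits explicitly (e.g. 1 for maximum
    compression), honour it; otherwise fall back to the current
    size-based defaults. NOTE: `choose_nbits` is a hypothetical
    helper for illustration, not part of RAGatouille's public API.
    """
    if nbits is not None:
        return nbits
    if len(collection) < 5000:
        return 8      # small collections: quantise gently
    elif len(collection) < 10000:
        return 4
    return 2          # large collections: compress aggressively

# Defaults mirror the snippet above; the override lets users go lower:
print(choose_nbits(["doc"] * 1_000))             # -> 8
print(choose_nbits(["doc"] * 20_000))            # -> 2
print(choose_nbits(["doc"] * 20_000, nbits=1))   # -> 1
```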
I'm looking to quantize the embeddings to speed up indexing, searching, etc. For example, `sentence_transformers` has `quantize_embeddings`: https://sbert.net/examples/applications/embedding-quantization/README.html
Is anything similar already available? Or is it possible to add it for the usage of ColBERT in RAGatouille?
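For reference, the general idea behind embedding quantization can be sketched in plain NumPy. This is an illustrative re-implementation of scalar int8 and binary quantization under simple min/max calibration assumptions, not `sentence_transformers`' actual code:

```python
import numpy as np

def quantize_int8(embeddings):
    """Sketch of scalar int8 quantization: map each dimension's
    observed float range onto [-128, 127] via per-dimension
    min/max calibration (4x smaller than float32)."""
    lo = embeddings.min(axis=0)
    scale = (embeddings.max(axis=0) - lo) / 255.0
    scale[scale == 0] = 1.0  # avoid division by zero on constant dims
    q = np.round((embeddings - lo) / scale) - 128
    return q.astype(np.int8), lo, scale

def quantize_binary(embeddings):
    """Sketch of binary quantization: keep only the sign bit of
    each dimension, packing 8 dimensions per byte (32x smaller
    than float32)."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 128)).astype(np.float32)  # 2048 bytes

q8, lo, scale = quantize_int8(emb)
qb = quantize_binary(emb)
print(q8.dtype, q8.nbytes)  # int8, 512 bytes (4x smaller)
print(qb.dtype, qb.nbytes)  # uint8, 64 bytes (32x smaller)
```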