AnswerDotAI / RAGatouille

Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
Apache License 2.0

Save the ColBERT encodings to disk. #237

Open Diegi97 opened 1 month ago

Diegi97 commented 1 month ago

I have a use case where I run ColBERT on CPU over a couple of thousand documents. For this I don't use PLAID but the `encode` and `search_encoded_docs` methods, and search is fast enough. The problem is that encoding all these documents on CPU takes time, and I don't want to re-encode everything every time I deploy the model, so I developed a way to save and load these encodings:

https://github.com/ChatFAQ/ChatFAQ/blob/cc19e4b85198062888d6320e59276db31461f4e9/chat_rag/chat_rag/retrievers/colbert_retriever.py#L163

If there's interest, I could improve this and integrate it into the `RAGPretrainedModel` or `ColBERT` classes and open a PR.
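The save/load pattern described above can be sketched roughly as follows. This is not the linked `colbert_retriever.py` implementation, just a minimal, self-contained illustration of the caching idea: encode once, persist the per-document token embeddings to disk, and reload them on subsequent deploys. The `fake_encode` function, the `CACHE_DIR` location, and the cache-key scheme are all placeholder assumptions, not RAGatouille APIs.

```python
import hashlib
import os

import numpy as np

CACHE_DIR = "encoding_cache"  # hypothetical location, not a RAGatouille default


def fake_encode(docs):
    """Placeholder for the real (slow) ColBERT CPU encoding step.

    Deterministically derives a small (tokens x dim) array per document so
    the example runs without a model checkpoint.
    """
    out = []
    for doc in docs:
        seed = int.from_bytes(hashlib.sha256(doc.encode()).digest()[:4], "big")
        rng = np.random.default_rng(seed)
        out.append(rng.standard_normal((8, 16)).astype(np.float32))
    return out


def load_or_encode(docs, cache_dir=CACHE_DIR):
    """Encode once, then reuse the saved arrays on subsequent deploys.

    The cache key is derived from the document contents, so changing the
    collection naturally invalidates the cache (cf. overwrite_index).
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256("\x00".join(docs).encode()).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{key}.npz")
    if os.path.exists(path):
        with np.load(path) as data:
            return [data[k] for k in sorted(data.files, key=int)]
    encodings = fake_encode(docs)  # the expensive step, done only once
    np.savez(path, **{str(i): enc for i, enc in enumerate(encodings)})
    return encodings
```

In a real deployment the `fake_encode` call would be replaced by the model's actual encoding method, and the loaded arrays handed back to whatever holds the in-memory collection.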

faezs commented 1 month ago

I'd like this. I have the same workflow and a similar solution, but having it built in would be great. Maybe making it compatible with `overwrite_index` for cache invalidation would also be a good idea?

bclavie commented 1 month ago

This is coming as part of the overhaul I semi-announced on twitter (just on twitter, to stay lowkey...)

I have no exact ETA, but these features will be available on the overhaul branch (which isn't installable right now as it'll crash, but will be very soon) within the next couple of weeks.

If you have just ~2k documents and want to improve latency, the best way forward will most likely be to use the HNSW index that'll ship as the native indexing mechanism for collections under ~5k documents. It gets performance more or less matching exact search while being quite a bit quicker. Otherwise, something pretty similar to your mechanism will be added for loading/saving in-memory encodings.
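For context, the exact search that HNSW approximates here is brute-force MaxSim scoring over the in-memory encodings: for each query token, take the maximum similarity against all document tokens, then sum over query tokens. A minimal NumPy sketch of that scoring (assuming L2-normalized token embeddings, so dot product equals cosine similarity; this is an illustration, not RAGatouille's internal implementation):

```python
import numpy as np


def maxsim_score(query_emb, doc_emb):
    """ColBERT-style late-interaction relevance.

    query_emb: (q_tokens, dim), doc_emb: (d_tokens, dim), rows L2-normalized.
    Each query token keeps its best-matching document token; scores are summed.
    """
    sim = query_emb @ doc_emb.T          # (q_tokens, d_tokens) similarity matrix
    return sim.max(axis=1).sum()


def exact_search(query_emb, doc_embs, k=3):
    """Brute-force exact search over a small in-memory collection."""
    scores = np.array([maxsim_score(query_emb, d) for d in doc_embs])
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]
```

For a few thousand documents this linear scan is already cheap, which is why exact search remains competitive at that scale; an HNSW index over the token embeddings trades a tiny amount of recall for a faster candidate lookup.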

Thanks for your interest!