AnswerDotAI / RAGatouille

Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Apache License 2.0
3.03k stars, 206 forks

How to index collection using generator function? #220

Open shubham526 opened 5 months ago

shubham526 commented 5 months ago

I have a large collection of 16 million passages that I want to index. It's not practical to keep all the documents and IDs in memory as lists just to pass them to the index function. Is there a way to index large collections using generators?
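
For illustration, here is a minimal sketch of the kind of generator the question has in mind: it streams passages from disk and groups them into fixed-size batches, so only one batch is in memory at a time. The JSONL layout (one object per line with `id` and `text` fields) and the helper names `iter_passages` and `iter_batches` are assumptions made for this example, not part of RAGatouille. The sketch deliberately stops short of calling the library, since whether its index function can consume input like this is exactly the open question.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def iter_passages(path: str) -> Iterator[Tuple[str, str]]:
    """Lazily yield (doc_id, text) pairs from a JSONL file, one line at a time.

    Assumes (hypothetically) that each line is a JSON object with
    "id" and "text" fields.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield str(record["id"]), record["text"]


def iter_batches(
    pairs: Iterable[Tuple[str, str]], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Group a stream of (doc_id, text) pairs into (ids, texts) list batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(pairs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return  # stream exhausted
        ids, texts = zip(*chunk)
        yield list(ids), list(texts)
```

With helpers like these, `for ids, texts in iter_batches(iter_passages("passages.jsonl"), 10_000): ...` would hold at most 10,000 passages in memory per iteration; the file name and batch size here are placeholders.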