bclavie / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Apache License 2.0

How to index a collection using a generator function? #220

Open shubham526 opened 3 weeks ago

shubham526 commented 3 weeks ago

I have a large collection of 16 million passages that I want to index. It's not practical to keep all the documents and IDs in memory as lists just to pass them to the index function. Is there a way to index large collections using generators?
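
To make the request concrete, here is a minimal sketch of the pattern I have in mind. It streams `(doc_id, text)` pairs lazily and hands them to an indexing step in fixed-size batches, so only one batch is in memory at a time. Everything here is an assumption for illustration: the tab-separated file layout, the helper names, and in particular `index_batch`, which is a hypothetical callback standing in for whatever the library would need to expose. It is not an existing RAGatouille function, and whether the index can be built incrementally like this is exactly what I am asking.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def read_passages(path: str) -> Iterator[Tuple[str, str]]:
    """Lazily yield (doc_id, text) pairs from a TSV file (assumed layout: id<TAB>text)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc_id, _, text = line.rstrip("\n").partition("\t")
            if text:  # skip malformed or empty lines
                yield doc_id, text


def batched(
    pairs: Iterable[Tuple[str, str]], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Group a stream of (doc_id, text) pairs into (ids, texts) batches."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        ids, texts = zip(*chunk)
        yield list(ids), list(texts)


def index_in_batches(
    pairs: Iterable[Tuple[str, str]],
    index_batch: Callable[[List[str], List[str]], None],  # hypothetical indexing hook
    batch_size: int = 100_000,
) -> int:
    """Feed the stream to index_batch one batch at a time; return the passage count."""
    total = 0
    for ids, texts in batched(pairs, batch_size):
        index_batch(ids, texts)
        total += len(ids)
    return total
```

With something like this, `index_in_batches(read_passages("collection.tsv"), index_batch)` would never hold more than `batch_size` passages in memory, provided the indexing side can accept documents incrementally.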