AnswerDotAI / RAGatouille

Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
Apache License 2.0
2.98k stars 204 forks

Support exporting index to HuggingFace Hub #37

Open bclavie opened 9 months ago

bclavie commented 9 months ago

Indexing is time-consuming, and people often want an easy way to share pre-built indexes for common datasets, both for general-domain applications (Wikipedia, code documentation...) and for evaluation purposes.

A simple way to support this would be to add a util function that exports the full index folder to the Hugging Face Hub, effectively exporting both the ColBERT config and the compressed vectors. That would allow you to do something like RAGPretrainedModel.from_prebuilt_index("EXAMPLE_USER/Wikipedia") and immediately begin querying the index.
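To make the idea concrete, here is a rough sketch of what such a util could look like, built on the huggingface_hub client (create_repo, upload_folder, snapshot_download). The helper names list_index_files, export_index_to_hub and download_prebuilt_index are hypothetical and not part of RAGatouille. The sketch also assumes the index is a self-contained folder and that storing it in a model repo is acceptable; a dataset repo might turn out to be a better fit.

```python
from pathlib import Path


def list_index_files(index_dir: str) -> list[str]:
    """Return the sorted relative paths of every file in a local index folder."""
    root = Path(index_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Index folder not found: {index_dir}")
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    if not files:
        raise ValueError(f"Index folder is empty: {index_dir}")
    return files


def export_index_to_hub(index_dir: str, repo_id: str, private: bool = False) -> str:
    """Hypothetical helper: upload the whole index folder to a Hub repo."""
    # Imported lazily so the local helper above works without the dependency.
    from huggingface_hub import HfApi

    list_index_files(index_dir)  # fail early on a missing or empty folder
    api = HfApi()
    # Assumption: a model repo is used; exist_ok allows re-exporting to the same repo.
    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
    api.upload_folder(folder_path=index_dir, repo_id=repo_id, repo_type="model")
    return repo_id


def download_prebuilt_index(repo_id: str, local_dir: str) -> str:
    """Hypothetical helper: fetch a shared index folder and return its local path."""
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="model", local_dir=local_dir)
```

Under these assumptions, the proposed from_prebuilt_index could be a thin wrapper that calls download_prebuilt_index and then hands the resulting folder to the existing index-loading path. Uploading requires being logged in to the Hub with a token that has write access.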

sutyum commented 9 months ago

Can't wait for this! For now I'm just keeping zip files of the indexes produced by RAGatouille. Will the upcoming util function essentially create repos like this one: https://huggingface.co/Technoculture/guidelines-search/tree/main ? (The data there is contained in the .ragatouille file.)

It would enable more projects like AgentSearch, and in fact we discussed this here: https://huggingface.co/datasets/SciPhi/AgentSearch-V1/discussions/3