AnswerDotAI / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Apache License 2.0

Rework Dependencies: ship with barebones dependencies & bundle different features as extras #136

Open bclavie opened 8 months ago

bclavie commented 8 months ago

Putting this out there as a way to alleviate the many dependency issues. I'll soon be shipping a PLAID-free indexing method (compression-free too; compression will come later), which will remove the need to run custom CUDA code or faiss when indexing small collections (anything up to ~2000 256-token documents can still be queried in hundreds of milliseconds on CPU).
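For context, here is a minimal sketch (not RAGatouille's actual implementation) of the kind of brute-force late-interaction scoring a small, uncompressed collection can fall back on: exact MaxSim over full token embeddings, with no PLAID or faiss involved, which is why a few thousand short documents stay fast on CPU.

```python
import torch

def maxsim_scores(query_emb: torch.Tensor, doc_embs: list[torch.Tensor]) -> torch.Tensor:
    """Exact late-interaction (MaxSim) scoring over uncompressed embeddings.

    query_emb: (num_query_tokens, dim); each doc tensor: (num_doc_tokens, dim).
    Embeddings are assumed L2-normalised, so the dot product is cosine similarity.
    """
    scores = []
    for doc_emb in doc_embs:
        sim = query_emb @ doc_emb.T  # (num_query_tokens, num_doc_tokens)
        # MaxSim: best-matching document token per query token, summed over query tokens
        scores.append(sim.max(dim=1).values.sum())
    return torch.stack(scores)

# For a collection of ~2000 short documents, this plain loop is already fast on CPU.
```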

Once this index has shipped, I am planning to overhaul dependencies, as I'm being told more and more that RAGatouille is making it into prod use cases and the "full fat" default version is kind of annoying. This is where I'm currently at in terms of versions:

ragatouille

- Features: search, in-memory encoding, uncompressed indexing
- Deps:

ragatouille[train]

- (REMOVE SENTENCE-TRANSFORMERS)
- Features: training, hard negative mining
- Additional deps:

ragatouille[all]

- Features: everything
- Deps:
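To illustrate how a split like the one above tends to behave at runtime, here's a hedged sketch of the usual optional-dependency pattern: the base package imports the extra's dependencies lazily and raises an error pointing at the right extra. The function name and message are hypothetical, not RAGatouille's API.

```python
def mine_hard_negatives(queries, corpus):
    """Hypothetical example of a feature gated behind the [train] extra."""
    try:
        # sentence-transformers would only be pulled in by `pip install ragatouille[train]`
        from sentence_transformers import SentenceTransformer
    except ImportError as e:
        raise ImportError(
            "Hard negative mining requires the training extras: "
            "pip install ragatouille[train]"
        ) from e
    ...  # mining logic would go here
```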

Any feedback on this would be appreciated at this stage -- very early thoughts still! One big question is whether torch (which is required) should ship with the base version, or be left optional to ease environment compatibility.

phaistos commented 8 months ago

There are issues when adding ragatouille to a llama-index 0.10.x project: it pulls in a 0.9.x artifact and some of the core namespaces get confused, e.g. you can't import LLM from core anymore. Since llama-index doesn't seem integral to your project, perhaps you could bump the pinned version.