0xPlaygrounds / rig

⚙️🦀 Build portable, modular & lightweight Fullstack Agents
https://rig.rs
MIT License

feat: Generic document and chunking interfaces #31

Open cvauclair opened 1 month ago

Feature Request

Provide a generic, trait-based interface for loading and chunking documents prior to embedding, along with predefined loaders and chunkers.

Motivation

Document loading (e.g., PDF, CSV, DOCX) and chunking are cornerstones of building AI agent systems, especially retrieval-augmented generation (RAG) systems.

Rig should provide a generic interface for these operations, which would let users customize the loading and chunking process, along with some out-of-the-box loaders and chunking strategies.
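As an illustration of the kind of out-of-the-box chunking strategy meant here, below is a minimal sketch of fixed-size chunking with overlap. The function name chunk_fixed_size and its parameters are hypothetical, not an existing Rig API, and measuring sizes in characters rather than tokens is a simplifying assumption.

```rust
/// Hypothetical fixed-size chunking strategy (not part of Rig): splits `text`
/// into chunks of at most `chunk_size` characters, where consecutive chunks
/// share `overlap` characters. Works on chars, not bytes, so multi-byte UTF-8
/// text is never split in the middle of a character.
fn chunk_fixed_size(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap; // how far each new chunk advances
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    let chunks = chunk_fixed_size("abcdefghij", 4, 1);
    println!("{:?}", chunks);
}
```

The overlap keeps some context shared across chunk boundaries, so a sentence cut at the end of one chunk still partly appears in the next one when it is embedded.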

Proposal

Create new DocumentLoading and DocumentChunking traits, to be implemented by specific document loaders and chunkers, and a new Pipeline struct that implements the builder pattern to define a Loading -> Chunking -> Embedding flow.
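A minimal sketch of how the proposed pieces could fit together, assuming synchronous stages and std-only types. Only the names DocumentLoading, DocumentChunking and Pipeline come from the proposal; their method signatures are assumptions, and Document, Embedding, InMemoryLoader, ParagraphChunker, LengthEmbedder and EmbeddedChunk are hypothetical names introduced for illustration. None of this is Rig's actual API, and LengthEmbedder is a toy stand-in for a real embedding model.

```rust
use std::io;

/// Hypothetical document type (e.g. the text extracted from one PDF or CSV file).
#[derive(Debug, Clone)]
struct Document {
    id: String,
    text: String,
}

/// Loading stage named in the proposal; this signature is an assumption.
trait DocumentLoading {
    fn load(&self) -> io::Result<Vec<Document>>;
}

/// Chunking stage named in the proposal; this signature is an assumption.
trait DocumentChunking {
    fn chunk(&self, doc: &Document) -> Vec<String>;
}

/// Hypothetical embedding stage; a real one would call an embedding model.
trait Embedding {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Hypothetical loader serving documents already held in memory.
struct InMemoryLoader(Vec<Document>);
impl DocumentLoading for InMemoryLoader {
    fn load(&self) -> io::Result<Vec<Document>> {
        Ok(self.0.clone())
    }
}

/// Hypothetical chunker: one chunk per blank-line-separated paragraph.
struct ParagraphChunker;
impl DocumentChunking for ParagraphChunker {
    fn chunk(&self, doc: &Document) -> Vec<String> {
        let paragraphs = doc.text.split("\n\n").map(str::trim);
        paragraphs.filter(|p| !p.is_empty()).map(String::from).collect()
    }
}

/// Toy embedder, NOT a real model: [character count, word count].
struct LengthEmbedder;
impl Embedding for LengthEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.chars().count() as f32, text.split_whitespace().count() as f32]
    }
}

/// One embedded chunk, tagged with the id of its source document.
#[derive(Debug)]
struct EmbeddedChunk {
    doc_id: String,
    text: String,
    embedding: Vec<f32>,
}

/// Pipeline configured builder-style; `run` executes Loading -> Chunking -> Embedding.
#[derive(Default)]
struct Pipeline {
    loader: Option<Box<dyn DocumentLoading>>,
    chunker: Option<Box<dyn DocumentChunking>>,
    embedder: Option<Box<dyn Embedding>>,
}

impl Pipeline {
    fn loader(mut self, l: impl DocumentLoading + 'static) -> Self { self.loader = Some(Box::new(l)); self }
    fn chunker(mut self, c: impl DocumentChunking + 'static) -> Self { self.chunker = Some(Box::new(c)); self }
    fn embedder(mut self, e: impl Embedding + 'static) -> Self { self.embedder = Some(Box::new(e)); self }

    /// Loads every document, chunks it, and embeds each chunk, preserving order.
    /// Returns an `InvalidInput` error if any stage was never configured.
    fn run(&self) -> io::Result<Vec<EmbeddedChunk>> {
        let missing = |stage: &str| io::Error::new(io::ErrorKind::InvalidInput, format!("missing {stage}"));
        let loader = self.loader.as_ref().ok_or_else(|| missing("loader"))?;
        let chunker = self.chunker.as_ref().ok_or_else(|| missing("chunker"))?;
        let embedder = self.embedder.as_ref().ok_or_else(|| missing("embedder"))?;
        let mut out = Vec::new();
        for doc in loader.load()? {
            for text in chunker.chunk(&doc) {
                let embedding = embedder.embed(&text);
                out.push(EmbeddedChunk { doc_id: doc.id.clone(), text, embedding });
            }
        }
        Ok(out)
    }
}

fn main() {
    let doc = Document { id: "notes".into(), text: "Rig agents.\n\nRAG needs chunks.".into() };
    let pipeline = Pipeline::default()
        .loader(InMemoryLoader(vec![doc]))
        .chunker(ParagraphChunker)
        .embedder(LengthEmbedder);
    for chunk in pipeline.run().expect("all stages configured") {
        println!("{chunk:?}");
    }
}
```

Because each stage is a trait object, a PDF or CSV loader or a token-based chunker could be swapped in without changing Pipeline. In this sketch a missing stage is only detected when run is called; a typestate builder could move that check to compile time at the cost of more complex types. A production version would probably also be async, since embedding typically involves calling a remote model.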