Extract and store document/chunk structure and relationships

QuivrHQ / quivr

Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.

https://core.quivr.com


36.74k stars 3.59k forks

Extract and store document/chunk structure and relationships #3450

Open · jacopo-chevallard opened this issue 3 weeks ago

jacopo-chevallard commented 3 weeks ago

Currently, document chunks are stored individually in our vector database (PGVector); the only relationship we record is the one between a chunk and its source document.

We should extend this to extract the document layout (headers, footers, tables, images, captions, …) and the relationships between elements (chunk --> page --> file, previous_chunk --> chunk --> next_chunk, …), and store them in a database; see our schema.
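The relationships listed above can be sketched as a small relational model. This is a minimal illustration, not Quivr's actual schema (the schema referenced in the issue is not reproduced here): it uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs self-contained, and the table names (`file`, `page`, `chunk`), the columns, the `ElementType` categories, and the helpers `store_document` and `chunk_context` are all hypothetical. Each chunk points to its page, each page to its file, and chunks are doubly linked in reading order via `prev_chunk_id` / `next_chunk_id`.

```python
import sqlite3
from enum import Enum
from typing import Optional


class ElementType(str, Enum):
    """Hypothetical layout categories, mirroring the examples in the issue."""
    HEADER = "header"
    FOOTER = "footer"
    TEXT = "text"
    TABLE = "table"
    IMAGE = "image"
    CAPTION = "caption"


# Hypothetical schema: chunk --> page --> file, plus prev/next links between chunks.
SCHEMA = """
CREATE TABLE file  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE page  (id INTEGER PRIMARY KEY,
                    file_id INTEGER NOT NULL REFERENCES file(id),
                    page_number INTEGER NOT NULL);
CREATE TABLE chunk (id INTEGER PRIMARY KEY,
                    page_id INTEGER NOT NULL REFERENCES page(id),
                    element_type TEXT NOT NULL,
                    content TEXT NOT NULL,
                    prev_chunk_id INTEGER REFERENCES chunk(id),
                    next_chunk_id INTEGER REFERENCES chunk(id));
"""


def store_document(conn: sqlite3.Connection, file_name: str,
                   pages: list[list[tuple[ElementType, str]]]) -> list[int]:
    """Insert file -> pages -> chunks, then link chunks in reading order.

    `pages[i]` holds the (element_type, content) pairs of page i + 1.
    Returns the chunk ids in reading order.
    """
    cur = conn.cursor()
    cur.execute("INSERT INTO file (name) VALUES (?)", (file_name,))
    file_id = cur.lastrowid
    chunk_ids: list[int] = []
    for page_number, elements in enumerate(pages, start=1):
        cur.execute("INSERT INTO page (file_id, page_number) VALUES (?, ?)",
                    (file_id, page_number))
        page_id = cur.lastrowid
        for element_type, content in elements:
            cur.execute(
                "INSERT INTO chunk (page_id, element_type, content) VALUES (?, ?, ?)",
                (page_id, element_type.value, content))
            chunk_ids.append(cur.lastrowid)
    # Second pass: previous_chunk --> chunk --> next_chunk, across page boundaries.
    # Done after all inserts so every referenced id already exists.
    for i, chunk_id in enumerate(chunk_ids):
        prev_id: Optional[int] = chunk_ids[i - 1] if i > 0 else None
        next_id: Optional[int] = chunk_ids[i + 1] if i + 1 < len(chunk_ids) else None
        cur.execute("UPDATE chunk SET prev_chunk_id = ?, next_chunk_id = ? WHERE id = ?",
                    (prev_id, next_id, chunk_id))
    conn.commit()
    return chunk_ids


def chunk_context(conn: sqlite3.Connection, chunk_id: int):
    """Walk chunk --> page --> file and fetch the neighbouring chunks' content."""
    return conn.execute("""
        SELECT f.name, p.page_number, c.element_type, pc.content, nc.content
        FROM chunk c
        JOIN page p ON p.id = c.page_id
        JOIN file f ON f.id = p.file_id
        LEFT JOIN chunk pc ON pc.id = c.prev_chunk_id
        LEFT JOIN chunk nc ON nc.id = c.next_chunk_id
        WHERE c.id = ?""", (chunk_id,)).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ids = store_document(conn, "report.pdf", [
        [(ElementType.HEADER, "Title"), (ElementType.TEXT, "Intro")],
        [(ElementType.TABLE, "| a | b |")],
    ])
    print(chunk_context(conn, ids[1]))  # ('report.pdf', 1, 'text', 'Title', '| a | b |')
```

With these links, a retriever that hits one chunk could cheaply pull in its neighbours or its page for extra context. One design choice left open here: headers and footers are kept in the reading-order chain, whereas a real pipeline might prefer to exclude repeated page furniture from the `prev`/`next` links.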

linear[bot] commented 3 weeks ago

CORE-278 Extract and save document/chunk structure and relationships