aws-samples / Serverless-Retrieval-Augmented-Generation-RAG-on-AWS

A full-stack serverless RAG workflow, intended for running PoCs and prototypes and for bootstrapping your MVP.
MIT No Attribution

Feature: Introducing Kúzu graph database for extra-planar relationships #13

Open giusedroid opened 5 months ago

giusedroid commented 5 months ago

Kúzu is an embedded graph database. We want to explore graph-RAG capabilities so that semantic retrieval can include more relevant information.

A good MVP for this would be to map, at ingestion, the obvious relationships that we cannot store semantically as vectors, for example:

page -> next() : page
page -> previous() : page
page -> belongsToDocument() : document
page -> sectionStartsAt() : page
page -> sectionEndsAt() : page
document -> relatesToDocument() : document[]
document -> belongsToCollection() : document[] 
document -> abstract() : string

At semantic retrieval, we would then use the relationships mapped for the retrieved vectors to provide additional context, or to exclude other vectors from context placement if they are not related to the most relevant collection. This would probably happen as part of re-ranking.
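To make the idea concrete, here is a minimal sketch of mapping a few of the relationships above at ingestion and using them to widen the context at retrieval. It is not the Kúzu API: plain Python dictionaries stand in for the graph store, and the `Page` class, `build_edges`, `expand_with_neighbors` and the sample ids are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str
    document_id: str
    number: int  # position of the page inside its document


def build_edges(pages):
    """Map next / previous / belongsToDocument edges at ingestion time.

    Dictionaries are an assumption standing in for a graph database.
    """
    edges = {"next": {}, "previous": {}, "belongsToDocument": {}}
    by_document = {}
    for page in pages:
        edges["belongsToDocument"][page.page_id] = page.document_id
        by_document.setdefault(page.document_id, []).append(page)
    for doc_pages in by_document.values():
        doc_pages.sort(key=lambda p: p.number)
        # Link each page to the one that follows it in the same document.
        for current, following in zip(doc_pages, doc_pages[1:]):
            edges["next"][current.page_id] = following.page_id
            edges["previous"][following.page_id] = current.page_id
    return edges


def expand_with_neighbors(hit_ids, edges):
    """Return the retrieved pages plus their previous/next pages, without duplicates."""
    expanded = []
    for page_id in hit_ids:
        neighbors = (edges["previous"].get(page_id), page_id, edges["next"].get(page_id))
        for candidate in neighbors:
            if candidate is not None and candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical corpus: a three-page document and a one-page document.
pages = [
    Page("a-1", "doc-a", 1),
    Page("a-2", "doc-a", 2),
    Page("a-3", "doc-a", 3),
    Page("b-1", "doc-b", 1),
]
edges = build_edges(pages)
context_ids = expand_with_neighbors(["a-2"], edges)
```

In this sketch a vector hit on `a-2` pulls in `a-1` and `a-3` as extra context, while `b-1` has no neighbors to add. The same edges could later be stored in an embedded graph database and queried there instead.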

giusedroid commented 1 month ago

As a business user, I want to make sure that I am only sourcing information from a collection of documents, so that the risk of hallucination from mixing unrelated documents is minimized.

For example, with purely semantic (vector) retrieval we have seen hallucinations caused by confusing chunks that belong to different documents. A notable result is the LLM combining knowledge from two different business entities, e.g. "Amazon's new CEO is Barack Obama", because it loaded into context two documents that both referred to business policies.
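One possible way to enforce this, sketched under assumptions: after vector retrieval, sum the similarity scores per collection and keep only the chunks from the best-scoring collection. The function `keep_top_collection`, the `(chunk_id, document_id, score)` tuple shape, the `collection_of` lookup and the sample ids are hypothetical. Summing scores is just one simple heuristic, not a method proposed in this thread.

```python
from collections import defaultdict


def keep_top_collection(hits, collection_of):
    """Keep only the hits whose document belongs to the best-scoring collection.

    hits: list of (chunk_id, document_id, score) tuples from vector retrieval.
    collection_of: dict mapping document_id -> collection id (assumed to be
    populated at ingestion, e.g. from a belongsToCollection relationship).
    """
    totals = defaultdict(float)
    for _, document_id, score in hits:
        totals[collection_of[document_id]] += score
    if not totals:
        return []
    best = max(totals, key=totals.get)
    return [hit for hit in hits if collection_of[hit[1]] == best]


# Hypothetical retrieval result mixing two unrelated business entities.
collection_of = {
    "entity-a-policy": "entity-a",
    "entity-a-leadership": "entity-a",
    "entity-b-policy": "entity-b",
}
hits = [
    ("c1", "entity-a-policy", 0.82),
    ("c2", "entity-b-policy", 0.80),
    ("c3", "entity-a-leadership", 0.75),
]
kept = keep_top_collection(hits, collection_of)
kept_ids = [chunk_id for chunk_id, _, _ in kept]
```

Here the chunk from `entity-b` is dropped even though its individual score is high, so the two policy documents never share the same context window.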

luke-b commented 1 month ago
RAG (Retrieval-Augmented Generation) system - expanded