Feature: Introducing Kúzu graph database for extra-planar relationships

giusedroid commented 5 months ago

Kúzu is an embedded graph database. We want to explore graph-RAG capabilities so that semantic retrieval can include more relevant information.

A good MVP would be to map, at ingestion time, the obvious relationships that we cannot store semantically as vectors, for example:

page -> next() : page
page -> previous() : page
page -> belongsToDocument() : document
page -> sectionStartsAt() : page
page -> sectionEndsAt() : page
document -> relatesToDocument() : document[]
document -> belongsToCollection() : document[] 
document -> abstract() : string
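A minimal sketch of how the relationships above could be declared in Kúzu, assuming its Cypher DDL (`CREATE NODE TABLE` / `CREATE REL TABLE`). The table names, the properties, the `Collection` node table and the helper `schema_statements` are hypothetical. `abstract()` is modelled as a property on `Document` rather than as a relationship, and `previous()` is obtained by traversing `NEXT_PAGE` backwards instead of storing a second edge.

```python
# Sketch only: the ingestion-time relationships expressed as Kúzu DDL strings.
# Table names, property names and the Collection node table are assumptions.

# (relationship table, source node table, target node table)
RELATIONSHIPS = [
    ("NEXT_PAGE", "Page", "Page"),                        # page -> next() : page
    ("SECTION_STARTS_AT", "Page", "Page"),                # page -> sectionStartsAt() : page
    ("SECTION_ENDS_AT", "Page", "Page"),                  # page -> sectionEndsAt() : page
    ("BELONGS_TO_DOCUMENT", "Page", "Document"),          # page -> belongsToDocument() : document
    ("RELATES_TO_DOCUMENT", "Document", "Document"),      # document -> relatesToDocument() : document[]
    ("BELONGS_TO_COLLECTION", "Document", "Collection"),  # document -> belongsToCollection()
]


def schema_statements() -> list[str]:
    """Return CREATE statements: node tables first, then relationship tables."""
    nodes = [
        "CREATE NODE TABLE Page(id STRING, text STRING, PRIMARY KEY (id))",
        # abstract() : string is stored as a plain property on Document
        "CREATE NODE TABLE Document(id STRING, abstract STRING, PRIMARY KEY (id))",
        # Assumption: collections are first-class nodes in the graph
        "CREATE NODE TABLE Collection(id STRING, PRIMARY KEY (id))",
    ]
    rels = [
        f"CREATE REL TABLE {name}(FROM {src} TO {dst})"
        for name, src, dst in RELATIONSHIPS
    ]
    return nodes + rels


# previous() needs no table of its own: follow NEXT_PAGE in reverse.
NEIGHBOUR_PAGES_QUERY = """
MATCH (p:Page)
WHERE p.id = $page_id
OPTIONAL MATCH (p)-[:NEXT_PAGE]->(nxt:Page)
OPTIONAL MATCH (prev:Page)-[:NEXT_PAGE]->(p)
RETURN prev.id AS previous_id, nxt.id AS next_id
"""
```

With the `kuzu` Python package installed, each statement could be run with `conn.execute(stmt)` on a `kuzu.Connection(kuzu.Database(path))`, and the neighbour lookup with `conn.execute(NEIGHBOUR_PAGES_QUERY, {"page_id": ...})`. Node tables are created first because relationship tables reference them.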

At semantic retrieval time, we would then use the relationships mapped for the retrieved vectors to provide additional context, or to exclude vectors from the context if they are not related to the most relevant collection, probably as part of a re-ranking step.
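A minimal sketch of the retrieval side, assuming the vector store returns page ids with similarity scores and that "most relevant collection" means the collection with the highest summed score (the issue does not define this). Plain dicts stand in for the graph lookups (page to document, document to collection, next page); `Hit`, `graph_rerank` and `neighbour_weight` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    page_id: str
    score: float  # similarity from the vector store, higher is better


def graph_rerank(
    hits: list[Hit],
    page_to_doc: dict[str, str],        # stands in for BELONGS_TO_DOCUMENT
    doc_to_collection: dict[str, str],  # stands in for BELONGS_TO_COLLECTION
    next_page: dict[str, str],          # stands in for NEXT_PAGE
    neighbour_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Keep hits from the top-scoring collection, then add adjacent pages."""
    # 1. Score each collection by the summed similarity of its hits.
    collection_score: dict[str, float] = defaultdict(float)
    for h in hits:
        collection_score[doc_to_collection[page_to_doc[h.page_id]]] += h.score
    if not collection_score:
        return []
    best = max(collection_score, key=collection_score.get)

    # 2. Exclude hits whose document lies outside the winning collection.
    kept = {
        h.page_id: h.score
        for h in hits
        if doc_to_collection[page_to_doc[h.page_id]] == best
    }

    # 3. Add previous/next pages as extra context at a discounted score.
    #    previous() is derived by inverting the next-page mapping.
    prev_page = {nxt: cur for cur, nxt in next_page.items()}
    original = set(kept)
    for page_id in original:
        score = kept[page_id]
        for neighbour in (prev_page.get(page_id), next_page.get(page_id)):
            if neighbour is not None and neighbour not in original:
                kept[neighbour] = max(kept.get(neighbour, 0.0), score * neighbour_weight)

    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
```

Summing scores favours a collection with several moderately relevant hits over one with a single strong hit; taking the maximum per collection would be an equally plausible choice.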

giusedroid commented 1 month ago

As a business user, I want to make sure that I am only sourcing information from a single collection of documents, so that the risk of hallucination caused by mixing unrelated documents is minimized.

For example, with purely semantic (vector) retrieval we have seen hallucinations caused by confusing chunks that belong to different documents. In one notable case, the LLM combined knowledge about two different business entities ("Amazon's new CEO is Barack Obama") because it loaded into context two documents that both referred to business policies.

luke-b commented 1 month ago

RAG (Retrieval-Augmented Generation) system - expanded

aws-samples / Serverless-Retrieval-Augmented-Generation-RAG-on-AWS

Feature: Introducing Kúzu graph database for extra-planar relationships #13