Open giusedroid opened 5 months ago
As a business user I want to make sure that I am only sourcing information from a collection of documents, so that the risk of hallucination by mixing non-retalted documents is minimized.
For example, with purely semantic/vectorial retrieval we've had hallucinations by confusing chunks that belong to different documents. A notable result is that the LLM combines knowledge from two different business entities "Amazon's new CEO is Barack Obama" because it loaded up in context two documents which were referring to business policies.
Kúzu is an embedded graph database. We want to explore graph-RAG capabilites to include more relevant information when retrieving semantically.
A good MVP for this would be mapping obvious relationships at ingestion that we cannot store semantically as vectors, for example
and at semantic retrieval use the relationships mapped for the retrieved vectors to provide additonal context or exclude other vectors from context placement if they are not related to the most relevant collection. Probably using them in the context of re-ranking.