RAG0012: Knowledge Graph Creation and Integration (5)

tenzin3 commented 2 weeks ago

Description

The idea is to create a knowledge graph per book that we can use for retrieval in addition to the vector database. We will kickstart the knowledge graph creation with the help of an LLM and decide at a later stage whether any automated or manual steps are required for further development.

Expected Output

A graph database that can be used for retrieval.

teny19 commented 1 week ago

Created triplets for each chapter, but uncertain whether this is the right approach. When using LLMs, we don't have control over what is identified as a node and what is identified as a relation. Some of the generated triplets don't make sense to include in a knowledge graph, e.g.:

\<Author>, \<took apart movie projector>, \<to study electricity>
\<Author>, \<worked on old motor cars>, \<in Lhasa>
\<Purpose of higher education>, \<is to study>, \<five higher subjects>

The main issue is the lack of control over node and relation identification, which would lead to time-consuming cleanup. During the deduplication process there is also a risk of misleading or incorrect information being merged together.
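
To make the deduplication risk concrete, here is a minimal sketch using the three triplets quoted above. The `normalize` rule and the `merge_by_label` helper are hypothetical, not something we have implemented; they only assume that deduplication keys nodes on their text label.

```python
from collections import defaultdict

# Triplets copied from the examples above, as (subject, relation, object).
triplets = [
    ("Author", "took apart movie projector", "to study electricity"),
    ("Author", "worked on old motor cars", "in Lhasa"),
    ("Purpose of higher education", "is to study", "five higher subjects"),
]


def normalize(label: str) -> str:
    """Hypothetical normalization rule: lowercase and collapse whitespace."""
    return " ".join(label.lower().split())


def merge_by_label(triplets):
    """Group outgoing edges under the normalized subject label.

    Assumption: the label is the only key available for deduplication,
    because LLM-generated nodes carry no stable identifier.
    """
    merged = defaultdict(list)
    for subject, relation, obj in triplets:
        merged[normalize(subject)].append((relation, obj))
    return dict(merged)


merged = merge_by_label(triplets)
for node, edges in merged.items():
    print(node, edges)
```

Every triplet whose subject is "Author" ends up on one node, whether or not the chapters mean the same person, and nothing in the data lets us check. Objects such as "in Lhasa" are phrases rather than entities, so they cannot be linked to other nodes either.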

teny19 commented 1 week ago

An alternative approach would be to first identify the entities in each chapter and look them up on Wikidata. If there is a match, we can use the properties and relations from Wikidata to create the graph. Wikidata has a list of properties used for all entities, so we could select a relevant subset and convert them to either node/relation properties or simply relations. This way we would end up with a property graph, which is more powerful since it allows storing properties at both the node and relation levels. To give an example:

Node 1: Lhalu Tsewang Dorje [https://www.wikidata.org/wiki/Q1913262]
- gender: male
- BDRC_ID: P6715
Node 2: Kalon [https://www.wikidata.org/wiki/Q10924321]
- BDRC_ID: R45
Relation: position held
- start_time: 1946
- end_time: 1952
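
The example above could be held in memory roughly as follows. This is only a sketch with plain dataclasses: the `Node` and `Relation` class names and their fields are assumptions for illustration, not the schema of any particular graph database, and the direction of the relation (Node 1 to Node 2) is also assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    wikidata_url: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relation:
    source: Node
    target: Node
    type: str
    # Properties live on the relation itself, which a plain triplet cannot express.
    properties: dict = field(default_factory=dict)


lhalu = Node(
    label="Lhalu Tsewang Dorje",
    wikidata_url="https://www.wikidata.org/wiki/Q1913262",
    properties={"gender": "male", "BDRC_ID": "P6715"},
)
kalon = Node(
    label="Kalon",
    wikidata_url="https://www.wikidata.org/wiki/Q10924321",
    properties={"BDRC_ID": "R45"},
)
position_held = Relation(
    source=lhalu,
    target=kalon,
    type="position held",
    properties={"start_time": 1946, "end_time": 1952},
)

print(position_held.source.label, position_held.type, position_held.target.label)
```

Because nodes are identified by their Wikidata URL rather than by a text label, two mentions of the same person resolve to the same node without relying on string matching.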

Once we have a complete property graph of all the entities that appear in the book, we can still decide whether we want to extend it with the triplets generated from the LLM.
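
The entity lookup step of this approach could be sketched as below, using the Wikidata `wbsearchentities` API action. The helper names, the `User-Agent` string, and the choice of properties in `SELECTED_PROPERTIES` are assumptions for illustration. Taking the first search hit is a deliberate simplification; real matching would need disambiguation.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

# Illustrative subset of Wikidata properties to carry into the graph.
SELECTED_PROPERTIES = {
    "P21": "gender",         # sex or gender -> node property
    "P39": "position held",  # -> relation between two nodes
    "P580": "start_time",    # qualifier -> relation property
    "P582": "end_time",      # qualifier -> relation property
}


def build_search_url(name: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for an entity name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urllib.parse.urlencode(params)}"


def first_match_id(response: dict) -> Optional[str]:
    """Return the Q-id of the first search hit, or None if there is no match."""
    results = response.get("search", [])
    return results[0]["id"] if results else None


def lookup_entity(name: str) -> Optional[str]:
    """Query Wikidata over the network (not called here)."""
    request = urllib.request.Request(
        build_search_url(name),
        headers={"User-Agent": "kg-lookup-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return first_match_id(json.load(reply))


print(build_search_url("Lhalu Tsewang Dorje"))
```

Entities for which `lookup_entity` returns `None` would have no Wikidata match, so they are the natural candidates for the LLM-generated triplets mentioned above.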

OpenPecha / RAG_chatbot_interface

RAG0012: Knowledge Graph Creation and Integration (5) #7
