OpenPecha / rag_prep_tool

MIT License

RAG0016: Populate Knowledge Graph on Graph Database #17

Open tenzin3 opened 2 months ago

tenzin3 commented 2 months ago

Description

This task involves populating a knowledge graph in a graph database: storing triples and structured data, which represent entities and their relationships, in the database.
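This thread uses two input shapes: plain (head, relation, tail) triplets and a structured nodes/edges JSON file. The sketch below shows one way to turn plain triples into the structured form. It assumes the key names that the property-insertion code later in this thread reads (`label`, `type`, `attributes`, `source`, `target`, `relation`). `triples_to_graph` and the default `Entity` type are hypothetical.

```python
from typing import Any


def triples_to_graph(
    triples: list[tuple[str, str, str]], default_type: str = "Entity"
) -> dict[str, list[dict[str, Any]]]:
    """Convert (head, relation, tail) triples into a nodes/edges dictionary (hypothetical layout)."""
    nodes: dict[str, dict[str, Any]] = {}
    edges: list[dict[str, str]] = []
    for head, relation, tail in triples:
        for label in (head, tail):
            # Create each entity once, in first-seen order, with the assumed default type.
            nodes.setdefault(label, {"label": label, "type": default_type, "attributes": {}})
        edges.append({"source": head, "target": tail, "relation": relation})
    return {"nodes": list(nodes.values()), "edges": edges}


graph = triples_to_graph([("DalaiLama", "WasBornIn", "Taktser"), ("Taktser", "isLocatedIn", "Dokham")])
print([node["label"] for node in graph["nodes"]])
```

Keying nodes by label means each entity appears only once, even when it shows up in several triples.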

Expected Output

Implementation Plan

tenzin3 commented 2 months ago

TerminusDB was our first choice because it is a graph database with built-in data versioning. However, its team has shifted focus to other projects, and given its small community, we decided to move away from TerminusDB.

tenzin3 commented 2 months ago

There are many graph database options, but very few offer a free community edition. One that does, and that has the largest community among graph databases, is Neo4j.

Another interesting option is Memgraph, which has the following features:

tenzin3 commented 2 months ago

Cypher queries used in Memgraph Lab:

- Show all entities: MATCH (n) RETURN n;
- Show all entities with their relationships: MATCH (n)-[r]->(m) RETURN n, r, m;
- Delete all data: MATCH (n) DETACH DELETE n;

tenzin3 commented 2 months ago

[Image: graph visualization from Memgraph Lab]

Insert knowledge graph triplets

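```python
import re

from neo4j import GraphDatabase

# Memgraph speaks the Bolt protocol; empty credentials work when auth is disabled.
URI = "bolt://localhost:7687"
AUTH = ("", "")

# Relationship types cannot be passed as query parameters, so they are
# interpolated into the query string and must be validated first.
VALID_RELATION = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def insert_triplets(triplets):
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            for head, relation, tail in triplets:
                if not VALID_RELATION.fullmatch(relation):
                    raise ValueError(f"Invalid relation type: {relation!r}")
                # MERGE avoids duplicate nodes and edges when the script is re-run.
                session.run(
                    "MERGE (h:Entity {name: $head}) "
                    "MERGE (t:Entity {name: $tail}) "
                    f"MERGE (h)-[:{relation}]->(t)",
                    head=head,
                    tail=tail,
                )


triplets = [
    ("DalaiLama", "WasBornIn", "Taktser"),
    ("Taktser", "isLocatedIn", "Dokham"),
    ("Dokham", "isPartOf", "Tibet"),
    ("Khampa", "LivesIn", "Dokham"),
    ("Dokham", "DescendsTo", "China"),
    ("DalaiLama", "WasBornIn", "WoodHogYear"),
    ("AmiChiri", "IsSouthOf", "Taktser"),
]

insert_triplets(triplets)
```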
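The loop above sends one query per triplet, so each triplet costs one round trip. The sketch below batches them instead, assuming the standard Cypher `UNWIND` clause over a list parameter (which Memgraph supports). Relationship types cannot be parameterized, so the triplets are grouped by relation and one query is built per relation type. `build_batched_queries` is a hypothetical helper. Each `(query, params)` pair it returns could be passed to `session.run(query, params)` in place of the per-triplet loop.

```python
import re
from collections import defaultdict

VALID_RELATION = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_batched_queries(triplets):
    """Group (head, relation, tail) triplets by relation; build one UNWIND query per group."""
    groups = defaultdict(list)
    for head, relation, tail in triplets:
        # The relation is interpolated into the query text, so only plain identifiers pass.
        if not VALID_RELATION.fullmatch(relation):
            raise ValueError(f"Invalid relation type: {relation!r}")
        groups[relation].append({"head": head, "tail": tail})

    queries = []
    for relation, rows in groups.items():
        query = (
            "UNWIND $rows AS row "
            "MERGE (h:Entity {name: row.head}) "
            "MERGE (t:Entity {name: row.tail}) "
            f"MERGE (h)-[:{relation}]->(t)"
        )
        queries.append((query, {"rows": rows}))
    return queries


for query, params in build_batched_queries(
    [("DalaiLama", "WasBornIn", "Taktser"), ("Taktser", "isLocatedIn", "Dokham")]
):
    print(query, params)
```

Grouping keeps the number of queries equal to the number of distinct relation types rather than the number of triplets.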

Fetch knowledge graph triplets

```python
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"
AUTH = ("", "")


def fetch_data():
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            # Alias the returned columns so records can be read by stable key names.
            result = session.run(
                "MATCH (h)-[r]->(t) RETURN h.name AS head, type(r) AS relation, t.name AS tail"
            )
            for record in result:
                print(record["head"], record["relation"], record["tail"])


fetch_data()
```
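A quick sanity check after inserting and fetching is to compare what was written with what was read back. The sketch below assumes the fetched rows are collected as `(head, relation, tail)` tuples, for example by appending each triple to a list instead of printing it. `compare_triplets` is a hypothetical helper.

```python
def compare_triplets(inserted, fetched):
    """Return (missing, unexpected): triplets not found in the database, and extra ones found there."""
    # Sets ignore duplicates, which MERGE also collapses in the database.
    inserted_set = set(inserted)
    fetched_set = set(fetched)
    missing = sorted(inserted_set - fetched_set)
    unexpected = sorted(fetched_set - inserted_set)
    return missing, unexpected


missing, unexpected = compare_triplets(
    [("DalaiLama", "WasBornIn", "Taktser"), ("Taktser", "isLocatedIn", "Dokham")],
    [("DalaiLama", "WasBornIn", "Taktser")],
)
print("missing:", missing)
print("unexpected:", unexpected)
```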
tenzin3 commented 2 months ago

[Image: graph visualization from Memgraph Lab]

[Image: graph data schema]

Insert knowledge graph triplets with properties

The data is from here.

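```python
import json
import re

from neo4j import GraphDatabase

URI = "bolt://localhost:7687"
AUTH = ("", "")

# Labels and relationship types are interpolated into the query string
# (Cypher cannot parameterize them), so only allow plain identifiers.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_identifier(value):
    if not IDENTIFIER.fullmatch(value):
        raise ValueError(f"Invalid label or relationship type: {value!r}")
    return value


def insert_triplets(data):
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            # Insert nodes: the node type becomes the label, the node label becomes `name`.
            # CREATE adds new nodes on every run, so clear the database before re-running.
            for node in data["nodes"]:
                entity_type = check_identifier(node["type"])
                # Copy so the loaded JSON data is not mutated.
                properties = dict(node.get("attributes") or {})
                properties["name"] = node["label"]
                session.run(f"CREATE (n:{entity_type} $props)", {"props": properties})

            # Insert edges between nodes matched by their `name` property.
            for edge in data["edges"]:
                relation = check_identifier(edge["relation"])
                session.run(
                    "MATCH (a {name: $source}), (b {name: $target}) "
                    f"CREATE (a)-[:{relation}]->(b)",
                    {"source": edge["source"], "target": edge["target"]},
                )


with open("kg_data.json", "r", encoding="utf-8") as file:
    data = json.load(file)

insert_triplets(data)
```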
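The edge loop matches endpoints by `name`, which the node loop sets from each node's `label`. If an edge's `source` or `target` matches no node label, `MATCH` returns no rows and the edge is silently not created. If two nodes share a label, one edge entry creates several edges. The sketch below is a pre-flight check that catches both cases. It assumes `kg_data.json` has the `nodes`/`edges` shape the code above reads. `validate_graph_data` is a hypothetical helper, and the small `sample` dict (including its `Place`/`Region` types) is illustrative only, standing in for the real file.

```python
from collections import Counter


def validate_graph_data(data):
    """Return a list of problems found in a nodes/edges graph dictionary."""
    problems = []
    labels = [node["label"] for node in data["nodes"]]

    # Duplicate labels make MATCH-by-name hit several nodes, creating extra edges.
    for label, count in Counter(labels).items():
        if count > 1:
            problems.append(f"duplicate node label: {label!r} ({count} nodes)")

    # Edge endpoints that match no node label would be skipped by MATCH.
    known = set(labels)
    for i, edge in enumerate(data["edges"]):
        for key in ("source", "target"):
            if edge[key] not in known:
                problems.append(f"edge {i}: unknown {key} {edge[key]!r}")
    return problems


sample = {
    "nodes": [
        {"label": "Taktser", "type": "Place"},
        {"label": "Dokham", "type": "Region"},
    ],
    "edges": [
        {"source": "Taktser", "target": "Dokham", "relation": "isLocatedIn"},
        {"source": "Dokham", "target": "Tibet", "relation": "isPartOf"},
    ],
}
print(validate_graph_data(sample))  # prints: ["edge 1: unknown target 'Tibet'"]
```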
tenzin3 commented 2 months ago

Suggestions from @teny19:

- Methods to clean the knowledge graph.
- Test the methods on 3-5 pages first; if the results are satisfactory, move on to one chapter, and then to the whole book.
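The cleaning methods are not specified here. As one possible starting point, the sketch below is a cleaning pass over extracted triplets. It assumes cleaning means four things: normalizing whitespace, dropping triplets with empty fields, dropping self-loops, and removing duplicates. `clean_triplets` is hypothetical and would need to be checked against the 3-5 test pages before it is applied to a chapter or the whole book.

```python
import re


def clean_triplets(triplets):
    """Normalize, filter, and deduplicate (head, relation, tail) triplets, keeping first-seen order."""
    seen = set()
    cleaned = []
    for head, relation, tail in triplets:
        # Collapse runs of whitespace and trim both ends of every field.
        head, relation, tail = (re.sub(r"\s+", " ", part).strip() for part in (head, relation, tail))
        # Drop triplets with an empty field or that link an entity to itself.
        if not head or not relation or not tail or head == tail:
            continue
        key = (head, relation, tail)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


print(clean_triplets([(" DalaiLama ", "WasBornIn", "Taktser"), ("DalaiLama", "WasBornIn", "Taktser")]))
```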