OpenPecha / toolkit-v2

OpenPecha toolkit version 2
MIT License

RAG0017: Clean Knowledge Graph Data #35

Open tenzin3 opened 1 month ago

tenzin3 commented 1 month ago

Description

Knowledge graph triples are generated by prompting LLMs. Because of context-length limits, and because smaller inputs tend to give better output quality, the unstructured text is processed in small chunks rather than all at once. This produces a large amount of fragmented graph data, so collating the fragments, deduplicating them, and merging near-identical relations and entities is crucial for an accurate and efficient graph.

Image

Expected Output

A deduplicated and consolidated knowledge graph with unique entities and relations, ensuring clarity and eliminating redundancy.
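
As a rough illustration of the deduplication described above, the sketch below merges triples from several chunks by comparing normalized labels. It is not the toolkit's implementation: the function names `normalize` and `deduplicate_triples`, the `(subject, relation, object)` tuple format, and the sample data are all assumptions made for this example, and it only catches surface variants (case, spacing, underscores, hyphens), not semantically similar entities.

```python
import re


def normalize(label: str) -> str:
    """Lowercase, trim, and collapse whitespace, underscores, and hyphens
    so that surface variants of the same label compare equal."""
    return re.sub(r"[\s_\-]+", " ", label.strip().lower())


def deduplicate_triples(triples):
    """Return unique (subject, relation, object) triples.

    Hypothetical helper: the first spelling seen for each entity or
    relation is kept as its canonical form in the output.
    """
    entity_names = {}    # normalized entity label -> first spelling seen
    relation_names = {}  # normalized relation label -> first spelling seen
    seen = set()         # normalized triples already emitted
    result = []
    for subj, rel, obj in triples:
        key = (normalize(subj), normalize(rel), normalize(obj))
        if key in seen:
            continue  # same triple already produced by another chunk
        seen.add(key)
        result.append((
            entity_names.setdefault(key[0], subj.strip()),
            relation_names.setdefault(key[1], rel.strip()),
            entity_names.setdefault(key[2], obj.strip()),
        ))
    return result


# Made-up triples standing in for the output of two text chunks.
chunk_a = [("Alice", "works_at", "Acme Corp"),
           ("Acme Corp", "located_in", "Berlin")]
chunk_b = [("alice ", "Works At", "acme  corp"),
           ("ALICE", "knows", "Bob")]

for triple in deduplicate_triples(chunk_a + chunk_b):
    print(triple)
```

Exact matching on normalized labels is the simplest option; merging entities that are spelled differently but mean the same thing would need an additional similarity step on top of this.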

Implementation Plan

tenzin3 commented 1 month ago

Methods to clean the knowledge graph

tenzin3 commented 1 month ago

Graph Schema (uncleaned):

Image

Observation: