HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License

Didn't extract any relationships with gpt_4o_complete, working with gpt_4o_mini_complete #301

Open rcoundon opened 3 days ago

rcoundon commented 3 days ago

When using gpt_4o_complete to create the knowledge graph, I'm seeing the warning:

WARNING:lightrag:Didn't extract any relationships, maybe your LLM is not working

Instantiation looks like this:

rag = LightRAG(
    working_dir=working_dir,
    llm_model_func=gpt_4o_complete,
    graph_storage="Neo4JStorage",
    log_level="INFO",
)
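
For context, after instantiation the documents are inserted roughly like this (a simplified sketch; the paths and loop are illustrative, but rag.insert is the standard LightRAG API):

import os

# Illustrative directory; the real app converts PDFs to markdown first.
md_dir = "./converted_markdown"

for name in os.listdir(md_dir):
    if name.endswith(".md"):
        with open(os.path.join(md_dir, name), encoding="utf-8") as f:
            # insert() triggers chunking, entity/relationship extraction,
            # and knowledge graph construction for the document.
            rag.insert(f.read())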

I don't see the same warning when using gpt_4o_mini_complete.

The app is creating a knowledge graph for chunks of some markdown files (originally converted from PDF). Any thoughts on what could be causing this?

rcoundon commented 3 days ago

I'm not sure it's related, but when using the mini model there is still a warning, though it seems to be about something else:

WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.UnknownLabelWarning} {category: UNRECOGNIZED} {title: The provided label is not in the database.} {description: One of the labels in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing label name is: NCRNCSNCT)} {position: line: 1, column: 10, offset: 9} for query: 'MATCH (n:NCRNCSNCT) RETURN n'

I'm not actually issuing queries at this point, just creating the KGs for my docs.
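
For what it's worth, that notification is Neo4j's standard response to a MATCH on a label that doesn't exist yet. A minimal standalone reproduction with the official Python driver (connection details are placeholders) looks like this:

from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Matching on a label that is not yet in the database returns no rows
    # and emits the UnknownLabelWarning notification, but it is harmless.
    result = session.run("MATCH (n:NCRNCSNCT) RETURN n")
    print(result.data())  # []

driver.close()

The odd label (NCRNCSNCT) suggests LightRAG's Neo4JStorage is looking up a node by its entity name before that node has been created, which would explain seeing this during KG construction rather than during querying.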

LarFii commented 3 days ago

You should check the output from the LLM during the extraction process. You can directly review the cache files to see if the output is as expected. This can help determine whether the issue lies in the LLM's response or elsewhere in the pipeline.
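
A quick way to skim that cache is to load the JSON directly, along these lines (working_dir is whatever you passed to LightRAG, and the entry schema may differ between versions):

import json
from pathlib import Path

working_dir = Path("./rag_storage")  # whatever you passed to LightRAG

cache = json.loads(
    (working_dir / "kv_store_llm_response_cache.json").read_text(encoding="utf-8")
)
for key, entry in cache.items():
    # Each entry caches one LLM call; print a preview of the raw output
    # to check whether entities/relationships were actually extracted.
    print(key, str(entry)[:200], sep="\n", end="\n\n")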

rcoundon commented 2 days ago

> You should check the output from the LLM during the extraction process. You can directly review the cache files to see if the output is as expected. This can help determine whether the issue lies in the LLM's response or elsewhere in the pipeline.

Ok, thanks, I'll take a look and report back

rcoundon commented 2 days ago

I've had a look at kv_store_llm_response_cache.json and vdb_relationships.json, and there's a fair amount of data there, but I'm not sure what I'm looking at.

However, on the response cache I see this:

Given the technical and largely descriptive nature of the provided text, it's challenging to identify traditional entities like organizations, persons, geolocations, or events as defined by the constraints. However, I can focus on certain elements like terms related to the overall process described in the content:

  1. Entity Identification: None of the traditional entities (organization, person, geo, event) are clearly specified in the text provided.

  2. Relationships: Lacking traditional entities, there are no clear relationships to be defined among any potential entities.

However, focusing purely on elements present within the text, I can attempt to identify concepts or technical terms that may act as placeholders:

("entity"<|>"Component Mounting"<|>"event"<|>"The process of fixing components to the wall across various languages, depicted through diagrams and imagery.")##
("entity"<|>"Suction"<|>"event"<|>"Details about the aspiration or suction process via different alignments, like light shaft.")##
("entity"<|>"Aspiration via Light Shaft"<|>"event"<|>"Technical specifications regarding how aspiration is carried out using a light shaft in the mounting process.")##
("entity"<|>"Technical Diagrams"<|>"event"<|>"Imagery used to illustrate the process of component mounting and air control techniques.")##

Since the text does not contain clear, traditional entities, and relationships, the extraction remains limited to the terms and processes identified in the text. If there are further specific details or entities within additional context or a different section of text, please provide more clarity to enable a more refined extraction.

I suspect this is the source of the warning I reported. Would you agree? If so, is there a way to guide the LLM on how to establish entities and relationships when initiating this process?

LarFii commented 1 day ago

Yes, I think that is the root cause of the issue. Modifying the entity types in prompt.py might help resolve it.
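
For anyone hitting the same thing: the extraction prompt constrains entity types to the defaults in lightrag/prompt.py, which matches the "organizations, persons, geolocations, or events" constraint the LLM mentions above. A sketch of the change (the extended types are illustrative, and the exact variable layout may vary by version):

# lightrag/prompt.py -- default entity types:
PROMPTS["DEFAULT_ENTITY_TYPES"] = ["organization", "person", "geo", "event"]

# For technical documentation, a broader, domain-specific list may give the
# LLM something it can actually extract; these extra types are illustrative:
PROMPTS["DEFAULT_ENTITY_TYPES"] = [
    "organization", "person", "geo", "event",
    "component", "process", "technical_term",
]

Note that in the cached output above the LLM did emit ("entity"<|>...)## records but no relationship records, which is consistent with the "Didn't extract any relationships" warning: with no extractable entities of the default types, there is nothing to relate.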