How to better control entity extraction and prevent hallucinating generic entities when attempting to categorize extracted information?

What is going on when the indexing process triggers a summary of an unrelated topic? The summarized keywords are always the same: "JOHN DOE", "ELON MUSK", "NEW YORK", "JOHN SMITH", "NASA", and more. These keywords are totally unrelated to the books I am ingesting into the index. It happens with many different PDF books on similar niche topics. The PDF book processed here is about out-of-body experiences. Could this be gpt-4o-mini hallucinating, or is it LightRAG related?

Processing Journeys_Out_of_the_Body.pdf: 100%|██████████| 150/150 [00:24<00:00, 6.06it/s]
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 32 docs
INFO:lightrag:[New Chunks] inserting 32 chunks
INFO:lightrag:Inserting 32 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

⠹ Processed 32 chunks, 56 entities(duplicated), 19 relations(duplicated)
INFO:lightrag:Inserting 55 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 19 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2268 nodes, 1122 edges
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 32 docs
INFO:lightrag:[New Chunks] inserting 32 chunks
INFO:lightrag:Inserting 32 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

DEBUG:lightrag:Trigger summary: "JOHN DOE" <----
⠹ Processed 32 chunks, 54 entities(duplicated), 27 relations(duplicated)
DEBUG:lightrag:Trigger summary: "ELON MUSK" <----
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 53 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 27 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2295 nodes, 1141 edges
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 32 docs
INFO:lightrag:[New Chunks] inserting 32 chunks
INFO:lightrag:Inserting 32 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
⠹ Processed 32 chunks, 47 entities(duplicated), 6 relations(duplicated)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
DEBUG:lightrag:Trigger summary: "NEW YORK" <----
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 47 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 6 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2321 nodes, 1146 edges
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 32 docs
INFO:lightrag:[New Chunks] inserting 32 chunks
INFO:lightrag:Inserting 32 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
DEBUG:lightrag:Trigger summary: "JOHN SMITH" <----
⠹ Processed 32 chunks, 59 entities(duplicated), 33 relations(duplicated)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 58 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 33 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2358 nodes, 1172 edges
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 32 docs
INFO:lightrag:[New Chunks] inserting 32 chunks
INFO:lightrag:Inserting 32 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
⠹ Processed 32 chunks, 46 entities(duplicated), 20 relations(duplicated)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
DEBUG:lightrag:Trigger summary: "NASA" <----
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 44 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 20 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2384 nodes, 1184 edges
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 32 docs
INFO:lightrag:[New Chunks] inserting 32 chunks
INFO:lightrag:Inserting 32 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
⠹ Processed 32 chunks, 81 entities(duplicated), 44 relations(duplicated)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
DEBUG:lightrag:Trigger summary: "JOHN DOE" <----
DEBUG:lightrag:Trigger summary: "NEW YORK CITY" <----
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 72 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 44 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2488 nodes, 1226 edges
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 25 docs
INFO:lightrag:[New Chunks] inserting 25 chunks
INFO:lightrag:Inserting 25 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:[Entity Extraction]...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 41 vectors to entities
⠴ Processed 25 chunks, 41 entities(duplicated), 13 relations(duplicated)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Inserting 13 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 2512 nodes, 1238 edges
Processing PDFs: 50%|█████ | 8/16 [21:12<24:04, 180.62s/it]
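
To narrow down whether these names really are absent from the books, a small diagnostic along the following lines might help. It is only a sketch that works on the captured console output and the plain text extracted from the PDF; it does not call any LightRAG API. The function names `summarized_entities` and `ungrounded` are hypothetical helpers introduced for illustration, and the sample log and book text in the usage part are placeholders, not real data. It counts which entity names show up in `Trigger summary` entries and reports the ones that never occur in the source text.

```python
import re
from collections import Counter

# Matches log entries such as: DEBUG:lightrag:Trigger summary: "JOHN DOE"
TRIGGER_RE = re.compile(r'Trigger summary: "([^"]+)"')


def summarized_entities(log_text: str) -> Counter:
    """Count how often each entity name appears in a 'Trigger summary' entry."""
    return Counter(TRIGGER_RE.findall(log_text))


def ungrounded(names, source_text: str) -> list:
    """Return, sorted, the names that never occur in the source text.

    The comparison is a case-insensitive substring match, which is an
    assumption of this sketch; it will miss paraphrased or abbreviated names.
    """
    haystack = source_text.upper()
    return sorted(name for name in names if name.upper() not in haystack)


if __name__ == "__main__":
    # Placeholder inputs: replace with the captured log and the extracted book text.
    log_text = (
        'DEBUG:lightrag:Trigger summary: "JOHN DOE"\n'
        'DEBUG:lightrag:Trigger summary: "ELON MUSK"\n'
        'DEBUG:lightrag:Trigger summary: "JOHN DOE"\n'
    )
    book_text = "Plain text extracted from the PDF goes here."

    counts = summarized_entities(log_text)
    print("Summarized entities:", counts.most_common())
    print("Not found in source text:", ungrounded(counts, book_text))
```

If a name is reported as not found in any ingested book, that suggests it was not extracted from the documents themselves, although this check alone cannot say whether the model or the extraction prompt introduced it.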

HKUDS / LightRAG, issue #191