Open vipervs opened 4 days ago
@vipervs this seems to be a nano-graphrag-specific issue. Sometimes I observe that JSON community report generation can be funky when not using larger LLMs (GPT-4o). Please also raise the issue, along with your model configuration, at https://github.com/gusye1234/nano-graphrag/.
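For reference, a minimal sketch of pointing the standalone nano-graphrag API at a larger model for indexing (this assumes nano_graphrag's GraphRAG, QueryParam and gpt_4o_complete helpers; kotaemon's NanoGraphRAG index may wire this up differently):

```python
# Sketch only: standalone nano-graphrag configured to use GPT-4o as the "best"
# model, which (among other things) is the model used to generate community
# report JSON during indexing. This is not kotaemon's wrapper code.
from nano_graphrag import GraphRAG, QueryParam
from nano_graphrag._llm import gpt_4o_complete  # assumed helper from nano-graphrag

graph_func = GraphRAG(
    working_dir="./nano_graphrag_cache",  # hypothetical path
    best_model_func=gpt_4o_complete,
)
graph_func.insert("Some document text ...")
print(graph_func.query("What are the main themes?", param=QueryParam(mode="local")))
```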
Description
I got the following error when doing a simple QA with the nano-graphrag index. Model: GPT-4o-mini
User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM openai
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x306b46c50>, FSPath=PosixPath('/Users/andi/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x306b47a30>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x331cd0520>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x331cd1300>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x331f94dc0>), mmr=False, rerankers=[CohereReranking(cohere_api_key='[redacted]', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, userid=1), NanoGraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x102dce230>, FSPath=<theflow.base.unset_ object at 0x102dce230>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x102dce230>, VS=<theflow.base.unset_ object at 0x102dce230>, file_ids=['bac8649f-72af-44e6-b4c6-91f218d6d6a9'], userid=<theflow.base.unset_ object at 0x102dce230>)]
searching in doc_ids []
INFO:ktem.index.file.pipelines:Skip retrieval because of no selected files: DocumentRetrievalPipeline(
  (vector_retrieval): <function Function._prepare_child.<locals>.exec at 0x331c1dfc0>
  (embedding): <function Function._prepare_child.<locals>.exec at 0x331c1df30>
)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
GraphRAG embedding dim 3072
INFO:nano-graphrag:Load KV full_docs with 0 data
INFO:nano-graphrag:Load KV text_chunks with 0 data
INFO:nano-graphrag:Load KV llm_response_cache with 0 data
INFO:nano-graphrag:Load KV community_reports with 0 data
INFO:nano-graphrag:Loaded graph from /Users/andi/kotaemon/ktem_app_data/user_data/files/nano_graphrag/d897887f-bb79-42f5-aabd-d398b9a7f669/input/graph_chunk_entity_relation.graphml with 290 nodes, 188 edges
INFO:nano-vectordb:Load (276, 3072) data
INFO:nano-vectordb:Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': '/Users/andi/kotaemon/ktem_app_data/user_data/files/nano_graphrag/d897887f-bb79-42f5-aabd-d398b9a7f669/input/vdb_entities.json'} 276 data
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Traceback (most recent call last):
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
result = context.run(func, *args)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/Users/andi/kotaemon/libs/ktem/ktem/pages/chat/init.py", line 899, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "/Users/andi/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 705, in stream
docs, infos = self.retrieve(message, history)
File "/Users/andi/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 503, in retrieve
retriever_docs = retriever_node(text=query)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/base.py", line 1097, in call
raise e from None
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/base.py", line 1088, in call
output = self.fl.exec(func, args, kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/Users/andi/kotaemon/libs/ktem/ktem/index/file/graph/nano_pipelines.py", line 355, in run
entities, relationships, reports, sources = asyncio.run(
File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/Users/andi/kotaemon/libs/ktem/ktem/index/file/graph/nano_pipelines.py", line 142, in nano_graph_rag_build_local_query_context
use_communities = await _find_most_related_community_from_entities(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/nano_graphrag/_op.py", line 698, in _find_most_related_community_from_entities
related_community_keys = sorted(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/nano_graphrag/_op.py", line 702, in
related_community_datas[k]["report_json"].get("rating", -1),
KeyError: '7'
INFO:httpx:HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
User-id: 1, can see public conversations: True
The main issue here is the KeyError: '7' raised while executing the _find_most_related_community_from_entities function in the nano_graphrag module: the code tries to access key '7' in the related_community_datas dictionary, but that key does not exist. Note that the log above shows INFO:nano-graphrag:Load KV community_reports with 0 data, so no community reports were available even though the entity graph references community clusters such as '7'.
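A minimal sketch of the failing lookup (hypothetical data, simplified from nano_graphrag's _op.py, not the library's exact code): the ranking step sorts candidate community ids by the rating in their report, and the .get("rating", -1) fallback only guards against a missing rating, not a missing community id.

```python
# Simplified, hypothetical reconstruction of the ranking step in
# _find_most_related_community_from_entities. Cluster "7" is referenced by the
# entity graph but has no entry in the loaded community reports, so the sort
# key raises KeyError before .get() is ever reached.
related_community_datas = {
    "3": {"report_json": {"rating": 8.5}},
    "5": {"report_json": {"rating": 6.0}},
    # "7" is missing, e.g. because community_reports was loaded with 0 data
}
related_community_dup_keys = ["3", "5", "7"]

try:
    related_community_keys = sorted(
        related_community_dup_keys,
        key=lambda k: related_community_datas[k]["report_json"].get("rating", -1),
        reverse=True,
    )
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints KeyError: '7', matching the traceback
```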
Here's what could be contributing to this problem:
- The log shows INFO:nano-graphrag:Load KV community_reports with 0 data, while the graph itself loaded with 290 nodes and 188 edges and the entity vector DB holds 276 entries. The entity nodes therefore still reference community clusters (such as '7') for which no report was ever stored.
- As noted in the comment above, JSON community report generation can fail with smaller models such as GPT-4o-mini, which would leave the community_reports store empty after indexing.
How to Address This Issue:
- Re-run the GraphRAG indexing for this file with a larger LLM (e.g. GPT-4o), as suggested in the comment above, so that valid community reports are actually generated and stored.
- Raise the issue with the model configuration upstream at https://github.com/gusye1234/nano-graphrag/ so the missing-report case can be handled there.
- As a local stop-gap, the ranking step can skip community ids that have no loaded report, as in the sketch below.
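A sketch of that defensive filter (hypothetical code, not part of nano-graphrag):

```python
# Hypothetical defensive variant of the ranking step: skip community ids that
# have no loaded report instead of letting the lookup raise KeyError. This only
# hides the symptom; the real fix is regenerating valid community reports.
related_community_datas = {
    "3": {"report_json": {"rating": 8.5}},
    "5": {"report_json": {"rating": 6.0}},
}
related_community_dup_keys = ["3", "5", "7"]  # "7" has no loaded report

valid_keys = [k for k in related_community_dup_keys if k in related_community_datas]
related_community_keys = sorted(
    valid_keys,
    key=lambda k: related_community_datas[k]["report_json"].get("rating", -1),
    reverse=True,
)
print(related_community_keys)  # ['3', '5'] -- no KeyError
```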
Reproduction steps
Screenshots
Logs
No response
Browsers
No response
OS
No response
Additional information
No response