Closed maxruby closed 3 days ago
Setup
- Ubuntu 22.04
- 2x NVIDIA RTX A5000 GPU (48 GB VRAM)
Description: I am encountering an issue with LightRAG where entity extraction consistently fails when using ollama models. Even though the system successfully processes chunks from a document, no entities or relationships are extracted, and the resulting graph contains 0 nodes and 0 edges. I have tried both llama3.1:70b and llama3.2:3b served via ollama.
Steps to Reproduce: Used the following code to initialize and run LightRAG:
import os

from lightrag import LightRAG, QueryParam
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./dickens"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,
    llm_model_name='llama3.2:3b',
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts, embed_model="nomic-embed-text:latest"
        )
    ),
)

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))
Observed the following logs:
- Logger initialized and directory created.
- 42 chunks processed successfully.
- Entity extraction failed with no entities or relationships found.
Full Logs:
2024-10-16 22:57:21,241 - lightrag - INFO - Logger initialized for working directory: ./dickens
2024-10-16 22:57:21,241 - lightrag - DEBUG - LightRAG init with param:
working_dir = ./dickens,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
embedding_func = {'embedding_dim': 768, 'max_token_size': 8192, 'func': <function <lambda> at 0x716396d12160>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function ollama_model_complete at 0x7163357649a0>,
llm_model_name = llama3.2:3b,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x716335762020>
2024-10-16 22:57:21,241 - lightrag - INFO - Load KV full_docs with 0 data
2024-10-16 22:57:21,242 - lightrag - INFO - Load KV text_chunks with 0 data
2024-10-16 22:57:21,242 - lightrag - INFO - Load KV llm_response_cache with 0 data
2024-10-16 22:57:21,243 - lightrag - INFO - Creating a new event loop in a sub-thread.
2024-10-16 22:57:21,243 - lightrag - INFO - [New Docs] inserting 1 docs
2024-10-16 22:57:21,645 - lightrag - INFO - [New Chunks] inserting 42 chunks
2024-10-16 22:57:21,645 - lightrag - INFO - Inserting 42 vectors to chunks
2024-10-16 22:57:26,800 - lightrag - INFO - [Entity Extraction]...
2024-10-16 23:02:18,328 - lightrag - WARNING - Didn't extract any entities, maybe your LLM is not working
2024-10-16 23:02:18,328 - lightrag - WARNING - No new entities and relationships found
2024-10-16 23:02:18,337 - lightrag - INFO - Writing graph with 0 nodes, 0 edges
Expected Behavior: Entities and relationships should be extracted from the processed chunks, and the resulting graph should contain nodes and edges representing them.
Observed Behavior: No entities or relationships are extracted. The following warnings appear in the logs:
WARNING - Didn't extract any entities, maybe your LLM is not working
WARNING - No new entities and relationships found
The final graph contains 0 nodes and 0 edges.
Additional Information:
- LLM Model: llama3.2:3b was used, but entity extraction consistently fails with llama3.1:70b as well.
- Working Directory: Set to ./dickens.
Question: How is hardcoding the tiktoken_model_name to gpt-4o-mini in lightrag.py supposed to work with other non-OpenAI models?

@dataclass
class LightRAG:
    working_dir: str = field(
        default_factory=lambda: f"./lightrag_cache_{datetime.now().strftime('%Y-%m-%d-%H:%M:%S')}"
    )

    # text chunking
    chunk_token_size: int = 1200
    chunk_overlap_token_size: int = 100
    tiktoken_model_name: str = "gpt-4o-mini"
After attempting to exchange gpt-4o-mini with llama3.2:3b and running the demo script, I get an error log which is summarized by GPT-4o as follows:
The error you are encountering happens because the model name llama3.2:3b is not automatically recognized by the tiktoken library, which is responsible for handling the tokenization process. The error message suggests that tiktoken cannot map llama3.2:3b to an appropriate tokenizer.
Attempted Fixes: Verified that the document chunks are processed, but no entities are extracted.
Please let me know if any further details or debugging information are needed. Thank you for your assistance.
I'm sorry, there are some bugs in the Ollama part. I'll work on fixing them as soon as possible.
FYI - tried a few other models and combinations. I still do not understand how it's possible to work without access to gpt-4o-mini in the current codebase (as it's used for tokenization).
You can try using Hugging Face models as follows:
from lightrag import LightRAG
from lightrag.llm import hf_model_complete, hf_embedding
from lightrag.utils import EmbeddingFunc
from transformers import AutoModel, AutoTokenizer

# Initialize LightRAG with Hugging Face model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Use Hugging Face model for text generation
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Model name from Hugging Face
    # Use Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
    ),
)
You can find this demo in the examples directory.
Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :(
I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.
In addition, I still do not understand how the current class LightRAG should work without having API access to gpt-4o-mini.
It would be nice to understand this.
In the latest code, I just tried Ollama and successfully ran it using Qwen-2.5 7b.
> FYI - tried a few other models and combinations. I still do not understand how it's possible to work without access to gpt-4o-mini in the current codebase (as it's used for tokenization).
Use the transformers AutoTokenizer; tiktoken only supports OpenAI models.
How?
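One way to read the suggestion above, as a minimal sketch: assuming the tokenizer is only needed to cut the document into overlapping token windows (the chunk_token_size and chunk_overlap_token_size values in the logs), any object with encode/decode can stand in for tiktoken. The function names below are hypothetical and are not LightRAG's API; the whitespace tokenizer is a stand-in so the sketch runs without downloading a model.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], List],
    decode: Callable[[Sequence], str],
    chunk_token_size: int = 1200,
    overlap_token_size: int = 100,
) -> List[str]:
    """Split text into overlapping windows measured in tokens."""
    if overlap_token_size >= chunk_token_size:
        raise ValueError("overlap must be smaller than the chunk size")
    tokens = encode(text)
    step = chunk_token_size - overlap_token_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + chunk_token_size]))
    return chunks


# Stand-in tokenizer (hypothetical): one token per whitespace-separated word.
def whitespace_encode(text: str) -> List[str]:
    return text.split()


def whitespace_decode(tokens: Sequence[str]) -> str:
    return " ".join(tokens)


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_by_tokens(sample, whitespace_encode, whitespace_decode,
                         chunk_token_size=4, overlap_token_size=1)
print(chunks)
```

With a Hugging Face tokenizer, the encode and decode methods of an AutoTokenizer instance could probably be passed in the same way, though special-token handling would need attention.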
To make it easier for smaller models to handle, I reduced the number of examples in the prompt and decreased the chunk size. I hope this helps you succeed.
@maxruby I just retested it using Llama 3.1 8b, and it is now running smoothly.
> In the latest code, I just tried Ollama and successfully ran it using Qwen-2.5 7b.
Do you have any changes? Can it run successfully just by configuring the code?
> Do you have any changes? Can it run successfully just by configuring the code?
Fixes were made yesterday, so at least you need to pull the latest changes from main.
I am still evaluating whether and how well it actually works.
Can you take a look at my error logs?
2024-10-17 21:26:05,430 - lightrag - INFO - Logger initialized for working directory: ./dickens
2024-10-17 21:26:05,430 - lightrag - DEBUG - LightRAG init with param:
working_dir = ./dickens,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
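embedding_func = {'embedding_dim': 768, 'max_token_size': 8192, 'func': <function <lambda> at 0x7f1408993d90>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function ollama_model_complete at 0x7f12b1db1f30>,
llm_model_name = qwen2.5:7b,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x7f12b1d9fac0>
2024-10-17 21:26:05,431 - lightrag - INFO - Load KV full_docs with 0 data
2024-10-17 21:26:05,431 - lightrag - INFO - Load KV text_chunks with 0 data
2024-10-17 21:26:05,434 - lightrag - INFO - Load KV llm_response_cache with 85 data
2024-10-17 21:26:05,435 - lightrag - INFO - Loaded graph from ./dickens/graph_chunk_entity_relation.graphml with 0 nodes, 0 edges
2024-10-17 21:26:05,437 - lightrag - INFO - Creating a new event loop in a sub-thread.
2024-10-17 21:26:05,437 - lightrag - INFO - [New Docs] inserting 1 docs
2024-10-17 21:26:05,813 - lightrag - INFO - [New Chunks] inserting 42 chunks
2024-10-17 21:26:05,813 - lightrag - INFO - Inserting 42 vectors to chunks
2024-10-17 21:26:10,985 - lightrag - INFO - [Entity Extraction]...
2024-10-17 21:26:12,758 - lightrag - WARNING - Didn't extract any entities, maybe your LLM is not working
2024-10-17 21:26:12,758 - lightrag - WARNING - No new entities and relationships found
2024-10-17 21:26:12,764 - lightrag - INFO - Writing graph with 0 nodes, 0 edges
2024-10-17 21:26:12,765 - lightrag - INFO - Creating a new event loop in a sub-thread.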
I am not the developer here, but dare to comment that your error is very much what I experienced yesterday and reported in my issue here with llama and mistral models.
Basically:
- Vectors Insertion: 42 vectors corresponding to these chunks are being inserted
- Error: "Didn't extract any entities, maybe your LLM is not working" "No new entities and relationships found"
got it, thank you, so how do you solve it?
> @maxruby I just retested it using Llama 3.1 8b, and it is now running smoothly.
@LarFii
I would love to know what exactly I am doing differently from you to NOT get it working with the same code you have in main.
I repeated exactly what you mentioned in your last message and ran LightRAG/examples/lightrag_ollama_demo.py with Llama 3.1 8b, with the same negative results I reported in this issue (i.e., no Entities and Relations found):
2024-10-17 23:53:02,216 - lightrag - INFO - Logger initialized for working directory: ./dickens
2024-10-17 23:53:02,216 - lightrag - DEBUG - LightRAG init with param:
working_dir = ./dickens,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
embedding_func = {'embedding_dim': 768, 'max_token_size': 8192, 'func': <function <lambda> at 0x7f3b35a4e160>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function ollama_model_complete at 0x7f3ad43880e0>,
llm_model_name = llama3.1:8b,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x7f3ad437d8a0>
2024-10-17 23:53:02,216 - lightrag - INFO - Load KV full_docs with 0 data
2024-10-17 23:53:02,216 - lightrag - INFO - Load KV text_chunks with 0 data
2024-10-17 23:53:02,216 - lightrag - INFO - Load KV llm_response_cache with 0 data
2024-10-17 23:53:02,217 - lightrag - INFO - Creating a new event loop in a sub-thread.
2024-10-17 23:53:02,218 - lightrag - INFO - [New Docs] inserting 1 docs
2024-10-17 23:53:02,622 - lightrag - INFO - [New Chunks] inserting 42 chunks
2024-10-17 23:53:02,622 - lightrag - INFO - Inserting 42 vectors to chunks
2024-10-17 23:53:07,657 - lightrag - INFO - [Entity Extraction]...
2024-10-17 23:55:49,563 - lightrag - WARNING - Didn't extract any entities, maybe your LLM is not working
2024-10-17 23:55:49,563 - lightrag - WARNING - No new entities and relationships found
2024-10-17 23:55:49,571 - lightrag - INFO - Writing graph with 0 nodes, 0 edges
2024-10-17 23:55:49,601 - lightrag - INFO - Creating a new event loop in a sub-thread.
@Christ-dev See my last comment to @LarFii - no luck on my side either.
Your context window is probably too small. Ollama by default only has 2k. To increase it, in ollama do:
/set parameter num_ctx 32768
Or change your ollama modelfile. I don't know if specifying it in the API call like this: "num_ctx": 32768 works, but you can try.
See https://github.com/ollama/ollama/blob/main/docs/faq.md
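For the API-call variant mentioned above, here is a minimal sketch of what such a request could look like. It assumes Ollama's REST endpoint /api/generate on the default port 11434 with the context size passed under "options", as described in the linked FAQ. Whether LightRAG's ollama_model_complete forwards such options is not established in this thread, and build_generate_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_generate_request(model, prompt, num_ctx=32768,
                           host="http://localhost:11434"):
    """Build (but do not send) a POST request for Ollama's /api/generate."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # per-request context window
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("qwen2.5:7b", "Why is the sky blue?")
print(req.full_url)
print(json.loads(req.data)["options"])
# To actually send it (requires a running Ollama server):
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a real server.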
For me, after today's update, with Qwen 2.5 7B, naive and global search work, but local search returned:
Sorry, I'm not able to provide an answer to that question.
Hybrid search has a warning: "Low Level context is None. Return empty Low entity/relationship/source"
> @LarFii I would love to know what exactly I am doing different from you to NOT get it working with the same code you have in main.
Based on the logs, it seems that the previous cache content wasn't cleared, which resulted in the LLM extraction not being triggered again.
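If a stale cache is the suspect, one blunt option is to start from an empty working directory before re-running the demo. The sketch below assumes everything LightRAG persists (cached LLM responses, chunks, the graph file) lives under the working directory, as the ./dickens paths in the logs suggest; reset_working_dir is a hypothetical helper, not part of LightRAG.

```python
import shutil
import tempfile
from pathlib import Path


def reset_working_dir(working_dir):
    """Delete the working directory with all its contents and recreate it empty."""
    path = Path(working_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


# Demonstration on a throwaway directory rather than a real ./dickens folder.
demo_dir = Path(tempfile.mkdtemp()) / "dickens"
demo_dir.mkdir()
(demo_dir / "stale_cache.json").write_text("{}")
fresh = reset_working_dir(demo_dir)
print(list(fresh.iterdir()))  # → []
```

This throws away every cached LLM response as well, so the next insert will call the model again for each chunk.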
> Your context window is probably too small. Ollama by default only has 2k. To increase it, in ollama do:
> /set parameter num_ctx 32768
Thank you so much for your suggestion! Your solution worked perfectly.
To share with others how to modify the context length, here's a step-by-step guide for using Ollama to increase the num_ctx parameter.
- Pull the model:
ollama pull qwen2
- Display the model file:
ollama show --modelfile qwen2 > Modelfile
- Edit the Modelfile by adding the following line:
PARAMETER num_ctx 32768
- Create the modified model:
ollama create -f Modelfile qwen2m
This process is not limited to Qwen 2; it's just an example. You can apply similar steps to other models in Ollama.
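The "edit the Modelfile" step above can also be scripted. The following is a minimal sketch that works on the Modelfile text only and assumes the simple case of one FROM line and single-line PARAMETER instructions; set_num_ctx is a hypothetical helper name, and multi-line blocks such as TEMPLATE or LICENSE are passed through untouched.

```python
def _is_num_ctx_line(line):
    """True for a `PARAMETER num_ctx ...` instruction line."""
    parts = line.split()
    return len(parts) >= 2 and parts[0].upper() == "PARAMETER" and parts[1] == "num_ctx"


def set_num_ctx(modelfile_text, num_ctx):
    """Return Modelfile text with exactly one `PARAMETER num_ctx` line."""
    kept = [line for line in modelfile_text.splitlines()
            if not _is_num_ctx_line(line)]
    new_line = f"PARAMETER num_ctx {num_ctx}"
    # Insert right after the first FROM line when there is one, else append.
    for i, line in enumerate(kept):
        if line.strip().upper().startswith("FROM "):
            kept.insert(i + 1, new_line)
            break
    else:
        kept.append(new_line)
    return "\n".join(kept) + "\n"


original = 'FROM qwen2\nPARAMETER num_ctx 2048\nTEMPLATE "{{ .Prompt }}"\n'
updated = set_num_ctx(original, 32768)
print(updated)
```

The result can be written back to Modelfile and passed to ollama create as in the last step of the guide.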
I increased the value of the num_ctx parameter, but this time the model did not fit on my 12 GB graphics card. It's using 15% CPU, 85% GPU and naturally this is very slow. I think local language models are not very suitable for this job. If I'm wrong, please show me the right way
I found this same issue running locally on a MacBook. I also found that with this bigger context size, llama3.2 simply didn't identify entities while qwen2.5 did. I verified that the ollama server logs weren't running out of context with each.
Additionally, setting the num_ctx value to 8196 was sufficient for the ollama example in the repo.
Ah, great find! The num_ctx value does need to be adjusted based on individual hardware and local LLM model configurations. I also found that qwen2 and qwen2.5 with 7B models on V100 16GB hardware perform fine, whereas llama3.2 (3B) encounters issues.
BTW, the value of 32768 was simply inherited from the default parameter in LightRAG. https://github.com/HKUDS/LightRAG/blob/e2db7b6c45ac4b48d7026d69b3a770b42bad4dbe/lightrag/lightrag.py#L87
Only a quick update here (will comment more extensively later). I could finally get the ollama example running with qwen2 after setting the num_ctx parameter to 32768. GPU utilization is over 94% (24 GB VRAM of an A5000) and when looking at the graphml output, it's not entirely clear how to judge the quality of the graphs.
@HeQinWill I increased the value of the num_ctx parameter for qwen2.5:7b and it worked fine on my 12 GB card (RTX 3060). It used about 9GB at run-time. Thanks !!!
UPDATE: I just tried with the model llama3.1-8b and, after setting the num_ctx parameter to 32768, it worked fine. It used 11GB of VRAM.
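To get a feel for why num_ctx moves VRAM usage so much, here is a rough back-of-the-envelope sketch of the key/value cache size. The layer counts, KV-head counts and head dimensions below are assumptions taken from memory of the public model cards, not from this thread, and the estimate assumes a 16-bit cache. Actual usage also includes the model weights and compute buffers, so treat it as an order-of-magnitude guide only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, num_ctx, bytes_per_value=2):
    """Estimate KV-cache size: keys and values for every layer and position."""
    return 2 * n_layers * n_kv_heads * head_dim * num_ctx * bytes_per_value


GIB = 1024 ** 3
# Assumed architecture numbers (not from this thread); check each model card.
models = {
    "llama3.1:8b": dict(n_layers=32, n_kv_heads=8, head_dim=128),
    "qwen2.5:7b": dict(n_layers=28, n_kv_heads=4, head_dim=128),
}
for name, cfg in models.items():
    for num_ctx in (2048, 8192, 32768):
        size = kv_cache_bytes(num_ctx=num_ctx, **cfg) / GIB
        print(f"{name:12s} num_ctx={num_ctx:5d} -> {size:.2f} GiB KV cache")
```

Under these assumptions the cache grows linearly with num_ctx, which is one reason a smaller value can be worth trying on cards with less memory.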
To do so visually, use a free tool called "yEd" and open the .graphml file that's created in your working directory.
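For a quick non-visual check of whether extraction produced anything at all (the failing runs in this thread report 0 nodes and 0 edges), the .graphml file can be counted with the standard library alone. This is a minimal sketch assuming the file uses the standard GraphML namespace; count_graphml is a hypothetical helper and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(xml_text):
    """Return (node_count, edge_count) for a GraphML document given as text."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)


# Tiny made-up graph: two entities joined by one relationship.
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="A"/>
    <node id="B"/>
    <edge source="A" target="B"/>
  </graph>
</graphml>"""
print(count_graphml(sample))  # → (2, 1)
```

For a real run, the text would come from the graph file in the working directory, for example ./dickens/graph_chunk_entity_relation.graphml as named in the logs.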
the same error!
I tried qwen2.5:3b-instruct-max-context on my 4080super and it took up 13G of video memory not long after generation started, now I'm limiting num_ctx to 10240
Can you run 'ollama ps' to see that you just don't have other models loaded as well?
I see the same issue today. It would be helpful if you can mention what all changes need to be made in order to work with Ollama models.
> Can you run 'ollama ps' to see that you just don't have other models loaded as well?
Yes, I checked and it just takes up a huge amount of video memory, maybe because I have so much training material. Also I found an issue where LightRAG would fail to get answers from ollama after a while when num_ctx was set to 10240, so I changed it back to the max value.
Copy the qwen2.5 settings
ollama pull qwen2.5
ollama show --modelfile qwen2.5 > qwen_settings.txt
Add the num_ctx parameter value above the LICENSE
sed '/LICENSE """/i\
PARAMETER num_ctx 8192\
' qwen_settings.txt > your-model-name-settings.txt
Create a new model, curl the corpus, remove old working directory, run the ollama demo
ollama create -f your-model-name-settings.txt your_model_name
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
rm -rfd dickens/
python examples/lightrag_ollama_demo.py
@HeQinWill
I really appreciate how quickly the team responds to the issue and the comments. Having said that, if I may suggest something here (for the benefit of the project), I think this could be a little more structured and perhaps provide more clear guidelines for the expectations on GPU VRAM consumption and debugging process before half a dozen people start throwing their GPU and time at it. Of course, this is experimental and there are not very realistic alternatives for on-premise local Graph RAG with ollama models, but perhaps it could be organized as a new "feature" Task or Epic/Story as part of the project. Then whoever is interested to contribute could join the Epic/Story or implement the task. My 2 cents.
As for the consumption of GPU VRAM on my server (which actually has 2 x RTX A5000 with 24 GB each), you can see below with nvtop how the GPU consumption peaks to over 90%. In case you wonder, YES I only had the qwen2 model loaded by ollama at that time (ollama ps).
@44cort44 Thanks for your quick reply and practical suggestion. I am well familiar with the "yEd" tool and that was not my point in my very brief comment :)
I actually wrote my own python script (see below, actually not very difficult to do with networkx and matplotlib) to visualize the graph and attempt to measure the actual effectiveness of the lightRAG results. I understand that there are multiple approaches to do this and my attempt is simply to understand visually how well the graphs model the entities and relations in the input document. The preliminary results do not easily explain how well the graphs have been built or reflect the original data; that is what I meant to say.
python script in case you are interested:
import networkx as nx
import matplotlib.pyplot as plt
# Load the GraphML file
file_path = "../dickens/graph_chunk_entity_relation.graphml"
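G = nx.read_graphml(file_path)
# An empty graph (the 0 nodes / 0 edges case from this issue) would make min()/max() below fail
if G.number_of_nodes() == 0:
    raise SystemExit(f"{file_path} contains no nodes; nothing to draw")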
# Draw the graph
fig, ax = plt.subplots(figsize=(16, 16))
pos = nx.spring_layout(G, seed=42) # Positions for all nodes
# Calculate closeness centrality to determine node colors and sizes
closeness = nx.closeness_centrality(G)
norm = plt.Normalize(vmin=min(closeness.values()), vmax=max(closeness.values()))
colors = [plt.cm.Reds(norm(closeness[node])) for node in G.nodes()]
sizes = [2000 * closeness[node] for node in G.nodes()]
# Draw nodes, edges, and labels
nodes = nx.draw_networkx_nodes(G, pos, node_size=sizes, node_color=colors, alpha=0.8, linewidths=0.5, edgecolors='black', ax=ax)
nx.draw_networkx_edges(G, pos, width=0.8, alpha=0.5, edge_color='gray', ax=ax)
nx.draw_networkx_labels(G, pos, font_size=8, font_family='sans-serif', verticalalignment='bottom', horizontalalignment='center', ax=ax)
# Add color bar
sm = plt.cm.ScalarMappable(cmap='Reds', norm=norm)
sm.set_array([])
fig.colorbar(sm, ax=ax, label='Closeness Centrality')
# Display the plot
plt.title("GraphML Visualization")
plt.axis('off')
plt.tight_layout()
plt.show()
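Before plotting, it can help to check whether the graph is empty at all, since this issue is about a graph with 0 nodes and 0 edges. The following is a minimal sketch using only the standard library; the function name count_graphml is made up for illustration, and it assumes the file uses the standard GraphML namespace.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(xml_data):
    """Return (node_count, edge_count) for GraphML content given as str or bytes."""
    root = ET.fromstring(xml_data)
    nodes = sum(1 for _ in root.iter(f"{GRAPHML_NS}node"))
    edges = sum(1 for _ in root.iter(f"{GRAPHML_NS}edge"))
    return nodes, edges


# Tiny inline sample so the sketch runs without any files.
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="A"/><node id="B"/>
    <edge source="A" target="B"/>
  </graph>
</graphml>"""
print(count_graphml(sample))  # (2, 1)

# For the real file, read it as bytes, for example:
#   from pathlib import Path
#   print(count_graphml(Path("../dickens/graph_chunk_entity_relation.graphml").read_bytes()))
```

A result of (0, 0) would indicate that entity extraction failed, so there is nothing to visualize yet.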
@maxruby
Just to clarify, I’m not part of the team, just a user who’s faced similar issues and wanted to share my experience. As for my previous response, I’m not sure if it fully addresses your concern.
I believe that with the efforts of the development team and the community, we’ll continue to make this more efficient and accessible.
Thank you all for your contributions. We will continue working hard to make LightRAG better : )
@LarFii Thank you for the work on this project! Do you know if there is a plan for what new features will be developed and how they might be prioritized in LightRAG? For example, it would be nice to know what the team already has in mind, or is working on, for further development with local embedding models. Could this be followed up in a Discussion?
I created a detailed video tutorial on how to get LightRAG working with Ollama based on the tips shared here. Here is the link if anyone is running into issues:
Your context window is probably too small. By default, Ollama only has a 2k context window. To increase it, in ollama do:
/set parameter num_ctx 32768
Or change your ollama modelfile. I don't know whether specifying it in the API call like this:
"num_ctx": 32768
works, but you can try. See https://github.com/ollama/ollama/blob/main/docs/faq.md
For me, after today's update, with Qwen 2.5 7B, naive and global search work, but local search returned:
Sorry, I'm not able to provide an answer to that question.
Hybrid search has warning:
"Low Level context is None. Return empty Low entity/relationship/source"
Same issue (local, global, and hybrid search all failed) after running lightrag_ollama_demo.py. Any idea how to solve this? Thank you!
For local search I got:
Sorry, I'm not able to provide an answer to that question.
For global search I got:
I'm sorry, but I need you to provide the story or the text first so that I can analyze it and determine the top themes. Once you share the content, I'll be able to identify the main themes for you.
For hybrid search I got:
To provide an accurate response, I need the text of the story you're referring to. Please share the story or key parts of it, and I will help identify the main themes.
The graph construction step seemed to be successful.
Hi @maxruby, have you been successful in running llama3.2:3b? I'm still scratching my head about which config to fine-tune. Thank you!
I did not try again, but I suppose if you set the parameter num_ctx to 32768 it should probably work, as it does for qwen2.
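For reference, the Ollama REST API accepts a per-request context size through the options field of the request body. Below is a minimal sketch that only builds such a body. The helper name build_generate_payload is made up for illustration, and whether LightRAG's ollama_model_complete forwards such options is not confirmed anywhere in this thread.

```python
import json


def build_generate_payload(model, prompt, num_ctx=32768):
    """Build a request body for Ollama's /api/generate endpoint
    with a per-request context window size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


payload = build_generate_payload("llama3.2:3b", "Why is the sky blue?")
print(json.dumps(payload, indent=2))

# To send it (assumes a local Ollama server on the default port 11434):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(json.load(urllib.request.urlopen(req))["response"])
```

Setting num_ctx in the modelfile, as described earlier in the thread, has the advantage that every client of that model picks it up without code changes.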
Expected Behavior: Entities and relationships should be extracted from the processed chunks, and the resulting graph should contain nodes and edges representing them.
Observed Behavior: No entities or relationships are extracted. The following warnings appear in the logs:
WARNING - Didn't extract any entities, maybe your LLM is not working
WARNING - No new entities and relationships found
The final graph contains 0 nodes and 0 edges.
Additional Information: LLM Model: llama3.2:3b was used, but entity extraction consistently fails with llama3.1:70b as well. Working Directory: Set to ./dickens.
Question: How is hardcoding the tiktoken_model_name to gpt-4o-mini in lightrag.py supposed to work with other non-OpenAI models?
After attempting to exchange gpt-4o-mini with llama3.2:3b and running the demo script, I get an error log which is summarized by GPT-4o as follows:
Attempted Fixes: Verified that the document chunks are processed, but no entities are extracted. Please let me know if any further details or debugging information are needed. Thank you for your assistance.