Entity Extraction Failure: No Entities or Relationships Extracted with ollama models

maxruby commented 1 week ago

Setup

Ubuntu 22.04
2x NVIDIA RTX A5000 GPU (48 GB VRAM)

Description I am encountering an issue with LightRAG where entity extraction consistently fails when using ollama models. Even though the system successfully processes chunks from a document, no entities or relationships are extracted, and the resulting graph contains 0 nodes and 0 edges. I have tried both llama3.1:70b and llama3.2:3b served via ollama.

Steps to Reproduce: Used the following code to initialize and run LightRAG:

import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./dickens"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  
    llm_model_name='llama3.2:3b',
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts, 
            embed_model="nomic-embed-text:latest"
        )
    ),
)

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))

Observed the following logs:

Logger initialized and directory created.
42 chunks processed successfully.
Entity extraction failed with no entities or relationships found.
Full Logs:
2024-10-16 22:57:21,241 - lightrag - INFO - Logger initialized for working directory: ./dickens
2024-10-16 22:57:21,241 - lightrag - DEBUG - LightRAG init with param:
  working_dir = ./dickens,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o-mini,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  embedding_func = {'embedding_dim': 768, 'max_token_size': 8192, 'func': <function <lambda> at 0x716396d12160>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  llm_model_func = <function ollama_model_complete at 0x7163357649a0>,
  llm_model_name = llama3.2:3b,
  llm_model_max_token_size = 32768,
  llm_model_max_async = 16,
  key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
  vector_db_storage_cls_kwargs = {},
  graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x716335762020>

2024-10-16 22:57:21,241 - lightrag - INFO - Load KV full_docs with 0 data
2024-10-16 22:57:21,242 - lightrag - INFO - Load KV text_chunks with 0 data
2024-10-16 22:57:21,242 - lightrag - INFO - Load KV llm_response_cache with 0 data
2024-10-16 22:57:21,243 - lightrag - INFO - Creating a new event loop in a sub-thread.
2024-10-16 22:57:21,243 - lightrag - INFO - [New Docs] inserting 1 docs
2024-10-16 22:57:21,645 - lightrag - INFO - [New Chunks] inserting 42 chunks
2024-10-16 22:57:21,645 - lightrag - INFO - Inserting 42 vectors to chunks
2024-10-16 22:57:26,800 - lightrag - INFO - [Entity Extraction]...
2024-10-16 23:02:18,328 - lightrag - WARNING - Didn't extract any entities, maybe your LLM is not working
2024-10-16 23:02:18,328 - lightrag - WARNING - No new entities and relationships found
2024-10-16 23:02:18,337 - lightrag - INFO - Writing graph with 0 nodes, 0 edges

Expected Behavior: Entities and relationships should be extracted from the processed chunks, and the resulting graph should contain nodes and edges representing them.

Observed Behavior: No entities or relationships are extracted. The following warnings appear in the logs:

WARNING - Didn't extract any entities, maybe your LLM is not working
WARNING - No new entities and relationships found

The final graph contains 0 nodes and 0 edges.

Additional Information: LLM Model: llama3.2:3b was used, but entity extraction consistently fails with llama3.1:70b as well. Working Directory: Set to ./dickens.

Question:

How is hardcoding the tiktoken_model_name to gpt-4o-mini in lightrag.py supposed to work with non-OpenAI models?

@dataclass
class LightRAG:
    working_dir: str = field(
        default_factory=lambda: f"./lightrag_cache_{datetime.now().strftime('%Y-%m-%d-%H:%M:%S')}"
    )

    # text chunking
    chunk_token_size: int = 1200
    chunk_overlap_token_size: int = 100
    tiktoken_model_name: str = "gpt-4o-mini"

After replacing gpt-4o-mini with llama3.2:3b and running the demo script, I get an error log, which GPT-4o summarizes as follows:

The error you are encountering happens because the model name llama3.2:3b is not automatically recognized by the tiktoken library, which is responsible for handling the tokenization process. The error message suggests that tiktoken cannot map llama3.2:3b to an appropriate tokenizer.

Attempted Fixes: Verified that the document chunks are processed, but no entities are extracted. Please let me know if any further details or debugging information are needed. Thank you for your assistance.

maxruby commented 6 days ago

FYI - I tried a few other models and combinations. I still do not understand how it's possible for this to work without access to gpt-4o-mini in the current codebase (as it's used for tokenization).

hznnnnnn commented 6 days ago

FYI - I tried a few other models and combinations. I still do not understand how it's possible for this to work without access to gpt-4o-mini in the current codebase (as it's used for tokenization).

Use the transformers AutoTokenizer; tiktoken only supports OpenAI models.

LarFii commented 6 days ago

I'm sorry, there are some bugs in the Ollama part. I'll work on fixing them as soon as possible.

LarFii commented 6 days ago

FYI - I tried a few other models and combinations. I still do not understand how it's possible for this to work without access to gpt-4o-mini in the current codebase (as it's used for tokenization).

You can try using Hugging Face models as follows:

from lightrag import LightRAG
from lightrag.llm import hf_model_complete, hf_embedding
from lightrag.utils import EmbeddingFunc
from transformers import AutoModel, AutoTokenizer

WORKING_DIR = "./dickens"  # same working directory as in the original report

# Initialize LightRAG with Hugging Face model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Use Hugging Face model for text generation
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',  # Model name from Hugging Face
    # Use Hugging Face embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
    ),
)

You can find this demo in the examples directory.

maxruby commented 6 days ago

Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :( I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.

In addition, I still do not understand how the current LightRAG class is supposed to work without API access to gpt-4o-mini. It would be nice to understand this.

LarFii commented 6 days ago

Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :( I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.

In the latest code, I just tried Ollama and successfully ran it using Qwen-2.5 7b.

maxruby commented 6 days ago

FYI - I tried a few other models and combinations. I still do not understand how it's possible for this to work without access to gpt-4o-mini in the current codebase (as it's used for tokenization).

Use the transformers AutoTokenizer; tiktoken only supports OpenAI models.

How?

LarFii commented 6 days ago

Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :( I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.

To make it easier for smaller models to handle, I reduced the number of examples in the prompt and decreased the chunk size. I hope this helps you succeed.

LarFii commented 6 days ago

@maxruby I just retested it using Llama 3.1 8b, and it is now running smoothly.

Christ-dev commented 6 days ago

Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :( I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.

In the latest code, I just tried Ollama and successfully ran it using Qwen-2.5 7b.

Do you have any changes? Can it run successfully just by configuring the code?

maxruby commented 6 days ago

Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :( I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.

In the latest code, I just tried Ollama and successfully ran it using Qwen-2.5 7b.

Do you have any changes? Can it run successfully just by configuring the code?

Fixes were made yesterday, so at least you need to pull the latest changes from main. I am still evaluating whether and how well it actually works.

Christ-dev commented 6 days ago

Sorry, I tried this too but with mistralai/Mistral-7B-Instruct-v0.3 and it did not work :( I do not want to use meta-llama/Llama-3.1-8B-Instruct due to licensing conditions.

In the latest code, I just tried Ollama and successfully ran it using Qwen-2.5 7b.

Do you have any changes? Can it run successfully just by configuring the code?

Fixes were made yesterday, so at least you need to pull the latest changes from main. I am still evaluating whether and how well it actually works.

Can you take a look at my error logs?

2024-10-17 21:26:05,430 - lightrag - INFO - Logger initialized for working directory: ./dickens
2024-10-17 21:26:05,430 - lightrag - DEBUG - LightRAG init with param:
  working_dir = ./dickens,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o-mini,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  embedding_func = {'embedding_dim': 768, 'max_token_size': 8192, 'func': <function at 0x7f1408993d90>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  llm_model_func = <function ollama_model_complete at 0x7f12b1db1f30>,
  llm_model_name = qwen2.5:7b,
  llm_model_max_token_size = 32768,
  llm_model_max_async = 16,
  key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
  vector_db_storage_cls_kwargs = {},
  graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x7f12b1d9fac0>

2024-10-17 21:26:05,431 - lightrag - INFO - Load KV full_docs with 0 data
2024-10-17 21:26:05,431 - lightrag - INFO - Load KV text_chunks with 0 data
2024-10-17 21:26:05,434 - lightrag - INFO - Load KV llm_response_cache with 85 data
2024-10-17 21:26:05,435 - lightrag - INFO - Loaded graph from ./dickens/graph_chunk_entity_relation.graphml with 0 nodes, 0 edges
2024-10-17 21:26:05,437 - lightrag - INFO - Creating a new event loop in a sub-thread.
2024-10-17 21:26:05,437 - lightrag - INFO - [New Docs] inserting 1 docs
2024-10-17 21:26:05,813 - lightrag - INFO - [New Chunks] inserting 42 chunks
2024-10-17 21:26:05,813 - lightrag - INFO - Inserting 42 vectors to chunks
2024-10-17 21:26:10,985 - lightrag - INFO - [Entity Extraction]...
2024-10-17 21:26:12,758 - lightrag - WARNING - Didn't extract any entities, maybe your LLM is not working
2024-10-17 21:26:12,758 - lightrag - WARNING - No new entities and relationships found
2024-10-17 21:26:12,764 - lightrag - INFO - Writing graph with 0 nodes, 0 edges
2024-10-17 21:26:12,765 - lightrag - INFO - Creating a new event loop in a sub-thread.

maxruby commented 6 days ago

I am not the developer here, but I would point out that your error closely matches what I experienced yesterday and reported in this issue with llama and mistral models.

Basically:

Vector insertion: the 42 vectors corresponding to these chunks are inserted.

Error:

"Didn't extract any entities, maybe your LLM is not working"
"No new entities and relationships found"

Christ-dev commented 6 days ago

Got it, thank you. So how did you solve it?

maxruby commented 6 days ago

@maxruby I just retested it using Llama 3.1 8b, and it is now running smoothly.

@LarFii

I would love to know what exactly I am doing differently from you such that it does NOT work with the same code you have in main. I repeated exactly what you described in your last message and ran LightRAG/examples/lightrag_ollama_demo.py with Llama 3.1 8b, with the same negative results I reported in this issue (i.e., no entities or relationships found):

2024-10-17 23:53:02,216 - lightrag - INFO - Logger initialized for working directory: ./dickens
2024-10-17 23:53:02,216 - lightrag - DEBUG - LightRAG init with param:
  working_dir = ./dickens,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o-mini,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  embedding_func = {'embedding_dim': 768, 'max_token_size': 8192, 'func': <function <lambda> at 0x7f3b35a4e160>},
  embedding_batch_num = 32,
  embedding_func_max_async = 16,
  llm_model_func = <function ollama_model_complete at 0x7f3ad43880e0>,
  llm_model_name = llama3.1:8b,
  llm_model_max_token_size = 32768,
  llm_model_max_async = 16,
  key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
  vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
  vector_db_storage_cls_kwargs = {},
  graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
  enable_llm_cache = True,
  addon_params = {},
  convert_response_to_json_func = <function convert_response_to_json at 0x7f3ad437d8a0>

2024-10-17 23:53:02,216 - lightrag - INFO - Load KV full_docs with 0 data
2024-10-17 23:53:02,216 - lightrag - INFO - Load KV text_chunks with 0 data
2024-10-17 23:53:02,216 - lightrag - INFO - Load KV llm_response_cache with 0 data
2024-10-17 23:53:02,217 - lightrag - INFO - Creating a new event loop in a sub-thread.
2024-10-17 23:53:02,218 - lightrag - INFO - [New Docs] inserting 1 docs
2024-10-17 23:53:02,622 - lightrag - INFO - [New Chunks] inserting 42 chunks
2024-10-17 23:53:02,622 - lightrag - INFO - Inserting 42 vectors to chunks
2024-10-17 23:53:07,657 - lightrag - INFO - [Entity Extraction]...
2024-10-17 23:55:49,563 - lightrag - WARNING - Didn't extract any entities, maybe your LLM is not working
2024-10-17 23:55:49,563 - lightrag - WARNING - No new entities and relationships found
2024-10-17 23:55:49,571 - lightrag - INFO - Writing graph with 0 nodes, 0 edges
2024-10-17 23:55:49,601 - lightrag - INFO - Creating a new event loop in a sub-thread.

maxruby commented 6 days ago

got it, thank you, so how do you solve it?

@Christ-dev See my last comment to @LarFii - no luck on my side either.

wensheng commented 5 days ago

Your context window is probably too small. By default, Ollama uses a context of only 2k tokens. To increase it, run this inside an interactive ollama session:

 /set parameter num_ctx 32768

Or change your Ollama Modelfile. I don't know whether specifying it in the API call, like "num_ctx": 32768, works, but you can try. See https://github.com/ollama/ollama/blob/main/docs/faq.md
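For anyone who wants to try the per-request route, here is a minimal sketch of a direct call to Ollama's REST API with `num_ctx` set under `options`, which is the form the linked FAQ describes. The helper names `build_generate_payload` and `generate` are made up for illustration, and the URL assumes a default local Ollama install. Whether LightRAG's `ollama_model_complete` forwards such options is not confirmed anywhere in this thread, so treat this as a standalone check rather than a LightRAG fix.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str, num_ctx: int = 32768) -> dict:
    """Build a request body that asks Ollama for a larger context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # per-request context window override
    }


def generate(model: str, prompt: str, num_ctx: int = 32768) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (requires a running Ollama server with the model pulled):
# print(generate("qwen2.5:7b", "Say hello in one word.", num_ctx=8192))
```

If the answer changes noticeably between a small and a large `num_ctx` on a long prompt, that suggests the prompt was being truncated at the default size.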

For me, after today's update, with Qwen 2.5 7B, naive and global search works, but local search returned:

Sorry, I'm not able to provide an answer to that question.

Hybrid search has warning: "Low Level context is None. Return empty Low entity/relationship/source"

LarFii commented 5 days ago

@maxruby I just retested it using Llama 3.1 8b, and it is now running smoothly.

@LarFii

I would love to know what exactly I am doing different from you to NOT get it working with the same code you have in main. I repeated exactly what you mentioned in your last message and ran the LightRAG/examples/lightrag_ollama_demo.py with Llama 3.1 8b with the same negative results I reported in this issue (i.e., no Entities and Relations found):

Based on the logs, it seems that the previous cache content wasn't cleared, which resulted in the LLM extraction not being triggered again.
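One way to rule out stale cache content is to start each test run from an empty working directory. The sketch below is only an illustration: `reset_working_dir` is a hypothetical helper, not part of LightRAG, and it deletes everything under the directory, including the cached LLM responses and the graph file.

```python
import shutil
from pathlib import Path


def reset_working_dir(working_dir: str) -> Path:
    """Delete everything under working_dir and recreate it as an empty directory."""
    path = Path(working_dir)
    if path.exists():
        shutil.rmtree(path)  # removes cached LLM responses, chunks, and the graph file
    path.mkdir(parents=True)
    return path
```

Calling `reset_working_dir("./dickens")` before constructing `LightRAG(working_dir="./dickens", ...)` should make the startup log report 0 entries for `llm_response_cache`, so extraction cannot be served from an earlier failed run.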

HeQinWill commented 5 days ago

Your context window is probably too small. Ollama by default only have 2k. To increase it, in ollama do:
 /set parameter num_ctx 32768
Or change your ollama modelfile. Don't know if specifying in the api call like this: "num_ctx": 32768 works, but you can try.

Thank you so much for your suggestion! Your solution worked perfectly.

To share with others how to modify the context length, here’s a step-by-step guide for using Ollama to increase the num_ctx parameter.

Pull the model:
```
ollama pull qwen2
```

Export the model file:
```
ollama show --modelfile qwen2 > Modelfile
```

Edit the Modelfile by adding the following line:
```
PARAMETER num_ctx 32768
```

Create the modified model:
```
ollama create -f Modelfile qwen2m
```

This process is not limited to Qwen 2; it's just an example. You can apply similar steps to other models in Ollama.
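If you repeat this for several models, the edit step can be scripted. The following is a minimal sketch, assuming the Modelfile text has already been exported with `ollama show --modelfile`; the function name `set_num_ctx` is invented for this example. It drops any existing `PARAMETER num_ctx` line and appends the new value, so the result contains exactly one such line.

```python
def set_num_ctx(modelfile_text: str, num_ctx: int) -> str:
    """Return Modelfile text containing exactly one `PARAMETER num_ctx` line."""
    kept = []
    for line in modelfile_text.splitlines():
        words = line.split()
        # Skip any existing num_ctx setting (instruction names are matched case-insensitively).
        if len(words) >= 2 and words[0].upper() == "PARAMETER" and words[1] == "num_ctx":
            continue
        kept.append(line)
    kept.append(f"PARAMETER num_ctx {num_ctx}")
    return "\n".join(kept) + "\n"
```

Write the returned text back to `Modelfile` and then run the `ollama create` step above as before.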

zaforcan commented 5 days ago

I increased the value of the num_ctx parameter, but this time the model did not fit on my 12 GB graphics card. It's using 15% CPU, 85% GPU and naturally this is very slow. I think local language models are not very suitable for this job. If I'm wrong, please show me the right way

luculli commented 5 days ago

@HeQinWill I increased the value of the num_ctx parameter for qwen2.5:7b and it worked fine on my 12 GB card (RTX 3060). It used about 9 GB at run-time. Thanks !!!

44cort44 commented 4 days ago

I found this same issue running locally on a MacBook. I also found that with this bigger context size, llama3.2 simply didn't identify entities while qwen2.5 did. I verified that the ollama server logs weren't running out of context with each.

Additionally, setting the num_ctx value to 8196 was sufficient for the ollama example in the repo.

HeQinWill commented 4 days ago

I found this same issue running locally on a MacBook. I also found that with this bigger context size, llama3.2 simply didn't identify entities while qwen2.5 did. I verified that the ollama server logs weren't running out of context with each.

Additionally, setting the num_ctx value to 8196 was sufficient for the ollama example in the repo.

Ah, great find! The num_ctx value does need to be adjusted based on individual hardware and local LLM model configurations. I also found that qwen2 and qwen2.5 with 7B models on V100 16GB hardware perform fine, whereas llama3.2 (3B) encounters issues.

BTW, the value of 32768 was simply inherited from the default parameter in LightRAG. https://github.com/HKUDS/LightRAG/blob/e2db7b6c45ac4b48d7026d69b3a770b42bad4dbe/lightrag/lightrag.py#L87
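
On the open question from earlier in the thread about setting num_ctx per request: Ollama's REST API accepts an `options` object on `/api/generate`, so the raw call can be sketched as below. This only shows a direct request to a local Ollama server on its default port; whether LightRAG's `ollama_model_complete` forwards such options is not confirmed here, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local address (assumption)


def build_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a non-streaming /api/generate payload with a per-request context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 32768) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("qwen2", "Hello", num_ctx=8192)` requires the server to be running and the model to be pulled.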

maxruby commented 4 days ago

Only a quick update here (will comment more extensively later). I could finally get the ollama example running with qwen2 after setting the num_ctx parameter to 32768. GPU utilization is over 94% (24 GB VRAM of an A5000) and when looking at the graphml output, it's not entirely clear how to judge the quality of the graphs.

luculli commented 4 days ago

@HeQinWill I increased the value of the num_ctx parameter for qwen2.5:7b and it worked fine on my 12 GB card (RTX 3060). It used about 9GB at run-time. Thanks !!!

UPDATE: I just tried with the model llama3.1:8b and, after setting the num_ctx parameter to 32768, it worked fine. It used 11GB of VRAM.
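
A rough way to see why VRAM use grows with num_ctx is to estimate the size of the attention key/value cache. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes a 16-bit cache, and the layer and head counts are assumptions taken from the published model configurations. Model weights and runtime overhead are ignored, and Ollama may allocate memory differently.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, num_ctx, bytes_per_value=2):
    """Estimate KV-cache size: one key and one value vector per layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * num_ctx * bytes_per_value


# Assumed model shapes (not stated anywhere in this thread).
MODELS = {
    "llama3.1:8b": dict(n_layers=32, n_kv_heads=8, head_dim=128),
    "qwen2.5:7b": dict(n_layers=28, n_kv_heads=4, head_dim=128),
}

for name, shape in MODELS.items():
    for ctx in (2048, 8192, 32768):
        size = kv_cache_bytes(num_ctx=ctx, **shape)
        print(f"{name:12s} num_ctx={ctx:6d} KV cache ~ {size / GIB:.2f} GiB")
```

Under these assumptions the cache grows linearly with num_ctx, so lowering num_ctx is the first thing to try when a model no longer fits on the GPU.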

44cort44 commented 4 days ago

Only a quick update here (will comment more extensively later). I could finally get the ollama example running with qwen2 after setting the num_ctx parameter to 32768. GPU utilization is over 94% (24 GB VRAM of an A5000) and when looking at the graphml output, it's not entirely clear how to judge the quality of the graphs.

To do so visually, use a free tool called "yEd" and open the .graphml file that's created in your working directory.

jacky68147527 commented 4 days ago

the same error!

240839785 commented 4 days ago

I tried qwen2.5:3b-instruct-max-context on my 4080super and it took up 13G of video memory not long after generation started. Now I'm limiting num_ctx to 10240.

44cort44 commented 4 days ago

I tried qwen2.5:3b-instruct-max-context on my 4080super and it took up 13G of video memory not long after generation started. Now I'm limiting num_ctx to 10240.

Can you run 'ollama ps' to check that you don't have other models loaded as well?
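
The same check can be scripted against Ollama's `/api/ps` endpoint, which lists the models currently loaded. A minimal sketch: the `size` and `size_vram` field names follow my reading of the Ollama API documentation and should be treated as assumptions, and the function names are hypothetical.

```python
import json
import urllib.request


def summarize_running(ps_payload: dict) -> list:
    """Return (name, total_bytes, share_on_gpu) for each loaded model."""
    rows = []
    for model in ps_payload.get("models", []):
        size = model.get("size", 0)
        vram = model.get("size_vram", 0)
        gpu_share = vram / size if size else 0.0
        rows.append((model.get("name", "?"), size, gpu_share))
    return rows


def fetch_running(base_url: str = "http://localhost:11434") -> dict:
    """Query a running Ollama server for its loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.loads(resp.read())
```

With a server running, `summarize_running(fetch_running())` shows whether more than one model is loaded. A GPU share below 1.0 means part of the model was offloaded to the CPU.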

ashishreddy2411 commented 4 days ago

I see the same issue today. It would be helpful if you can mention what all changes need to be made in order to work with Ollama models.

240839785 commented 4 days ago

I tried qwen2.5:3b-instruct-max-context on my 4080super and it took up 13G of video memory not long after generation started. Now I'm limiting num_ctx to 10240.

Can you run 'ollama ps' to check that you don't have other models loaded as well?

Yes, I checked, and it just takes up a huge amount of video memory, maybe because I have so much training material. I also found that LightRAG would fail to get answers from ollama after a while when num_ctx was set to 10240, so I changed it back to the max value.

cortseverns commented 4 days ago

Copy the qwen2.5 settings:
```
ollama pull qwen2.5
ollama show --modelfile qwen2.5 > qwen_settings.txt
```

Add the num_ctx parameter value above the LICENSE:
```
sed '/LICENSE """/i\
PARAMETER num_ctx 8192\
' qwen_settings.txt > your-model-name-settings.txt
```

Create a new model, download the corpus, remove the old working directory, and run the ollama demo:
```
ollama create -f your-model-name-settings.txt your_model_name
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
rm -rfd dickens/
python examples/lightrag_ollama_demo.py
```

maxruby commented 4 days ago

@HeQinWill

I really appreciate how quickly the team responds to the issue and the comments. Having said that, if I may suggest something here (for the benefit of the project), I think this could be a little more structured, with clearer guidelines on the expected GPU VRAM consumption and the debugging process, before half a dozen people start throwing their GPUs and time at it. Of course, this is experimental and there are few realistic alternatives for on-premise local Graph RAG with ollama models, but perhaps it could be organized as a new "feature" Task or Epic/Story as part of the project. Then whoever is interested in contributing could join the Epic/Story or implement the task. My 2 cents.

As for the consumption of GPU VRAM on my server (which actually has 2 x RTX A5000 with 24 GB each), you can see below with nvtop how the GPU consumption peaks to over 90%. In case you wonder, YES I only had the qwen2 model loaded by ollama at that time (ollama ps).

[Screenshot: nvtop GPU usage, LightRAG ollama example, qwen2, num_ctx 32768]

@44cort44 Thanks for your quick reply and practical suggestion. I am well familiar with the "yEd" tool and that was not my point in my very brief comment :)

I actually wrote my own python script (see below, actually not very difficult to do with networkx and matplotlib) to visualize the graph and attempt to measure the actual effectiveness of the lightRAG results. I understand that there are multiple approaches to do this and my attempt is simply to understand visually how well the graphs model the entities and relations in the input document. The preliminary results do not easily explain how well the graphs have been built or reflect the original data, that is what I meant to say.

[Images: books_qwen2_lightrag_result, books_qwen2_lightrag_result_clustered]

Python script, in case you are interested:

```
import matplotlib.pyplot as plt
import networkx as nx

# Load the GraphML file that LightRAG wrote to the working directory
file_path = "../dickens/graph_chunk_entity_relation.graphml"
G = nx.read_graphml(file_path)

# An empty graph would make min()/max() below fail, so stop early
if G.number_of_nodes() == 0:
    raise SystemExit("The graph has no nodes; nothing to draw.")

# Set up the figure and compute node positions (fixed seed for a reproducible layout)
fig, ax = plt.subplots(figsize=(16, 16))
pos = nx.spring_layout(G, seed=42)

# Calculate closeness centrality to determine node colors and sizes
closeness = nx.closeness_centrality(G)
norm = plt.Normalize(vmin=min(closeness.values()), vmax=max(closeness.values()))
colors = [plt.cm.Reds(norm(closeness[node])) for node in G.nodes()]
sizes = [2000 * closeness[node] for node in G.nodes()]

# Draw nodes, edges, and labels
nx.draw_networkx_nodes(G, pos, node_size=sizes, node_color=colors, alpha=0.8, linewidths=0.5, edgecolors='black', ax=ax)
nx.draw_networkx_edges(G, pos, width=0.8, alpha=0.5, edge_color='gray', ax=ax)
nx.draw_networkx_labels(G, pos, font_size=8, font_family='sans-serif', verticalalignment='bottom', horizontalalignment='center', ax=ax)

# Add a color bar that maps node color to closeness centrality
sm = plt.cm.ScalarMappable(cmap='Reds', norm=norm)
sm.set_array([])
fig.colorbar(sm, ax=ax, label='Closeness Centrality')

# Display the plot
plt.title("GraphML Visualization")
plt.axis('off')
plt.tight_layout()
plt.show()
```
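
Before plotting, a quick sanity check on the GraphML file can already tell an empty or fragmented graph from a populated one. The sketch below uses only the standard library and counts nodes, edges and isolated nodes. It does not measure graph quality, and the function name `graphml_stats` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def graphml_stats(path: str) -> dict:
    """Count nodes and edges in a GraphML file and list nodes without any edge."""
    root = ET.parse(path).getroot()
    degree = {node.get("id"): 0 for node in root.iter(f"{GRAPHML_NS}node")}
    n_edges = 0
    for edge in root.iter(f"{GRAPHML_NS}edge"):
        n_edges += 1
        for end in (edge.get("source"), edge.get("target")):
            if end in degree:
                degree[end] += 1
    isolated = sorted(node_id for node_id, d in degree.items() if d == 0)
    return {"nodes": len(degree), "edges": n_edges, "isolated": isolated}


# Usage, with the file path from the script above:
# print(graphml_stats("../dickens/graph_chunk_entity_relation.graphml"))
```

A result of 0 nodes and 0 edges corresponds to the "Writing graph with 0 nodes, 0 edges" log line from the original report.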

HeQinWill commented 3 days ago

@maxruby Just to clarify, I’m not part of the team, just a user who’s faced similar issues and wanted to share my experience. As for my previous response, I’m not sure if it fully addresses your concern.
I believe that with the efforts of the development team and the community, we’ll continue to make this more efficient and accessible.

LarFii commented 3 days ago

Thank you all for your contributions. We will continue working hard to make LightRAG better : )

maxruby commented 3 days ago

@LarFii Thank you for the work on this project! Do you know if there is a plan for what new features will be developed and how they might be prioritized in LightRAG? For example, it would be nice to know what the team already has in mind or is working on for further development with local embedding models. Could this be followed up in a Discussion?

PromtEngineer commented 1 day ago

I created a detailed video tutorial on how to get LightRAG working with Ollama based on the tips shared here. Here is the link if anyone is running into issues:

EvelynBai commented 1 day ago

Your context window is probably too small. Ollama by default only has a 2k context window. To increase it, run this at the ollama prompt:
 /set parameter num_ctx 32768
Or change your ollama Modelfile. I don't know if specifying it in the API call like this: "num_ctx": 32768 works, but you can try. See https://github.com/ollama/ollama/blob/main/docs/faq.md

For me, after today's update, with Qwen 2.5 7B, naive and global search works, but local search returned:
Sorry, I'm not able to provide an answer to that question.
Hybrid search has warning: "Low Level context is None. Return empty Low entity/relationship/source"

Same issue (local, global, and hybrid search all failed) after running lightrag_ollama_demo.py. Any idea how to solve this? Thank you!
For local search I got: Sorry, I'm not able to provide an answer to that question.
For global search I got: I'm sorry, but I need you to provide the story or the text first so that I can analyze it and determine the top themes. Once you share the content, I'll be able to identify the main themes for you.
For hybrid search I got: To provide an accurate response, I need the text of the story you're referring to. Please share the story or key parts of it, and I will help identify the main themes.
The graph construction step seemed to be successful.

rcontesti commented 8 hours ago

Hi @maxruby, have you been able to run llama3.2:3b successfully? I'm still scratching my head about which config to tune. Thank you!

maxruby commented 37 minutes ago

Hi @maxruby, have you been able to run llama3.2:3b successfully? I'm still scratching my head about which config to tune. Thank you!

I did not try again, but I suppose that if you set the num_ctx parameter to 32768 it should probably work as it does for qwen2.

HKUDS / LightRAG

Entity Extraction Failure: No Entities or Relationships Extracted with ollama models #30