HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License

Local ollama service stops responding after ~1 hour - GPU VRAM usage timeout issue #63

Closed · 240839785 closed this issue 3 weeks ago

240839785 commented 3 weeks ago

Description: When using LightRAG with a local Ollama service, responses stop coming in after it has been running for about an hour, sometimes less. After stopping LightRAG, the Ollama service repeatedly logs the following warning:

```
time=2024-10-20T10:28:14.237+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.123168046 model=/root/.ollama/models/blobs/sha256-5ee4f07cdhgbeadbbb293e85803c56gbolbd37ed059d2715fa7bb405f3lcaa
time=2024-10-20T10:28:14.488+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.373610908 model=/root/.ollama/models/blobs/sha256-5ee4f07cdbgbeadbbb293e85803c569bolbd37ed059d2715faa7bb405f3lcaa6IlA
time=2024-10-20T10:28:14.737+08:00 level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.623228477 model=/root/.ollama/models/blobs/sha256-5ee4f07cdbcbeadbbb293e85803c56gb01bd37ed059d2715faa7bb405f3lcaa6
```

Here's my `lightrag_ollama_demo.py`:

```python
import os

from lightrag import LightRAG, QueryParam
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./dickens"
TEXT_FILES_DIR = "/llm/mt"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,
    llm_model_name="qwen2.5:3b-instruct-max-context",
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"),
    ),
)

# Read all .txt files under TEXT_FILES_DIR
texts = []
for filename in os.listdir(TEXT_FILES_DIR):
    if filename.endswith('.txt'):
        file_path = os.path.join(TEXT_FILES_DIR, filename)
        with open(file_path, 'r', encoding='utf-8') as file:
            texts.append(file.read())

# Batch-insert the texts into LightRAG
rag.insert(texts)

with open("./pc.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="naive"))
)

# Perform local search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="local"))
)

# Perform global search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="global"))
)

# Perform hybrid search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid"))
)
```
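The `sched.go` warning above is Ollama's scheduler reporting that a model's VRAM was not released within its timeout, which can happen when the chat model (`qwen2.5:3b-instruct-max-context`) and the embedding model (`nomic-embed-text`) are repeatedly swapped in and out of GPU memory. As a rough workaround sketch, independent of LightRAG, you can ask Ollama to unload a model explicitly through its HTTP API by sending a generate request with `keep_alive` set to 0; the host URL and the point at which you would call this are illustrative assumptions:

```python
# Sketch only: explicitly ask the local Ollama server to release a model's VRAM.
# Assumes Ollama is listening on its default address, http://localhost:11434.
import requests

def unload_ollama_model(model: str, host: str = "http://localhost:11434") -> None:
    """Send a generate request with keep_alive=0, which unloads the model."""
    resp = requests.post(
        f"{host}/api/generate",
        json={"model": model, "keep_alive": 0},  # no prompt: load/unload only
        timeout=30,
    )
    resp.raise_for_status()

# For example, between long insert runs:
# unload_ollama_model("qwen2.5:3b-instruct-max-context")
# unload_ollama_model("nomic-embed-text")
```

The server-side default can also be changed with Ollama's `OLLAMA_KEEP_ALIVE` environment variable.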

LarFii commented 3 weeks ago

We added a demo, vram_management_demo.py, under examples/; you can try it.
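For readers who cannot run the demo right away, here is a minimal sketch of the general idea: insert the documents in small batches and retry with a pause, so the local Ollama server has time to free VRAM between batches. The batch size, retry count, and delay below are illustrative values, not the actual demo's, and `rag` and `texts` are the objects from the script above.

```python
import time

# Sketch only: batch the inserts and back off between failed attempts.
def insert_in_batches(rag, texts, batch_size=5, retries=3, delay=10):
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        for attempt in range(1, retries + 1):
            try:
                rag.insert(batch)
                break
            except Exception as exc:
                print(f"Batch {i // batch_size} failed on attempt {attempt}: {exc}")
                time.sleep(delay)  # give the GPU time to recover before retrying

insert_in_batches(rag, texts)
```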

240839785 commented 3 weeks ago

> We added a demo, vram_management_demo.py, under examples/; you can try it.

Thank you very much for your prompt reply. I am currently using DeepSeek's API; I will try it later!
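For comparison with the Ollama setup above, here is a rough sketch of pointing LightRAG's chat model at an OpenAI-compatible endpoint such as DeepSeek while keeping local embeddings. The helper name `openai_complete_if_cache`, its keyword arguments, and the DeepSeek base URL and model name are assumptions based on the repo's OpenAI-compatible example; verify them against the installed version of `lightrag.llm`.

```python
import os

from lightrag import LightRAG
from lightrag.llm import openai_complete_if_cache, ollama_embedding  # assumed helper names
from lightrag.utils import EmbeddingFunc

# Assumed: openai_complete_if_cache(model, prompt, ...) talks to any
# OpenAI-compatible endpoint when given base_url and api_key.
async def deepseek_complete(prompt, system_prompt=None, history_messages=[], **kwargs):
    return await openai_complete_if_cache(
        "deepseek-chat",                        # DeepSeek's chat model name (assumed)
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("DEEPSEEK_API_KEY"),  # assumed env var for the key
        base_url="https://api.deepseek.com",    # assumed OpenAI-compatible base URL
        **kwargs,
    )

rag = LightRAG(
    working_dir="./dickens",
    llm_model_func=deepseek_complete,
    # Embeddings can stay on the local Ollama embedding model.
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text"),
    ),
)
```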