hjing100 opened this issue 1 week ago
```
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
  File "/data/projects/rag/LightRAG-main/test01/example08_lightrag.py", line 85, in <module>
    rag.insert(f.read())
  File "/data/projects/rag/LightRAG-main/lightrag/lightrag.py", line 167, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
  File "/data/anaconda3/envs/test39/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/data/projects/rag/LightRAG-main/lightrag/lightrag.py", line 211, in ainsert
    await self.chunks_vdb.upsert(inserting_chunks)
  File "/data/projects/rag/LightRAG-main/lightrag/storage.py", line 100, in upsert
    results = self._client.upsert(datas=list_data)
  File "/data/anaconda3/envs/test39/lib/python3.9/site-packages/nano_vectordb/dbs.py", line 98, in upsert
    self.__storage["matrix"] = np.vstack([self.__storage["matrix"], new_matrix])
  File "/data/anaconda3/envs/test39/lib/python3.9/site-packages/numpy/core/shape_base.py", line 289, in vstack
    return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1536 and the array at index 1 has size 768
```
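The last frames show where it breaks: nano-vectordb appends the newly embedded chunks to its stored matrix with np.vstack, and NumPy refuses because the two matrices have different widths (1536 vs 768). A minimal sketch that reproduces the same ValueError with NumPy alone, assuming (as the message suggests) that the stored matrix was sized from the declared embedding_dim of 1536 while the embedding model returned 768-dimensional vectors; the array names are illustrative only:

```python
import numpy as np

# Assumption: the store's matrix was sized from the declared embedding_dim (1536),
# while the embedding model actually returns 768-dimensional vectors.
stored = np.zeros((0, 1536), dtype=np.float32)    # existing rows (none yet)
new_rows = np.ones((3, 768), dtype=np.float32)    # three freshly embedded chunks

error_message = None
try:
    # Same call as in nano_vectordb/dbs.py: stack new rows under the stored ones.
    np.vstack([stored, new_rows])
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {error_message}")

# With matching widths the same call succeeds.
fixed = np.vstack([np.zeros((0, 768), dtype=np.float32), new_rows])
print(fixed.shape)  # (3, 768)
```

Note that an empty stored matrix does not help: even with zero rows, its column count must match the new rows.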
Could it be caused by the following change I made to utils.py?

```python
import torch
from transformers import AutoTokenizer

# Use the local Qwen2 tokenizer instead of tiktoken
ENCODER = AutoTokenizer.from_pretrained(
    "/data/qwen2-72b-instruct", device_map="auto", trust_remote_code=True
)

def encode_string_by_tiktoken(content: str, model_name: str = "gpt-4o"):
    # model_name is kept for signature compatibility; it is ignored here
    return ENCODER(content).input_ids

def decode_tokens_by_tiktoken(tokens: list[int], model_name: str = "gpt-4o"):
    # Convert the token ids back into text
    return ENCODER.decode(torch.IntTensor(tokens))
```
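On the utils.py question: as far as I can tell, LightRAG uses encode_string_by_tiktoken and decode_tokens_by_tiktoken to count tokens and to cut documents into overlapping token windows, so a different tokenizer should change where chunks are split, not how wide the embedding vectors are. A minimal sketch of that kind of sliding-window chunking with a pluggable encode/decode pair; chunk_by_tokens, toy_encode and toy_decode are hypothetical names, and the whitespace "tokenizer" is only a stand-in for tiktoken or a Hugging Face tokenizer:

```python
from typing import Callable, Dict, List

# Hypothetical toy tokenizer (stand-in for tiktoken / AutoTokenizer):
# every whitespace-separated word becomes one token id.
VOCAB: Dict[str, int] = {}
INVERSE: Dict[int, str] = {}

def toy_encode(text: str) -> List[int]:
    ids = []
    for word in text.split():
        if word not in VOCAB:
            VOCAB[word] = len(VOCAB)
            INVERSE[VOCAB[word]] = word
        ids.append(VOCAB[word])
    return ids

def toy_decode(ids: List[int]) -> str:
    return " ".join(INVERSE[i] for i in ids)

def chunk_by_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int,
    overlap: int,
) -> List[str]:
    """Split text into windows of at most max_tokens tokens, overlapping by `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = encode(text)
    step = max_tokens - overlap
    return [decode(tokens[start:start + max_tokens]) for start in range(0, len(tokens), step)]

chunks = chunk_by_tokens("a b c d e f g", toy_encode, toy_decode, max_tokens=3, overlap=1)
print(chunks)  # ['a b c', 'c d e', 'e f g', 'g']
```

Whichever tokenizer is plugged in, the result is a list of text chunks; the vector width is decided afterwards by the embedding model, which is consistent with the traceback failing in the vector-store upsert rather than during chunking.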
The code I ran:

```python
import os
import time

import numpy as np
from lightrag import LightRAG, QueryParam
from lightrag.utils import EmbeddingFunc
from lightrag.llm import hf_model_complete, hf_embedding
from transformers import AutoModel, AutoTokenizer

WORKING_DIR = "./dickens"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,  # Use Hugging Face model for text generation
    llm_model_name='/data/qwen2-72b-instruct',  # Local path to the Hugging Face model
    embedding_func=EmbeddingFunc(
        embedding_dim=1536,
        max_token_size=8192,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("/data/bce-embedding-base_v1"),
            embed_model=AutoModel.from_pretrained("/data/bce-embedding-base_v1")
        )
    ),
)

with open(r"D:\code\LightRAG-main\book.txt", "r", encoding="utf-8") as f:
    rag.insert(f.read())

query = "这篇文章主要讲了些什么?"  # "What is this article mainly about?"
print(rag.query(query, param=QueryParam(mode="global")))
print("***")
time.sleep(2)
```
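The script declares embedding_dim=1536, but, as noted later in this thread, bce-embedding-base_v1 returns 768-dimensional vectors. A cheap way to catch this before rag.insert is to embed one sample string and compare its width with the declared value. A minimal sketch, assuming only that the embedding function is an async callable taking a list of strings and returning a 2-D array (which is what hf_embedding appears to do); check_embedding_dim and fake_embed are hypothetical names used for illustration:

```python
import asyncio
from typing import Awaitable, Callable, List

import numpy as np

EmbedFn = Callable[[List[str]], Awaitable[np.ndarray]]

async def check_embedding_dim(embed: EmbedFn, declared_dim: int) -> int:
    """Embed one probe string and fail fast if its width differs from declared_dim."""
    vectors = np.asarray(await embed(["dimension probe"]))
    actual_dim = vectors.shape[-1]
    if actual_dim != declared_dim:
        raise ValueError(
            f"embedding_dim is {declared_dim}, but the model returns {actual_dim}-dim vectors"
        )
    return actual_dim

# Hypothetical stand-in for a 768-dim model such as bce-embedding-base_v1.
async def fake_embed(texts: List[str]) -> np.ndarray:
    return np.zeros((len(texts), 768), dtype=np.float32)

mismatch_error = None
try:
    asyncio.run(check_embedding_dim(fake_embed, declared_dim=1536))
except ValueError as exc:
    mismatch_error = str(exc)
    print(mismatch_error)  # embedding_dim is 1536, but the model returns 768-dim vectors
```

With the script above, the same check could presumably be run against the real embedding function before constructing LightRAG, so a wrong embedding_dim surfaces as a clear message instead of a NumPy shape error deep inside the vector store.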
Use another embedding model, like nomic-embed-text, and don't forget to create a new working directory. It works for me.
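Starting from a fresh working directory matters because LightRAG keeps its vector stores as files inside working_dir, so vectors written with one model's width would be picked up again on the next run. A minimal sketch of one way to avoid reusing a stale store, assuming you are fine with one directory per embedding model and dimension; workdir_for and its naming scheme are hypothetical:

```python
import os
import re

def workdir_for(base_dir: str, model_path: str, embedding_dim: int) -> str:
    """Return (and create) a working directory dedicated to one embedding model and width."""
    # "/data/bce-embedding-base_v1" -> "bce-embedding-base_v1"
    model_name = os.path.basename(os.path.normpath(model_path))
    # Replace characters that are awkward in directory names.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", model_name)
    path = os.path.join(base_dir, f"{safe_name}-{embedding_dim}d")
    os.makedirs(path, exist_ok=True)
    return path

print(workdir_for("./dickens", "/data/bce-embedding-base_v1", 768))
# ./dickens/bce-embedding-base_v1-768d on Linux/macOS
```

Switching to a different embedding model (or fixing embedding_dim) then automatically lands in a new, empty directory instead of mixing old and new vectors.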
Thank you, but that didn't work for me. bce-embedding-base_v1 produces 768-dimensional embeddings, so I changed the following in lightrag.py:

```python
node2vec_params: dict = field(
    default_factory=lambda: {
        "dimensions": 768,  # 1536,
        "num_walks": 10,
        "walk_length": 40,
        "window_size": 2,
        "iterations": 3,
        "random_seed": 3,
    }
)
```
And in lightrag_hf_demo.py:

```python
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,
    llm_model_name='/data/qwen2-72b-instruct',
    embedding_func=EmbeddingFunc(
        embedding_dim=768,  # was 1536
        max_token_size=8192,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("/data/bce-embedding-base_v1"),
            embed_model=AutoModel.from_pretrained("/data/bce-embedding-base_v1")
        )
    ),
)
```
I have another question:

```
2024-11-11 10:37:23.711056: W external/local_xla/xla/tsl/lib/monitoring/collection_registry.cc:88] Trying to register 2 metrics with the same name: /tensorflow/api/tf_function. The old value will be erased in order to register a new one. Please check if you link the metric more than once, or if the name is already used by other metrics.
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
  File "/data/projects/rag/LightRAG-main/test02/example08_lightrag.py", line 85, in <module>
    rag.insert(f.read())
  File "/data/projects/rag/LightRAG-main/lightrag/lightrag.py", line 167, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
  File "/data/anaconda3/envs/test39/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/data/projects/rag/LightRAG-main/lightrag/lightrag.py", line 214, in ainsert
    maybe_new_kg = await extract_entities(
  File "/data/projects/rag/LightRAG-main/lightrag/operate.py", line 331, in extract_entities
    results = await asyncio.gather(
  File "/data/projects/rag/LightRAG-main/lightrag/operate.py", line 270, in _process_single_content
    final_result = await use_llm_func(hint_prompt)
  File "/data/projects/rag/LightRAG-main/lightrag/utils.py", line 92, in wait_func
    result = await func(*args, **kwargs)
  File "/data/projects/rag/LightRAG-main/lightrag/llm.py", line 510, in hf_model_complete
    return await hf_model_if_cache(
  File "/data/projects/rag/LightRAG-main/lightrag/llm.py", line 286, in hf_model_if_cache
    output = hf_model.generate(
  File "/data/anaconda3/envs/test39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
...
  File "/data/anaconda3/envs/test39/lib/python3.9/typing.py", line 215, in _remove_dups_flatten
    all_params = set(params)
TypeError: unhashable type: 'list'
```
I tracked down the cause: TensorFlow was installed in this environment, and uninstalling it fixed the problem. At first I thought my own changes had broken something in the code, but then I ran plain Qwen inference and got the same error, which is how I found out that simply running import tensorflow triggers it. Also, in my internet-connected environment LightRAG has no need for the tensorflow package at all (TensorFlow had been installed earlier for another project):

```
(LightRAG) C:\Users\lenovo>pip list
Package             Version     Editable project location
accelerate          1.1.1
aioboto3            13.2.0
aiobotocore         2.15.2
aiofiles            24.1.0
aiohappyeyeballs    2.4.3
aiohttp             3.10.10
aioitertools        0.12.0
aiosignal           1.3.1
annotated-types     0.7.0
anyio               4.6.2.post1
anytree             2.12.1
asttokens           2.4.1
async-timeout       4.0.3
attrs               24.2.0
autograd            1.7.0
beartype            0.18.5
boto3               1.35.36
botocore            1.35.36
certifi             2024.8.30
charset-normalizer  3.4.0
colorama            0.4.6
contourpy           1.3.0
cycler              0.12.1
decorator           5.1.1
distro              1.9.0
exceptiongroup      1.2.2
executing           2.1.0
filelock            3.16.1
fonttools           4.54.1
frozenlist          1.5.0
fsspec              2024.10.0
gensim              4.3.3
graspologic         3.4.1
graspologic-native  1.2.1
h11                 0.14.0
hnswlib             0.8.0
httpcore            1.0.6
httpx               0.27.2
huggingface-hub     0.26.2
hyppo               0.4.0
idna                3.10
importlib_resources 6.4.5
ipython             8.18.1
jedi                0.19.1
Jinja2              3.1.4
jiter               0.7.0
jmespath            1.0.1
joblib              1.4.2
jsonpickle          3.4.2
kiwisolver          1.4.7
lightrag-hku        0.0.8       d:\code\lightrag-main
llvmlite            0.43.0
MarkupSafe          3.0.2
matplotlib          3.9.2
matplotlib-inline   0.1.7
mpmath              1.3.0
multidict           6.1.0
nano-vectordb       0.0.4.1
networkx            3.2.1
numba               0.60.0
numpy               1.26.4
ollama              0.3.3
openai              1.54.3
packaging           24.1
pandas              2.2.3
parso               0.8.4
patsy               0.5.6
pillow              11.0.0
pip                 24.2
POT                 0.9.5
prompt_toolkit      3.0.48
propcache           0.2.0
psutil              6.1.0
pure_eval           0.2.3
pydantic            2.9.2
pydantic_core       2.23.4
Pygments            2.18.0
pynndescent         0.5.13
pyparsing           3.2.0
python-dateutil     2.9.0.post0
pytz                2024.2
pyvis               0.3.2
PyYAML              6.0.2
regex               2024.11.6
requests            2.32.3
s3transfer          0.10.3
safetensors         0.4.5
scikit-learn        1.5.2
scipy               1.12.0
seaborn             0.13.2
setuptools          75.1.0
six                 1.16.0
smart-open          7.0.5
sniffio             1.3.1
stack-data          0.6.3
statsmodels         0.14.4
sympy               1.13.1
tenacity            9.0.0
threadpoolctl       3.5.0
tiktoken            0.8.0
tokenizers          0.20.3
torch               2.5.1
tqdm                4.67.0
traitlets           5.14.3
transformers        4.46.2
typing_extensions   4.12.2
tzdata              2024.2
umap-learn          0.5.7
urllib3             1.26.20
wcwidth             0.2.13
wheel               0.44.0
wrapt               1.16.0
xxhash              3.5.0
yarl                1.17.1
zipp                3.20.2

(LightRAG) C:\Users\lenovo>
```
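Since merely importing TensorFlow was enough to trigger the TypeError, it can help to check whether the package is present in an environment without importing it. A minimal sketch using the standard-library importlib.util.find_spec, which locates a top-level package without executing its code; is_installed is a hypothetical helper name:

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None

if is_installed("tensorflow"):
    print("tensorflow is installed in this environment; "
          "it is the package whose import triggered the error in this thread.")
else:
    print("tensorflow is not installed in this environment.")
```

Running this with the same interpreter as the LightRAG script (for example inside the test39 environment) shows whether that environment still contains TensorFlow.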
Hi, I'm still getting the error above.