HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License
9.22k stars 1.13k forks

embedding dimension problem #270

Open 13331112522 opened 1 week ago

13331112522 commented 1 week ago

I hit an embedding dimension error when using an open-source embedding model and an OpenAI-compatible one. After changing the dimension from the default 4096 to 1024 or 2048, I got this error:

  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/nano_vectordb/dbs.py", line 71, in __post_init__
    storage["embedding_dim"] == self.embedding_dim
AssertionError: Embedding dim mismatch, expected: 1024, but loaded: 4096
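
The assertion above compares the `embedding_dim` recorded in a vector store file already on disk with the dimension now configured, so it fires when an older file was written with a different value. A minimal sketch for spotting such stale files, assuming the working directory holds `vdb_*.json` files that are JSON objects with a top-level `embedding_dim` key (the file naming is an assumption; the key is what the traceback reads). `find_mismatched_stores` is a hypothetical helper, not part of LightRAG or nano-vectordb:

```python
import json
from pathlib import Path


def find_mismatched_stores(working_dir, expected_dim):
    """Return {filename: stored_dim} for store files whose dimension differs.

    Assumes each vdb_*.json file is a JSON object with an "embedding_dim" key.
    """
    mismatched = {}
    for path in sorted(Path(working_dir).glob("vdb_*.json")):
        with path.open(encoding="utf-8") as fh:
            stored_dim = json.load(fh).get("embedding_dim")
        if stored_dim != expected_dim:
            mismatched[path.name] = stored_dim
    return mismatched
```

Any file it reports was written with a different dimension. Moving those files aside, or starting from an empty working directory, should avoid the assertion, at the cost of re-embedding the documents.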

When I left the dimension unchanged and used embedding_model=HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5"), the insert failed with:

INFO:sentence_transformers.SentenceTransformer:2 prompts are loaded, with the keys: ['query', 'text']
INFO:lightrag:Logger initialized for working directory: ./dickens
INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:lightrag:Loaded graph from ./dickens/graph_chunk_entity_relation.graphml with 0 nodes, 0 edges
INFO:nano-vectordb:Load (0, 4096) data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': './dickens/vdb_entities.json'} 0 data
INFO:nano-vectordb:Load (0, 4096) data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': './dickens/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Load (0, 4096) data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': './dickens/vdb_chunks.json'} 42 data
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 42 chunks
INFO:lightrag:Inserting 42 vectors to chunks
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 1.20it/s]
...

INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
  File "/Users/zhouql1978/dev/LightRAG/test.py", line 67, in <module>
    rag.insert(f.read())
  File "/Users/zhouql1978/dev/LightRAG/lightrag/lightrag.py", line 197, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/zhouql1978/dev/LightRAG/lightrag/lightrag.py", line 241, in ainsert
    await self.chunks_vdb.upsert(inserting_chunks)
  File "/Users/zhouql1978/dev/LightRAG/lightrag/storage.py", line 98, in upsert
    results = self._client.upsert(datas=list_data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/nano_vectordb/dbs.py", line 92, in upsert
    self.__storage["matrix"][i] = update_d[f_VECTOR].astype(Float)
IndexError: index 0 is out of bounds for axis 0 with size 0

Here's my code:

embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

embedding_func = EmbeddingFunc(
    embedding_dim=4096,
    max_token_size=8192,
    func=lambda texts: embedding_model.aget_text_embedding_batch(texts),
)
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=embedding_func
)

LarFii commented 2 days ago

See issue #34; it may help.
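
`BAAI/bge-base-en-v1.5` produces 768-dimensional vectors, so the `embedding_dim=4096` in the code above does not match the model. One way to avoid hardcoding the value is to embed a single sample string and measure the result. A minimal sketch, assuming the embedder is a synchronous callable that takes a list of strings and returns one vector per string; `probe_embedding_dim` and `fake_embedder` are hypothetical names used only for illustration:

```python
from typing import Callable, List, Sequence


def probe_embedding_dim(
    embed: Callable[[List[str]], Sequence[Sequence[float]]]
) -> int:
    """Embed one sample string and return the length of the resulting vector."""
    vectors = embed(["dimension probe"])
    if len(vectors) != 1:
        raise ValueError(f"expected 1 vector, got {len(vectors)}")
    return len(vectors[0])


def fake_embedder(texts: List[str]) -> List[List[float]]:
    """Stand-in for a real model: one 768-dimensional zero vector per text."""
    return [[0.0] * 768 for _ in texts]


dim = probe_embedding_dim(fake_embedder)
print(dim)  # 768
```

With a real model, the measured value would be passed as `embedding_dim` to `EmbeddingFunc` in place of the hardcoded 4096.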