I ran into an embedding dimension error when using an open-source embedding model and an OpenAI-compatible one. When I changed the dimension from the default 4096 to 1024 or 2048, I got this error:
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/nano_vectordb/dbs.py", line 71, in __post_init__
    storage["embedding_dim"] == self.embedding_dim
AssertionError: Embedding dim mismatch, expected: 1024, but loaded: 4096
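The assertion compares the dimension recorded in an existing vector-store file with the one configured in the current run, so stores created earlier at the default 4096 reject a new value such as 1024. A minimal sketch for spotting such stale files, assuming (from the `storage["embedding_dim"]` check above and the `Init` log lines below) that each `vdb_*.json` file in the working directory is JSON with a top-level `embedding_dim` key; `find_stale_stores` is a hypothetical helper, not part of LightRAG or nano-vectordb:

```python
import json
import tempfile
from pathlib import Path


def find_stale_stores(working_dir: str, embedding_dim: int) -> list[str]:
    """Return names of vdb_*.json files whose saved embedding_dim differs.

    Hypothetical helper: assumes each file is JSON with a top-level
    "embedding_dim" key, mirroring the storage["embedding_dim"] check
    that nano-vectordb performs when it loads a store.
    """
    stale = []
    for path in sorted(Path(working_dir).glob("vdb_*.json")):
        saved = json.loads(path.read_text()).get("embedding_dim")
        if saved is not None and saved != embedding_dim:
            stale.append(path.name)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate stores left behind by an earlier run at the default 4096.
        for name in ("vdb_entities.json", "vdb_relationships.json", "vdb_chunks.json"):
            Path(tmp, name).write_text(json.dumps({"embedding_dim": 4096, "data": []}))
        print(find_stale_stores(tmp, 1024))
        # ['vdb_chunks.json', 'vdb_entities.json', 'vdb_relationships.json']
```

If anything is listed, starting from an empty working directory is presumably the simplest way to rebuild at the new dimension, rather than deleting only the `vdb_*.json` files, since the KV store and graph files in the same directory also record what was already inserted.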
When I kept the default dimension and used:
embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
the insert failed with this output:
INFO:sentence_transformers.SentenceTransformer:2 prompts are loaded, with the keys: ['query', 'text']
INFO:lightrag:Logger initialized for working directory: ./dickens
INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:lightrag:Loaded graph from ./dickens/graph_chunk_entity_relation.graphml with 0 nodes, 0 edges
INFO:nano-vectordb:Load (0, 4096) data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': './dickens/vdb_entities.json'} 0 data
INFO:nano-vectordb:Load (0, 4096) data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': './dickens/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Load (0, 4096) data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': './dickens/vdb_chunks.json'} 42 data
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 42 chunks
INFO:lightrag:Inserting 42 vectors to chunks
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 1.20it/s]
...
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
  File "/Users/zhouql1978/dev/LightRAG/test.py", line 67, in <module>
    rag.insert(f.read())
  File "/Users/zhouql1978/dev/LightRAG/lightrag/lightrag.py", line 197, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/zhouql1978/dev/LightRAG/lightrag/lightrag.py", line 241, in ainsert
    await self.chunks_vdb.upsert(inserting_chunks)
  File "/Users/zhouql1978/dev/LightRAG/lightrag/storage.py", line 98, in upsert
    results = self._client.upsert(datas=list_data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/nano_vectordb/dbs.py", line 92, in upsert
    self.__storage["matrix"][i] = update_d[f_VECTOR].astype(Float)
IndexError: index 0 is out of bounds for axis 0 with size 0
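One reading of the log above: `vdb_chunks.json` was loaded with a `(0, 4096)` matrix yet reports 42 data records. If the re-inserted chunks get the same IDs as those stored records (LightRAG appears to derive chunk IDs from a hash of the content), existing records are updated by row index, as in the `self.__storage["matrix"][i] = ...` line of the traceback, and row 0 does not exist. The toy model below mimics that update path with plain Python lists in place of nano-vectordb's NumPy matrix; `toy_upsert` and the `__id__`/`__vector__` field names are illustrative assumptions, not the library's actual code:

```python
def toy_upsert(storage: dict, items: list[dict]) -> None:
    """Toy model of an upsert that updates existing records in place by row index.

    storage["data"] holds one metadata dict per stored vector and
    storage["matrix"] is expected to hold one row per entry in storage["data"].
    """
    incoming = {item["__id__"]: item for item in items}
    for i, record in enumerate(storage["data"]):
        if record["__id__"] in incoming:
            # Same kind of write as the failing line: matrix[i] = new vector.
            storage["matrix"][i] = incoming.pop(record["__id__"])["__vector__"]
    # Whatever is left has no stored record yet, so it is appended.
    for item in incoming.values():
        storage["data"].append({"__id__": item["__id__"]})
        storage["matrix"].append(item["__vector__"])


if __name__ == "__main__":
    # 42 records but an empty matrix, like vdb_chunks.json in the log.
    inconsistent = {"data": [{"__id__": f"chunk-{n}"} for n in range(42)], "matrix": []}
    try:
        toy_upsert(inconsistent, [{"__id__": "chunk-0", "__vector__": [0.0] * 4}])
    except IndexError as err:
        print("IndexError:", err)  # IndexError: list assignment index out of range
```

With a consistent store (one matrix row per record) the same call simply overwrites row 0, which is why an inconsistent file left over from an earlier run is a plausible suspect here.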
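Here's my code:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from lightrag import LightRAG
from lightrag.utils import EmbeddingFunc

# WORKING_DIR ("./dickens" per the log) and llm_model_func are defined elsewhere in test.py
embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
embedding_func = EmbeddingFunc(
    embedding_dim=4096,  # note: bge-base-en-v1.5 outputs 768-dimensional vectors
    max_token_size=8192,
    func=lambda texts: embedding_model.aget_text_embedding_batch(texts),
)
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=embedding_func,
)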
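`BAAI/bge-base-en-v1.5` produces 768-dimensional vectors, so `embedding_dim=4096` in the snippet above does not describe what the model actually returns. A minimal sketch of a pre-flight check that embeds one sample text and compares the vector length with the configured value; `probe_embedding_dim` and `check_embedding_dim` are hypothetical helpers, and the function passed in is assumed to be an async callable mapping a list of strings to a list of vectors, as the `EmbeddingFunc` lambda above does:

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Assumed shape of the embedding callable: async, list of texts -> list of vectors.
EmbedFn = Callable[[list[str]], Awaitable[Sequence[Sequence[float]]]]


async def probe_embedding_dim(func: EmbedFn, sample: str = "dimension probe") -> int:
    """Embed one sample text and return the length of the resulting vector."""
    vectors = await func([sample])
    return len(vectors[0])


def check_embedding_dim(func: EmbedFn, configured_dim: int) -> int:
    """Raise ValueError if the model's real output size differs from configured_dim."""
    actual = asyncio.run(probe_embedding_dim(func))
    if actual != configured_dim:
        raise ValueError(
            f"embedding_dim={configured_dim}, but the model returns {actual}-dim vectors"
        )
    return actual


if __name__ == "__main__":
    async def fake_embed(texts: list[str]) -> list[list[float]]:
        # Stand-in for a 768-dim model such as bge-base-en-v1.5.
        return [[0.0] * 768 for _ in texts]

    print(check_embedding_dim(fake_embed, 768))  # 768
    try:
        check_embedding_dim(fake_embed, 4096)
    except ValueError as err:
        print(err)  # embedding_dim=4096, but the model returns 768-dim vectors
```

Called before constructing `LightRAG`, e.g. `check_embedding_dim(embedding_model.aget_text_embedding_batch, 4096)`, it should raise for this model while 768 should pass; any stores already written at 4096 in the working directory would still need to be cleared.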