HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License

Embedding dim bug when using ollama models #170

Closed · whut265107 closed this 3 weeks ago

whut265107 commented 3 weeks ago

When llm_model_name is set to 'qwen2.5:7b' and embedding_dim to 768, the code is as follows:

import os

from lightrag import LightRAG
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./your_working_dir"  # placeholder; set to your actual working directory
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# Initialize LightRAG with Ollama model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Use Ollama model for text generation
    llm_model_name='qwen2.5:7b',  # Your model name
    # Use Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
)

The following bug will occur:

[screenshot of the resulting error]
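As a sanity check before choosing embedding_dim, the dimension the model actually returns can be queried once with the ollama Python client (a minimal sketch; nomic-embed-text produces 768-dimensional vectors):

import ollama

# Request a single embedding and inspect its length to confirm
# what embedding_dim should be set to.
response = ollama.embeddings(model="nomic-embed-text", prompt="dimension check")
print(len(response["embedding"]))  # nomic-embed-text -> 768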

And when embedding_dim is set to 1536, the code is as follows:

import os

from lightrag import LightRAG
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./your_working_dir"  # placeholder; set to your actual working directory
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# Initialize LightRAG with Ollama model
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,  # Use Ollama model for text generation
    llm_model_name='qwen2.5:7b',  # Your model name
    # Use Ollama embedding function
    embedding_func=EmbeddingFunc(
        embedding_dim=1536,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
)

with open("./book.txt") as f:
    rag.insert(f.read())

The following bug will occur:

[screenshot of the resulting error]
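One way to surface this kind of mismatch early, at the embedding call rather than inside the vector store, is to wrap the embedding function with a dimension check. A minimal sketch, assuming lightrag.llm.ollama_embedding is async (the lambda in the examples above returns a coroutine that LightRAG awaits); checked_ollama_embedding is a hypothetical helper, not part of LightRAG:

import numpy as np

from lightrag.llm import ollama_embedding

# Hypothetical helper: forwards to ollama_embedding, but raises a clear
# error if the returned vectors don't match the configured embedding_dim.
async def checked_ollama_embedding(texts, embed_model="nomic-embed-text", expected_dim=768):
    embeddings = await ollama_embedding(texts, embed_model=embed_model)
    actual_dim = np.asarray(embeddings).shape[-1]
    if actual_dim != expected_dim:
        raise ValueError(
            f"{embed_model} returned {actual_dim}-dim vectors, but "
            f"EmbeddingFunc was configured with embedding_dim={expected_dim}"
        )
    return embeddings

Passing func=lambda texts: checked_ollama_embedding(texts, expected_dim=768) to EmbeddingFunc would then fail with an explicit message instead of the storage-layer mismatch.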

zimbo202 commented 3 weeks ago

Same issue here. With llama3.1:8b and embedding_dim=4096, the 'Embedding dim mismatch' error still occurs on the first run, and a second error appears when the example is re-run.

LarFii commented 3 weeks ago

Please check if there are any cached files in the WORKING_DIR. If so, you’ll need to clear them and rerun the example.
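For reference, clearing the cache amounts to deleting the files LightRAG wrote to the working directory on the previous run. A minimal sketch that removes the whole directory, so only use it on a directory dedicated to LightRAG:

import os
import shutil

WORKING_DIR = "./your_working_dir"  # the same path passed to LightRAG

# Delete the cached KV stores, graph, and vector DB files from the
# previous run, then recreate an empty working directory.
if os.path.exists(WORKING_DIR):
    shutil.rmtree(WORKING_DIR)
os.mkdir(WORKING_DIR)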