HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License

Unstable context on each start #255

Closed: Burakabdi closed this issue 3 days ago

Burakabdi commented 1 week ago

Issue Description

I created a LightRAG instance using OpenAI. I am able to query and retrieve context. However, I noticed that the retrieved context is unstable across starts of LightRAG within my app: with the exact same query params, it changes slightly on each start (e.g. a single relationship or a single entity differs), although it stays relevant to the query. I used the dunzhang/stella_en_400M_v5 embedding model both during creation and during querying.

Notes:

- My app also uses async at the server level.
- The LLM params for high-level and low-level keyword extraction are constant. I checked the extracted keywords across different app starts, and they are identical for the same query.
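A minimal sketch of how I compare the context between starts (make_rag is a hypothetical factory wrapping the LightRAG(...) constructor listed under Settings below):

import difflib

from lightrag import QueryParam

def get_context(query: str) -> str:
    rag = make_rag()  # hypothetical factory: builds a fresh instance, simulating an app start
    return rag.query(query, param=QueryParam(mode="hybrid", only_need_context=True, top_k=3))

query = "the exact same query every time"
a = get_context(query)
b = get_context(query)
if a != b:
    # show which entities/relationships differ between the two starts
    print("\n".join(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")))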

Environment

pypandoc==1.13
boto3==1.35.32
pydantic_core==2.23.4
pydantic==2.9.2
passlib==1.7.4
numpy==1.26.4
pandas==2.2.1
python_dateutil==2.8.2
pytz==2024.1
Requests==2.31.0
text_generation==0.6.1
faiss-cpu==1.7.4
sutime==1.0.1
fuzzywuzzy==0.18.0
transformers==4.37.2
haversine
pyarrow
cacheout
termcolor
scikit-learn
regex
nltk
lightrag-hku==0.0.8
aioboto3==13.2.0
ollama==0.3.3
nano-vectordb==0.0.4.1
openai>=0.27.0
neo4j>=5.7.0
pybind11>=2.10.0
torch>=1.13.1
tiktoken>=0.3.0
networkx>=3.0
scipy>=1.10.1
spacy>=3.5.2
py2neo>=2021.2.3
nest-asyncio>=1.5.6

LightRAG Settings:

params = QueryParam(
    mode="hybrid",                        # combine low-level and high-level retrieval
    only_need_context=True,               # return the retrieved context, skip answer generation
    top_k=3,                              # top-k entities/relationships to retrieve
    max_token_for_text_unit=800,
    max_token_for_global_context=400,
    max_token_for_local_context=400
)
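With only_need_context=True, the query call returns the retrieved context string instead of a generated answer; that string is what I compare across starts:

context = rag.query("my test query", param=params)  # context text only, no generation step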

node2vec_params = {
    'dimensions': 1024,    # matches the embedding dimension below
    'num_walks': 10,
    'walk_length': 40,
    'window_size': 2,
    'iterations': 3,
    'random_seed': 3       # fixed seed, so the node2vec step is seeded identically on every start
}

rag = LightRAG(
    working_dir=working_dir,
    llm_model_func=hf_model_complete,
    llm_model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    llm_model_max_token_size=4500,
    node2vec_params=node2vec_params,
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,
        max_token_size=8192,
        func=lambda texts: embedding_func(texts),  # dunzhang/stella_en_400M_v5 wrapper
    ),
)
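For completeness, a quick check that the embedding wrapper itself returns identical vectors for identical input (a sketch; it assumes embedding_func is synchronous, adapt if yours is async):

import numpy as np

v1 = np.asarray(embedding_func(["same text"]))
v2 = np.asarray(embedding_func(["same text"]))
print(np.allclose(v1, v2))  # True means the embedding side is deterministic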

Any help would be appreciated.

LarFii commented 6 days ago

I have fixed the bug. You can download the latest code and give it a try (no need to re-insert).
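Until the fix lands in a new lightrag-hku release, installing straight from the repository should pick it up, e.g. pip install --upgrade git+https://github.com/HKUDS/LightRAG.git.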