TheAiSingularity / graphrag-local-ollama

Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3) - LLM & embedding extraction
MIT License

ZeroDivisionError: Weights sum to zero, can't be normalized #15

Open hp0404 opened 3 months ago

hp0404 commented 3 months ago

I encountered the following problem while running a local search from Google Colab:

command:

python -m graphrag.query --root ./ragtest --method local \
"What are the differences between Russian, Western, and North Korean approaches to war?"

output:

2024-07-11 16:23:41.000800: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-11 16:23:41.000856: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-11 16:23:41.002564: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-11 16:23:42.205100: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_chat", 'model': 'gemma2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_embedding", 'model': 'nomic_embed_text', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/api', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
Error embedding chunk {'OpenAIEmbedding': 'Error code: 404 - {'error': "model 'nomic_embed_text' not found, try pulling it first"}'}
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/__main__.py", line 76, in <module>
    run_local_search(
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/structured_search/local_search/search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/structured_search/local_search/mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/context_builder/entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/vector_stores/lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/context_builder/entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "/content/graphrag-ways-of-war/graphrag-local-ollama/graphrag/query/llm/oai/embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

Now, when running a global search, it responds without errors:

INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_chat", 'model': 'gemma2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}

SUCCESS: Global Search Response: I am sorry but I am unable to answer this question given the provided data.

The entire indexing pipeline ran without issues, too (I pulled the 'nomic_embed_text' model, and it seems to work, so I'm not sure what the error message is about).

config:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gemma2
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic_embed_text
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.md$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: yes
  raw_entities: yes
  top_level_nodes: yes

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

davidallcott commented 3 months ago

I humbly submit my heinous hack. I suspect that when local search is used instead of global, some lists of all-zero-weight elements come under consideration. The attempt to normalize their zero-weight contributions fails, and the np.average ZeroDivisionError occurs. I put a try block in the embedding.py code like this:

Dave's hack to avoid divide by zero:

    try:
        # Weighted average of the per-chunk embeddings; np.average raises
        # ZeroDivisionError when chunk_lens sums to zero
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
    except ZeroDivisionError:
        return []  # no chunk embedded successfully; return an empty embedding
    return chunk_embeddings.tolist()  # the normal path; global search always gets here

Seems to work.

wenwkich commented 3 months ago

> I humbly submit my heinous hack. [...] Seems to work.

This seems to generate a completely out-of-context response, though.