TheAiSingularity / graphrag-local-ollama

Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3) - LLM & embedding extraction
MIT License
577 stars, 79 forks

The model can't answer #23

Open ccpowe opened 1 month ago

ccpowe commented 1 month ago

(graphrag-ollama-local) root@autodl-container-49d843b6cc-10e9e2a3:~/graphrag-local-ollama# python -m graphrag.query --root ./ragtest --method global "What is machinelearning?"

INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_chat", 'model': 'mistral', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
Error parsing search response json
Traceback (most recent call last):
  File "/root/graphrag-local-ollama/graphrag/query/structured_search/global_search/search.py", line 194, in _map_response_single_batch
    processed_response = self.parse_search_response(search_response)
  File "/root/graphrag-local-ollama/graphrag/query/structured_search/global_search/search.py", line 232, in parse_search_response
    parsed_elements = json.loads(search_response)["points"]
  File "/root/miniconda3/envs/graphrag-ollama-local/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/root/miniconda3/envs/graphrag-ollama-local/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/root/miniconda3/envs/graphrag-ollama-local/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

SUCCESS: Global Search Response: I am sorry but I am unable to answer this question given the provided data.
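
For context on the traceback: the map step of global search expects the model to return a JSON object with a "points" array, and parse_search_response does a plain json.loads on the reply. When a local model answers in prose instead, json.loads fails at the very first character and the query falls back to the canned "unable to answer" message. A small reproduction of the parsing failure (the key names inside each point are an assumption, not taken from this log):

import json

# Shape the map step can parse (the "description"/"score" keys are assumed):
good = '{"points": [{"description": "Machine learning is ...", "score": 85}]}'
print(json.loads(good)["points"])

# What a local model often returns instead: plain prose.
bad = "Machine learning (ML) is a subset of artificial intelligence ..."
try:
    json.loads(bad)["points"]
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0) -- the same error as above
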

yeahdongcn commented 1 month ago

I encountered the same issue here.

yaowu95 commented 1 month ago

I encountered the same issue here.

CarloC-AB commented 1 month ago

I also get this after following all the steps in the README.

yeahdongcn commented 1 month ago

Looks like the search_response is already the final answer. This is from print(search_response):

Machine learning (ML) is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. Machine learning algorithms are designed to parse data, learn from it, and then make predictions or decisions without being specifically programmed to perform the task. There are three main types of machine learning: supervised learning (where the model is trained on a labeled dataset), unsupervised learning (where the model learns patterns in an unlabeled dataset), and reinforcement learning (where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward). Machine learning has numerous applications, including image recognition, natural language processing, recommendation systems, fraud detection, and many more. It's a rapidly growing field with significant impact on various industries such as healthcare, finance, retail, and technology.

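Since the model's reply is already a usable answer, one common workaround is to make the parser tolerant: try json.loads first, and if that fails, wrap the raw text as a single point so the rest of global search can proceed. A minimal sketch (the "answer"/"score" keys mirror what the stock parse_search_response emits as far as I can tell; treat this as an assumption rather than the repo's actual code):

import json

def parse_search_response_lenient(search_response: str) -> list[dict]:
    """Tolerant variant of parse_search_response (sketch)."""
    try:
        parsed_elements = json.loads(search_response)["points"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # The model answered in prose: treat the whole reply as one high-scoring point.
        return [{"answer": search_response.strip(), "score": 100}]
    return [
        {"answer": element["description"], "score": int(element["score"])}
        for element in parsed_elements
    ]
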
SirFalk commented 1 month ago

I have the same issue. For me, the community summaries are not being created correctly, which is probably the root cause: the query step cannot parse a JSON report that was never created. Under graphrag-local-ollama/ragtest/output/20240718-164303/reports I get this logs.json:

{"type": "error", "data": "Community Report Extraction Error", "stack": "Traceback (most recent call last):\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py\", line 58, in __call__\n    await self._llm(\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/json_parsing_llm.py\", line 34, in __call__\n    result = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_token_replacing_llm.py\", line 37, in __call__\n    return await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_history_tracking_llm.py\", line 33, in __call__\n    output = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/caching_llm.py\", line 104, in __call__\n    result = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 177, in __call__\n    result, start = await execute_with_retry()\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 159, in execute_with_retry\n    async for attempt in retryer:\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py\", line 166, in __anext__\n    do = await self.iter(retry_state=self._retry_state)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py\", line 153, in iter\n    result = await action(retry_state)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/_utils.py\", line 99, in inner\n    return call(*args, **kwargs)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/__init__.py\", line 398, in <lambda>\n    self._add_action_func(lambda rs: rs.outcome.result())\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/concurrent/futures/_base.py\", line 451, in result\n    return self.__get_result()\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/concurrent/futures/_base.py\", line 403, in __get_result\n    raise self._exception\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 165, in execute_with_retry\n    return await do_attempt(), start\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 147, in do_attempt\n    return await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/base_llm.py\", line 48, in __call__\n    return await self._invoke_json(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_chat_llm.py\", line 90, in _invoke_json\n    raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)\nRuntimeError: Failed to generate valid JSON output\n", "source": "Failed to generate valid JSON output", "details": null}
{"type": "error", "data": "Community Report Extraction Error", "stack": "Traceback (most recent call last):\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py\", line 58, in __call__\n    await self._llm(\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/json_parsing_llm.py\", line 34, in __call__\n    result = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_token_replacing_llm.py\", line 37, in __call__\n    return await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_history_tracking_llm.py\", line 33, in __call__\n    output = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/caching_llm.py\", line 104, in __call__\n    result = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 177, in __call__\n    result, start = await execute_with_retry()\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 159, in execute_with_retry\n    async for attempt in retryer:\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py\", line 166, in __anext__\n    do = await self.iter(retry_state=self._retry_state)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py\", line 153, in iter\n    result = await action(retry_state)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/_utils.py\", line 99, in inner\n    return call(*args, **kwargs)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/__init__.py\", line 398, in <lambda>\n    self._add_action_func(lambda rs: rs.outcome.result())\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/concurrent/futures/_base.py\", line 451, in result\n    return self.__get_result()\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/concurrent/futures/_base.py\", line 403, in __get_result\n    raise self._exception\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 165, in execute_with_retry\n    return await do_attempt(), start\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 147, in do_attempt\n    return await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/base_llm.py\", line 48, in __call__\n    return await self._invoke_json(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_chat_llm.py\", line 90, in _invoke_json\n    raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)\nRuntimeError: Failed to generate valid JSON output\n", "source": "Failed to generate valid JSON output", "details": null}
{"type": "error", "data": "Community Report Extraction Error", "stack": "Traceback (most recent call last):\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/index/graph/extractors/community_reports/community_reports_extractor.py\", line 58, in __call__\n    await self._llm(\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/json_parsing_llm.py\", line 34, in __call__\n    result = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_token_replacing_llm.py\", line 37, in __call__\n    return await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_history_tracking_llm.py\", line 33, in __call__\n    output = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/caching_llm.py\", line 104, in __call__\n    result = await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 177, in __call__\n    result, start = await execute_with_retry()\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 159, in execute_with_retry\n    async for attempt in retryer:\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py\", line 166, in __anext__\n    do = await self.iter(retry_state=self._retry_state)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py\", line 153, in iter\n    result = await action(retry_state)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/_utils.py\", line 99, in inner\n    return call(*args, **kwargs)\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/tenacity/__init__.py\", line 398, in <lambda>\n    self._add_action_func(lambda rs: rs.outcome.result())\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/concurrent/futures/_base.py\", line 451, in result\n    return self.__get_result()\n  File \"/home/work/miniconda3/envs/graphrag-ollama-local/lib/python3.10/concurrent/futures/_base.py\", line 403, in __get_result\n    raise self._exception\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 165, in execute_with_retry\n    return await do_attempt(), start\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/rate_limiting_llm.py\", line 147, in do_attempt\n    return await self._delegate(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/base/base_llm.py\", line 48, in __call__\n    return await self._invoke_json(input, **kwargs)\n  File \"/home/work/Code/Python/graphrag-local-ollama/graphrag/llm/openai/openai_chat_llm.py\", line 90, in _invoke_json\n    raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)\nRuntimeError: Failed to generate valid JSON output\n", "source": "Failed to generate valid JSON output", "details": null}
SirFalk commented 1 month ago

I was able to fix it by utilizing mistral instead of llama3, as mistral supports json mode

stephcurt commented 1 month ago

I was able to fix it by utilizing mistral instead of llama3, as mistral supports json mode

I had to use this for another issue.

yeahdongcn commented 1 month ago

I was able to fix it by utilizing mistral instead of llama3, as mistral supports json mode

I just tried again with mistral but still facing the same error. Could you share your local change? Thanks.

gy850222 commented 1 month ago

I was able to fix it by utilizing mistral instead of llama3, as mistral supports json mode

I just tried again with mistral but still facing the same error. Could you share your local change? Thanks.

yes me too!

SirFalk commented 1 month ago

I was able to fix it by utilizing mistral instead of llama3, as mistral supports json mode

I just tried again with mistral but still facing the same error. Could you share your local change? Thanks.

Did you make sure to delete the cache and output before retrying?
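
For reference, the cache and output directories can be cleared from Python (paths assume the ragtest layout used earlier in this thread):

import shutil
from pathlib import Path

root = Path("./ragtest")
for sub in ("cache", "output"):
    target = root / sub
    if target.exists():
        shutil.rmtree(target)  # drop cached LLM responses and previous run artifacts
        print(f"removed {target}")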

yeahdongcn commented 1 month ago

I was able to fix it by utilizing mistral instead of llama3, as mistral supports json mode

I just tried again with mistral but still facing the same error. Could you share your local change? Thanks.

Did you make sure to delete the cache and output before retrying?

Yes. I recreated the ragtest directory.

slutifsh99 commented 1 month ago

same issue, can't run the demo

gaostar123 commented 1 month ago

I can't run the demo either.

Traceback (most recent call last):
  File "/home/gaoshida/gsd/graphrag-local-ollama/graphrag/query/structured_search/global_search/search.py", line 194, in _map_response_single_batch
    processed_response = self.parse_search_response(search_response)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gaoshida/gsd/graphrag-local-ollama/graphrag/query/structured_search/global_search/search.py", line 232, in parse_search_response
    parsed_elements = json.loads(search_response)["points"]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gaoshida/anaconda3/envs/ollama/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gaoshida/anaconda3/envs/ollama/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gaoshida/anaconda3/envs/ollama/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

SUCCESS: Global Search Response: I am sorry but I am unable to answer this question given the provided data.

9prodhi commented 1 month ago

The demo is not working for me either; I'm getting the error below:

[screenshot of the error]

jaiden-lee commented 1 month ago

I'm getting the same error. I'm running Mistral and nomic-embed-text locally via llama-cpp-python. One thing I noticed is that my embedding inference server logs the following error, which I think is related:

Exception: [{'type': 'string_type', 'loc': ('body', 'input', 'str'), 'msg': 'Input should be a valid string', 'input': [12840, 374, 4876, 4193, 30]}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 0), 'msg': 'Input should be a valid string', 'input': 12840}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 1), 'msg': 'Input should be a valid string', 'input': 374}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 2), 'msg': 'Input should be a valid string', 'input': 4876}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 3), 'msg': 'Input should be a valid string', 'input': 4193}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 4), 'msg': 'Input should be a valid string', 'input': 30}]
Traceback (most recent call last):
  File "/Users/jaiden/Documents/NeuroSymbolicAI/.venv/lib/python3.12/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/NeuroSymbolicAI/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 315, in app
    raise validation_error
fastapi.exceptions.RequestValidationError: [{'type': 'string_type', 'loc': ('body', 'input', 'str'), 'msg': 'Input should be a valid string', 'input': [12840, 374, 4876, 4193, 30]}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 0), 'msg': 'Input should be a valid string', 'input': 12840}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 1), 'msg': 'Input should be a valid string', 'input': 374}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 2), 'msg': 'Input should be a valid string', 'input': 4876}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 3), 'msg': 'Input should be a valid string', 'input': 4193}, {'type': 'string_type', 'loc': ('body', 'input', 'list[str]', 4), 'msg': 'Input should be a valid string', 'input': 30}]


It seems that a list of token IDs, rather than a plain string, is being sent to the embedding endpoint, which the server rejects as invalid input.
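
The integers in that validation error look like tiktoken token IDs: the query-side embedding wrapper appears to split text into token chunks and pass the raw token lists through, which OpenAI's hosted API accepts but llama-cpp-python and Ollama reject. A sketch of the workaround people commonly apply, decoding each chunk back to a string before calling the embeddings endpoint (the function and parameter names here are illustrative, not the repo's actual code):

import tiktoken
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8081/v1", api_key="none")  # local embedding server
enc = tiktoken.get_encoding("cl100k_base")

def embed_long_text(text: str, model: str = "nomic-embed-text", chunk_tokens: int = 512) -> list[list[float]]:
    tokens = enc.encode(text)
    chunks = [tokens[i : i + chunk_tokens] for i in range(0, len(tokens), chunk_tokens)]
    embeddings = []
    for chunk in chunks:
        chunk_text = enc.decode(chunk)  # send a string, not raw token IDs
        resp = client.embeddings.create(model=model, input=chunk_text)
        embeddings.append(resp.data[0].embedding)
    return embeddings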

Here is the embeddings portion of the settings.yaml, in case it helps:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    api_base: http://localhost:8081/v1
    # api_version: v1
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    batch_size: 1 # the number of documents to send in a single request
    batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
yeahdongcn commented 1 month ago

(Quoting @jaiden-lee's comment above.)

@jaiden-lee Your embeddings settings seem to work for me. I'm using the nomic-embed-text model because I can't pull text-embedding-3-small in Ollama. Additionally, I need to set the OLLAMA_HOST environment variable, like this:

export OLLAMA_HOST=http://192.168.165.140:11434

This is because my Ollama instance is hosted on another machine. Here is the diff of my settings.yaml:

diff --git a/settings.yaml b/settings.yaml
index 92be0ad..5264752 100644
--- a/settings.yaml
+++ b/settings.yaml
@@ -8,7 +8,8 @@ llm:
   model_supports_json: true # recommended if this is available for your model.
   # max_tokens: 4000
   # request_timeout: 180.0
-  api_base: http://localhost:11434/v1
+  request_timeout: 600.0
+  api_base: http://10.10.129.30:11434/v1
   # api_version: 2024-02-15-preview
   # organization: <organization_id>
   # deployment_name: <azure_model_deployment_name>
@@ -22,6 +23,7 @@ llm:
 parallelization:
   stagger: 0.3
   # num_threads: 50 # the number of threads to use for parallel processing
+  num_threads: 1

 async_mode: threaded # or asyncio

@@ -31,28 +33,31 @@ embeddings:
   llm:
     api_key: ${GRAPHRAG_API_KEY}
     type: openai_embedding # or azure_openai_embedding
-    model: nomic_embed_text
-    api_base: http://localhost:11434/api
+    model: nomic-embed-text
+    api_base: http://10.10.129.30:11434/api
     # api_version: 2024-02-15-preview
     # organization: <organization_id>
     # deployment_name: <azure_model_deployment_name>
     # tokens_per_minute: 150_000 # set a leaky bucket throttle
     # requests_per_minute: 10_000 # set a leaky bucket throttle
     # max_retries: 10
+    max_retries: 1
     # max_retry_wait: 10.0
     # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
     # concurrent_requests: 25 # the number of parallel inflight requests that may be made
     # batch_size: 16 # the number of documents to send in a single request
     # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
+    concurrent_requests: 1 # the number of parallel inflight requests that may be made
+    batch_size: 1 # the number of documents to send in a single request
+    batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
     # target: required # or optional
-  

 chunks:
   size: 300
   overlap: 100
   group_by_columns: [id] # by default, we don't allow chunks to cross documents
-    
+
 input:
   type: file # or blob
   file_type: text # or csv
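
The two changes in this diff that matter most are the embedding model name (nomic_embed_text -> nomic-embed-text, matching the actual Ollama tag) and pointing the embedding api_base at Ollama's native /api path instead of /v1. A quick sanity check of the model name and endpoint before re-running the indexer; this is a sketch against Ollama's /api/embeddings endpoint, so adjust the host to wherever Ollama runs:

import requests

OLLAMA = "http://localhost:11434"  # or the remote host used in the diff above

resp = requests.post(
    f"{OLLAMA}/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello graphrag"},
    timeout=60,
)
resp.raise_for_status()
vector = resp.json()["embedding"]
print(len(vector))  # nomic-embed-text is expected to return a 768-dimensional vector
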
jaiden-lee commented 1 month ago

(Quoting @yeahdongcn's reply above.)

Thanks, I was able to get global search working. However, local search still isn't working; I'm getting a ZeroDivisionError.

Error embedding chunk {'OpenAIEmbedding': "'NoneType' object is not iterable"}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/__main__.py", line 76, in <module>
    run_local_search(
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/structured_search/local_search/search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/structured_search/local_search/mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/context_builder/entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/vector_stores/lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/context_builder/entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
                            ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/graphrag-local-ollama/graphrag/query/llm/oai/embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaiden/Documents/NeuroSymbolicAI/.venv/lib/python3.12/site-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

I tried looking at other issues for a solution, but I can't really get it working.
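
The ZeroDivisionError is a downstream symptom: every per-chunk embedding call failed ("Error embedding chunk ... 'NoneType' object is not iterable"), so the chunk_lens weights passed to np.average sum to zero. A defensive sketch of the averaging step; it mirrors the logic in query/llm/oai/embedding.py as I understand it, but the guard itself is an assumption, and fixing the underlying embedding call (for example the decode-before-send workaround above) is still the real fix:

import numpy as np

def combine_chunk_embeddings(chunk_embeddings: list, chunk_lens: list) -> list[float]:
    # Drop chunks whose embedding call failed (returned None).
    pairs = [(e, n) for e, n in zip(chunk_embeddings, chunk_lens) if e is not None]
    if not pairs:
        raise RuntimeError(
            "All chunk embeddings failed -- check the embedding model name and api_base"
        )
    vectors, lens = zip(*pairs)
    combined = np.average(np.array(vectors), axis=0, weights=lens)
    return (combined / np.linalg.norm(combined)).tolist()  # unit-normalize the combined vector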