TheAiSingularity / graphrag-local-ollama

Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3) - LLM & embedding extraction
MIT License

Switched to another embedding model, but it's not taking effect! / model "nomic-embed-text" not found, try pulling it first #31

Closed c0derm4n closed 1 month ago

c0derm4n commented 2 months ago

I modified the configuration in the yaml file to switch to another embedding model, but it's not taking effect!

17:31:12,217 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: model "nomic-embed-text" not found, try pulling it first
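Before changing any config, it is worth confirming which models the local Ollama server actually has, since "model not found, try pulling it first" usually means the configured name was never pulled. A minimal sketch, assuming the default Ollama endpoint and its `/api/tags` listing route (adjust the URL if your setup differs):

```python
# Sketch: list the models the local Ollama server knows about.
# Assumes the default endpoint http://localhost:11434; returns [] if unreachable.
import json
from urllib import error, request

def list_ollama_models(base_url="http://localhost:11434"):
    """Return the names of locally available Ollama models ([] if the server is down)."""
    try:
        with request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (error.URLError, OSError):
        return []  # server not running or unreachable

models = list_ollama_models()
print(models)
```

If the model named in settings.yaml is not in this list, `ollama pull <name>` it first.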


embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: chatfire/bge-m3:q8_0
    api_base: http://localhost:11434/api
    api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  # target: required # or optional
xuhe2 commented 2 months ago

My settings.yaml is:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: chatfire/bge-m3:q8_0
    api_base: http://localhost:11434/api

It runs without the error about model "nomic-embed-text" not found, try pulling it first. First, check whether the settings.yaml in the ragtest dir has actually been changed, and whether the cache has been deleted; you can try clearing the cache folder and rebuilding your project.
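The "delete the cache" advice matters because GraphRAG caches LLM and embedding results under the project root, so stale results from the old model can mask a settings change. A minimal sketch, assuming the standard `<root>/cache` layout used by the ragtest example (the path is an assumption, check your own layout):

```python
# Sketch: clear the GraphRAG pipeline cache so the next indexing run
# rebuilds embeddings with the newly configured model.
# Assumes the cache lives at <root>/cache, as in the repo's ragtest example.
import shutil
from pathlib import Path

def clear_graphrag_cache(root="./ragtest"):
    """Delete <root>/cache if present; return True when it is gone."""
    cache = Path(root) / "cache"
    if cache.exists():
        shutil.rmtree(cache)
    return not cache.exists()
```

After clearing, re-run the indexer (e.g. `python -m graphrag.index --root ./ragtest`) to rebuild the project.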

Finally, my project hits an error:

{"type": "error", "data": "Error running pipeline!", "source": "'community'", "details": null}

Traceback (most recent call last):
  File "/home/xuhe/task/2024outsource10/graphrag-local-ollama/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
  File "/home/xuhe/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "/home/xuhe/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "/home/xuhe/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/datashaper/engine/verbs/window.py", line 73, in window
    window = __window_function_map[window_operation](input_table[column])
  File "/home/xuhe/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/xuhe/miniconda3/envs/graphrag-ollama-local/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'community'

It looks like an error from the language model. I don't know how to solve this problem.

xuhe2 commented 2 months ago

PR #22 is about the embedding model setting; it may be helpful.

morler commented 2 months ago

The issue is on line 31 of the file "graphrag\llm\openai\openai_embeddings_llm.py": the model name "nomic-embed-text" is hardcoded there. Change it to the name of the model you are currently using.
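The shape of the fix is to read the model name from the llm configuration instead of the hardcoded string. A minimal sketch, with attribute names assumed from the settings.yaml fields in this thread (not verified against the file itself):

```python
# Sketch: prefer the embedding model configured in settings.yaml over the
# hardcoded "nomic-embed-text" default. `configuration.model` is an assumed
# attribute name based on the yaml field shown in this issue.
def resolve_embedding_model(configuration, fallback="nomic-embed-text"):
    """Return the configured model name, falling back to the old default."""
    model = getattr(configuration, "model", None)
    return model or fallback

# The resolved name would then be passed to the embedding call, e.g.
# ollama.embeddings(model=resolve_embedding_model(cfg), prompt=text)
```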

TheAiSingularity commented 1 month ago

Fixed this; you can now specify the model from settings.yaml.