c0derm4n opened 1 week ago
How much data did you upload before starting an indexing job? Did you use the wikipedia sample data download script to get a couple of files or provide your own data?
I have uploaded 700 papers in txt format (parsed from PDF), totaling 17 MB. @jgbradley1
> How much data did you upload before starting an indexing job? Did you use the wikipedia sample data download script to get a couple of files or provide your own data?
The maximum file size is 100KB
> I have uploaded 700 papers in txt format (parsed from PDF), totaling 17M @jgbradley1
> How much data did you upload before starting an indexing job? Did you use the wikipedia sample data download script to get a couple of files or provide your own data?
Interestingly enough, my job has also been stuck for about 2 hours now. It's a single txt file of about 3 kB. I get this over and over:
{
'status_code': 200,
'index_name': 'findata',
'storage_name': 'findata',
'status': 'running',
'percent_complete': 12.5,
'progress': '2 out of 16 workflows completed successfully.',
}
Is there any way to debug this in the Azure Console?
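One client-side sanity check before digging into Azure: the percent_complete in the status payload is just completed workflows over total workflows. A tiny helper (hypothetical, not part of the accelerator's API) can recompute it from the progress string to confirm the job really is advancing between polls:

```python
# Minimal sketch: interpret the indexing-job status payload shown above.
# The helper name and parsing are illustrative, not part of the accelerator.

def percent_complete(status: dict) -> float:
    """Recompute the completion percentage from the 'progress' string,
    e.g. '2 out of 16 workflows completed successfully.'"""
    words = status["progress"].split()
    done, total = int(words[0]), int(words[3])
    return round(100.0 * done / total, 2)

status = {
    "status_code": 200,
    "index_name": "findata",
    "status": "running",
    "percent_complete": 12.5,
    "progress": "2 out of 16 workflows completed successfully.",
}
assert percent_complete(status) == status["percent_complete"]  # 2/16 -> 12.5
```

If two polls spaced well apart return the same workflow count, the job is stalled (usually on rate limits) rather than merely slow.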
In the solution accelerator, a common reason why jobs can appear to run a long time comes down to the TPM/RPM quota of the Azure OpenAI instance you're using and the retry logic that is configured.
I would first like to direct your attention to this config file in the accelerator. This config file is very similar to the config file used by the graphrag library. A complete description of the config fields is documented here. In the solution accelerator, we set some config fields dynamically and some are hardcoded. The TPM/RPM is one of the hardcoded values.
If you have deployed the accelerator with an AOAI model deployment whose TPM/RPM threshold is much lower than the hardcoded values, that could be causing the graphrag library to hit the rate limits of your AOAI model very quickly. The accelerator has two levels of retry logic implemented; in the graphrag library itself, rate limiting and retry logic are configured with some default values.
One way to debug the indexing job while it's running is to use kubectl.
Assuming you ran the deploy.sh script following the deployment guide, kubectl will already be logged in to the AKS instance (see this guide if not).
Now you can run kubectl get pods, which will print out at least three pod names that all start with graphrag-*. If you have an active running indexing job, you should also see another k8s pod named indexing-job-<hash>. You can run kubectl logs indexing-job-<hash> to see the entire log output from the indexing job, which may provide more detailed information explaining the errors you are experiencing.
This accelerator is meant to be a reference architecture, which means you may need to modify it slightly to fit the needs of your environment and intended usage. I would suggest testing your deployment on just a few files at first (i.e. not 700 files) to get a feel for how long indexing will take given your TPM/RPM quota. Do be aware that step 2 of the graphrag indexing pipeline is entity extraction - this is the most time-intensive step of the entire pipeline, accounting for 80-90% of the total time to index data. If multiple indexing jobs are kicked off by an API user and the appropriate amount of TPM/RPM quota has not been allocated, too many simultaneous indexing jobs can easily overload your AOAI model rate limits (implementing some sort of indexing-job management queue is on the list of things we'd like to tackle, which may help solve this problem).
Note: once you start modifying the graphrag pipeline settings to fit your needs, you can rerun the deploy.sh script on the same resource group as your original deployment. The deployment script was written so that rerunning it on a previous resource-group deployment of graphrag should not cause a problem. It will only apply the changes that you make (and takes around 5 minutes to run).
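As a rough way to get a feel for the timing on a small test set, here is a back-of-envelope sketch. The bytes-per-token ratio, the number of LLM passes over the corpus, and the TPM figure are all illustrative assumptions, not measured accelerator behavior:

```python
# Back-of-envelope estimate of minimum indexing wall-clock time from an
# AOAI TPM quota. All numbers below are illustrative assumptions.

def estimated_minutes(total_tokens: int, tpm_quota: int, llm_passes: float = 3.0) -> float:
    """Rough lower bound: every input token passes through the LLM
    'llm_passes' times (entity extraction, summarization, ...), and the
    deployment sustains at most 'tpm_quota' tokens per minute."""
    return total_tokens * llm_passes / tpm_quota

# ~17 MB of text at an assumed ~4 bytes/token is roughly 4.25M tokens.
tokens = 17_000_000 // 4
print(f"{estimated_minutes(tokens, tpm_quota=80_000):.0f} min minimum")  # prints "159 min minimum"
```

Real indexing will take longer than this lower bound once retries and per-request latency are factored in, but it shows why a low-TPM deployment can sit at the entity-extraction step for hours.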
Thanks for the very useful guide. It seems that I may have misconfigured the deployment, since I see this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
resp = await self._pool.handle_async_request(req)
File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 167, in handle_async_request
raise UnsupportedProtocol(
httpcore.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1537, in _request
response = await self._client.send(
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1661, in send
response = await self._send_handling_auth(
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
response = await self._send_handling_redirects(
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
response = await self._send_single_request(request)
File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1763, in _send_single_request
response = await transport.handle_async_request(request)
File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 372, in handle_async_request
with map_httpcore_exceptions():
File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/claims/claim_extractor.py", line 121, in __call__
claims = await self._process_document(prompt_args, text, doc_index)
File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/claims/claim_extractor.py", line 165, in _process_document
response = await self._llm(
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 49, in __call__
return await self._invoke(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 53, in _invoke
output = await self._execute_llm(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 55, in _execute_llm
completion = await self.client.chat.completions.create(
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1289, in create
return await self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1805, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1503, in request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1571, in _request
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.
[ERROR] 2024-07-07 18:33:30,195 - Claim Extraction Error
[ERROR] 2024-07-07 18:33:30,931 - Error Invoking LLM
[ERROR] 2024-07-07 18:33:31,379 - Error Invoking LLM
[ERROR] 2024-07-07 18:33:32,159 - Error Invoking LLM
It seems like the problem is this:
httpcore.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.
But I do not understand where this is coming from, since my deploy.parameters.json looks like this:
{
"GRAPHRAG_API_BASE": "graphragdaneast2",
"GRAPHRAG_API_VERSION": "2024-02-15-preview",
"GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME": "embed",
"GRAPHRAG_EMBEDDING_MODEL": "text-embedding-ada-002",
"GRAPHRAG_LLM_DEPLOYMENT_NAME": "gpt4",
"GRAPHRAG_LLM_MODEL": "gpt-4",
"LOCATION": "eastus2",
"RESOURCE_GROUP": "GraphRagTest"
}
What am I missing?
I just realised that my problem was putting the wrong format for GRAPHRAG_API_BASE, so I changed that to the actual Azure OpenAI endpoint URL. That was a bit dumb. Testing again.
> I just realised that my problem was putting the wrong format for GRAPHRAG_API_BASE, so I changed that to the actual Azure OpenAI endpoint URL. That was a bit dumb. Testing again.
Have you solved your problem?
The API endpoint is expected to be provided with the following format:
GRAPHRAG_API_BASE=https://<myname>.openai.azure.com
In the documentation, I think we can provide more clarification/examples for each of the deployment variables so that it is clearer in the future.
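In that spirit, a quick pre-deployment check can catch this class of misconfiguration before an indexing job ever starts. This validator is a hypothetical illustration, not part of the accelerator:

```python
# Sanity-check GRAPHRAG_API_BASE before deploying. The expected shape (per
# the comment above) is https://<myname>.openai.azure.com - a bare resource
# name like "graphragdaneast2" produces the runtime error
# "Request URL is missing an 'http://' or 'https://' protocol."
from urllib.parse import urlparse

def valid_api_base(value: str) -> bool:
    url = urlparse(value)
    return url.scheme in ("http", "https") and url.netloc.endswith(".openai.azure.com")

assert not valid_api_base("graphragdaneast2")             # missing scheme -> rejected
assert valid_api_base("https://myname.openai.azure.com")  # expected format
```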
This is the log after I run kubectl logs indexing-job-e8eca148d5bc2b0004e7dc49db249490-qgl5q:
openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 5 seconds.'}}
[ERROR] 2024-07-08 01:46:56,830 - Claim Extraction Error
WARNING:graphrag.llm.base.rate_limiting_llm:Process failed to invoke LLM 1/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
[ERROR] 2024-07-08 01:46:57,288 - Error Invoking LLM
WARNING:graphrag.llm.base.rate_limiting_llm:Process failed to invoke LLM 3/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
[ERROR] 2024-07-08 01:46:57,300 - Error Invoking LLM
WARNING:graphrag.llm.base.rate_limiting_llm:Process failed to invoke LLM 10/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
[ERROR] 2024-07-08 01:46:57,336 - Error Invoking LLM
ERROR:graphrag.index.graph.extractors.claims.claim_extractor:error extracting claim
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/claims/claim_extractor.py", line 121, in __call__
claims = await self._process_document(prompt_args, text, doc_index)
File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/claims/claim_extractor.py", line 165, in _process_document
response = await self._llm(
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 151, in do_attempt
await sleep_for(sleep_time)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 49, in __call__
return await self._invoke(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 53, in _invoke
output = await self._execute_llm(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 55, in _execute_llm
completion = await self.client.chat.completions.create(
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1289, in create
return await self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1805, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1503, in request
return await self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1599, in _request
raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 5 seconds.'}}
[ERROR] 2024-07-08 01:46:57,347 - Claim Extraction Error
WARNING:graphrag.llm.base.rate_limiting_llm:Process failed to invoke LLM 1/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
[ERROR] 2024-07-08 01:46:57,829 - Error Invoking LLM
WARNING:graphrag.llm.base.rate_limiting_llm:Process failed to invoke LLM 2/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
[ERROR] 2024-07-08 01:46:58,662 - Error Invoking LLM
And this is my deploy.parameters.json:
{
"GRAPHRAG_API_BASE": "https://ada002-eus2.openai.azure.com/",
"GRAPHRAG_API_VERSION": "2023-12-01-preview",
"GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME": "ada002-eus2",
"GRAPHRAG_EMBEDDING_MODEL": "text-embedding-ada-002",
"GRAPHRAG_LLM_DEPLOYMENT_NAME": "tcl-ai-France",
"GRAPHRAG_LLM_MODEL": "gpt-4",
"LOCATION": "East US 2",
"RESOURCE_GROUP": "graph_rag2"
}
May I ask if the parameter GRAPHRAG_API_BASE is set correctly? I don't quite understand the meaning of GRAPHRAG_API_BASE, so I used the endpoint of my embedding model.
hi @c0derm4n - yes, that's the correct syntax for GRAPHRAG_API_BASE (see Josh's reply above). The error you shared seems to indicate that you are reaching a rate limit. You can modify the rate used by the accelerator by modifying this file.
On Azure, you can increase the quota for each model by following the instructions here: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest
I advise you to first make modifications on Azure and keep the local YAML file intact. Only once you have confirmed you have access to more TPM should you increase the values in the YAML file. Otherwise, when the Azure TPM quota is lower than the value in your config file, you will hit rate-limit errors (HTTP 429).
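For intuition, the retry behavior visible in the logs above (up to 10 attempts, sleeping for a server-recommended interval on each 429) can be sketched generically. This mirrors the behavior in spirit only; the real logic lives in graphrag's rate_limiting_llm, and the names here are illustrative:

```python
# Generic sketch of retry-on-429 with a server-recommended sleep.
# Illustrative only - not the accelerator's actual implementation.
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 carrying a 'Try again in N seconds' hint."""
    def __init__(self, retry_after: float = 5.0):
        self.retry_after = retry_after

def call_with_retry(fn, max_attempts: int = 10):
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the 429 to the caller
            time.sleep(err.retry_after)

# Usage: succeeds on the third attempt after two simulated 429s.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=0.0)
    return "ok"

assert call_with_retry(flaky) == "ok" and calls["n"] == 3
```

The key point: retries only paper over a quota mismatch. If the deployed TPM is far below what the config assumes, every attempt hits a fresh 429 and the job appears stuck rather than failing outright.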
Hello,
I am having the same issue of the indexing job getting stuck. I am using the Wikipedia articles provided.
Here are my deployment parameters:
{
"GRAPHRAG_API_BASE": "https://checklistcreation.openai.azure.com/",
"GRAPHRAG_API_VERSION": "2024-05-13",
"GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME": "Embedding",
"GRAPHRAG_EMBEDDING_MODEL": "text-embedding-ada-002",
"GRAPHRAG_LLM_DEPLOYMENT_NAME": "gtp4-0",
"GRAPHRAG_LLM_MODEL": "gpt-4o",
"LOCATION": "UK South",
"RESOURCE_GROUP": "Checklist_creation"
}
Here is the log:
[ERROR] 2024-07-11 08:25:49,090 - Entity Extraction Error
[ERROR] 2024-07-11 08:25:49,538 - Error Invoking LLM
ERROR:root:error extracting graph
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/graph/graph_extractor.py", line 118, in __call__
result = await self._process_document(text, prompt_variables)
File "/usr/local/lib/python3.10/site-packages/graphrag/index/graph/extractors/graph/graph_extractor.py", line 146, in _process_document
response = await self._llm(
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/openai/openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "/usr/local/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in
It seems it can't find the model but the deployment parameters should be correct. Any ideas on how to solve the problem?
hi!
is "gtp" right?
"GRAPHRAG_LLM_DEPLOYMENT_NAME": "gtp4-0",
Describe the bug
Running Quickstart.ipynb - Start new indexing job. When I checked the progress yesterday, it was 6.25%. It has been 24 hours now, and it is still 6.25%!
{
'status_code': 200,
'index_name': 'graph_rag_index_0705_v3',
'storage_name': 'graph_rag_storage_0705_v3',
'status': 'running',
'percent_complete': 6.25,
'progress': "Workflow 'create_base_extracted_entities' started.",
}
Expected behavior
Indexing should be 100% complete.