Open: davidwboyd opened this issue 1 year ago.
I also don't see anything in the docs about extending the timeout: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/README.md
You could file an issue in the azure-sdk-for-python repo to see if they have any feedback. However, this may simply be a limitation of the underlying API, so a workaround would be to preprocess the PDF and split it into smaller documents.
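A minimal sketch of the splitting workaround, assuming the page count is already known (for example from a PDF library such as pypdf, which is not shown). The hypothetical helper `chunk_page_ranges` only plans 1-based, inclusive page ranges; writing each range out as its own PDF is left to whichever PDF library you use, and `max_pages` is a guess to tune rather than a documented service limit.

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, max_pages: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into consecutive inclusive (first, last) ranges.

    Hypothetical helper: max_pages is a tuning knob, not a documented service limit.
    """
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    ranges = []
    for first in range(1, total_pages + 1, max_pages):
        # The last chunk may be shorter than max_pages.
        last = min(first + max_pages - 1, total_pages)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    # A 390-page document (the size reported in this issue) in chunks of 100 pages.
    for first, last in chunk_page_ranges(390, 100):
        print(f"{first}-{last}")
```

Passing a range through the SDK's `pages` keyword instead would still upload the whole file, so if the timeout happens during upload, separate smaller files are likely the safer choice.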
This issue is stale because it has been open for 60 days with no activity. Remove the stale label or add a comment, or this issue will be closed.
I am planning to build an app with Azure Document Intelligence, and while testing the service's capabilities I also hit this issue when converting a large file. It looks like this is not a priority; perhaps I can split the PDF before sending it.
Is there any update on this? I get the following error when analyzing a 5 MB PDF:
"azure.core.exceptions.HttpResponseError: (Timeout) The operation was timeout. Code: Timeout Message: The operation was timeout."
I'd rather not have to split the document into smaller chunks beforehand. Any ideas / solutions?
I'm encountering the same error with the REST API.
{ "error": { "code": "Timeout", "message": "The operation was timeout." } }
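For anyone calling the REST API directly, the failure arrives as the JSON body quoted above. A minimal sketch, assuming only that `{"error": {"code": ..., "message": ...}}` shape, of telling a `Timeout` apart from other errors so a script can fall back to splitting the document instead of failing outright; the helper names `error_code` and `is_timeout` are hypothetical.

```python
import json
from typing import Optional


def error_code(body: str) -> Optional[str]:
    """Return error.code from a JSON error body, or None if the body has no such field.

    Assumes the {"error": {"code": ..., "message": ...}} shape quoted in this thread.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    code = error.get("code")
    return code if isinstance(code, str) else None


def is_timeout(body: str) -> bool:
    """True when the service reported the "Timeout" error code."""
    return error_code(body) == "Timeout"


if __name__ == "__main__":
    body = '{ "error": { "code": "Timeout", "message": "The operation was timeout." } }'
    print(is_timeout(body))  # True
```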
+1.
The only solution seems to be adding more Document Intelligence resources and splitting the document into smaller chunks, which isn't a great solution. I would love a configurable timeout or some form of built-in parallelism.
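Splitting only shortens the wall-clock time if the chunks are analyzed concurrently. A sketch using the standard library's `ThreadPoolExecutor`, assuming a hypothetical `analyze_chunk(first, last)` callable that wraps whatever SDK or REST call you use and returns one text per page. Because each chunk's page numbering restarts at 1 when it is sent as a separate file, the results are concatenated in range order so that position `i` corresponds to absolute page `i + 1`. `max_workers` is a guess; a single resource is still subject to the service's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical signature: given a 1-based inclusive page range, return one text per page.
AnalyzeFn = Callable[[int, int], List[str]]


def analyze_in_parallel(
    ranges: List[Tuple[int, int]], analyze_chunk: AnalyzeFn, max_workers: int = 4
) -> List[str]:
    """Run analyze_chunk on each page range concurrently; return page texts in document order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `ranges`, regardless of finish order.
        results = list(pool.map(lambda r: analyze_chunk(r[0], r[1]), ranges))
    pages: List[str] = []
    for (first, last), texts in zip(ranges, results):
        expected = last - first + 1
        if len(texts) != expected:
            raise ValueError(f"range {first}-{last}: expected {expected} pages, got {len(texts)}")
        pages.extend(texts)
    return pages


if __name__ == "__main__":
    # Stand-in for a real service call (assumption): labels each page with its absolute number.
    def fake_analyze(first: int, last: int) -> List[str]:
        return [f"page {n}" for n in range(first, last + 1)]

    print(analyze_in_parallel([(1, 2), (3, 5)], fake_analyze))
    # ['page 1', 'page 2', 'page 3', 'page 4', 'page 5']
```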
Hi all, thanks for the feedback. I've created an issue in our Azure SDK repo and we'll investigate ASAP.
Is anyone on the thread able to share a PDF that resulted in a timeout? If so, please email to pamelafox at microsoft . com
@pamelafox Please check your inbox as I have sent you a sample file to reproduce this issue. Furthermore, this issue occurs when using the Markdown output format.
Hi @pamelafox, just wanted to check whether you have received any files. I've tested with a 426-page PDF of 16,936 KB but couldn't reproduce the issue.
@pamelafox I'll share one tomorrow.
@mikedizon, can you also share it with yall@microsoft.com?
@YalinLi0312 I've received a few files, but have only been able to reproduce the issue intermittently. If you're able to reproduce it as well, that'd be great.
@YalinLi0312 @pamelafox curious to hear if you encountered the same issues I had with that file.
Minimal steps to reproduce
Attempt to process a document of 390 or more pages
Any log messages given by the failure
Extracting text from 'C:\Users\dboyd\Documents\DesignSpecs/data\PA - Sch 23 - Extracts from Proposal.pdf' using Azure Form Recognizer
Traceback (most recent call last):
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\prepdocs.py", line 379, in <module>
    page_map = get_document_text(filename)
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\prepdocs.py", line 111, in get_document_text
    poller = form_recognizer_client.begin_analyze_document("prebuilt-layout", document = f)
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\.venv\lib\site-packages\azure\core\tracing\decorator.py", line 76, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\.venv\lib\site-packages\azure\ai\formrecognizer\_document_analysis_client.py", line 126, in begin_analyze_document
    return self._client.begin_analyze_document( # type: ignore
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\.venv\lib\site-packages\azure\ai\formrecognizer\_generated\_operations_mixin.py", line 170, in begin_analyze_document
    return mixin_instance.begin_analyze_document(model_id, pages, locale, string_index_type, analyze_request, **kwargs)
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\.venv\lib\site-packages\azure\core\tracing\decorator.py", line 76, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\.venv\lib\site-packages\azure\ai\formrecognizer\_generated\v2022_08_31\operations\_form_recognizer_client_operations.py", line 576, in begin_analyze_document
    raw_result = self._analyze_document_initial( # type: ignore
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\.venv\lib\site-packages\azure\ai\formrecognizer\_generated\v2022_08_31\operations\_form_recognizer_client_operations.py", line 508, in _analyze_document_initial
    raise HttpResponseError(response=response)
azure.core.exceptions.HttpResponseError: (Timeout) The operation was timeout.
Code: Timeout
Message: The operation was timeout.
Expected/desired behavior
Need to be able to set a longer timeout for large files in the begin_analyze_document call.
Mention any other details that might be useful
The code below is what times out:
with open(filename, "rb") as f:
    poller = form_recognizer_client.begin_analyze_document("prebuilt-layout", document = f)
Given that the entire byte stream of the large file has to be sent to the endpoint, this looks like a straight HTTP timeout. However, nothing in the API documentation explains how to change the timeout for the begin_analyze_document call.
I do not believe that rewriting the example to use async I/O will help, since this is an endpoint timeout.