Open preste-thsou opened 4 months ago
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Hi @preste-thsou - Thanks for opening an issue! We'll take a look asap.
Hi @preste-thsou. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
Hi @preste-thsou, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
Hello, thanks for your reply. To answer your question, analyze_mode is defined with the following code:
if self._page_class == 'receipt':
    analyze_mode = 'prebuilt-receipt'
else:
    analyze_mode = 'prebuilt-invoice'
The value is chosen based on the result of a classification task that is performed independently. In the failing case, the classification result is "receipt"; I don't remember whether I have already tested the invoice case.
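For reference, the same selection can be written as a small lookup, which makes the model ID easy to log or check before the API call. This is only a sketch: `select_model_id`, `MODEL_BY_CLASS` and `DEFAULT_MODEL` are hypothetical names, and the fallback to `prebuilt-invoice` simply mirrors the `else` branch above.

```python
# Hypothetical helper mirroring the if/else above; names are illustrative only.
MODEL_BY_CLASS = {"receipt": "prebuilt-receipt"}
DEFAULT_MODEL = "prebuilt-invoice"  # same fallback as the original else branch


def select_model_id(page_class: str) -> str:
    """Map a classification result to the prebuilt model ID to call."""
    return MODEL_BY_CLASS.get(page_class, DEFAULT_MODEL)
```

With this, `analyze_mode = select_model_id(self._page_class)` would behave the same as the original branch.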
The page object is a single-page, in-memory PDF document created using BytesIO and PyPDF2.PdfWriter(). The code is a bit complex because it adapts to a variety of incoming formats and converts to PDF if the initial input was an image, but it always ends with:
tmp = BytesIO()
pdf_page.write(tmp)
tmp.seek(0)
where pdf_page is a valid PyPDF2.PdfWriter() object. This tmp is then stored in a class attribute, _page, and passed to the Azure API call via page = copy.deepcopy(self._page).
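Since the buffer is copied before the call, it may be worth ruling out a stream that is no longer positioned at the start. A minimal sketch, assuming the page is a plain `BytesIO`; `snapshot_stream` is a hypothetical helper and the placeholder bytes stand in for the output of `pdf_page.write(tmp)`:

```python
import copy
from io import BytesIO


def snapshot_stream(stream: BytesIO) -> BytesIO:
    """Return an independent copy of the buffer, rewound to the start."""
    clone = copy.deepcopy(stream)  # independent buffer with the same contents
    clone.seek(0)  # rewind explicitly instead of relying on the copied position
    return clone


tmp = BytesIO(b"%PDF-1.4 placeholder bytes")  # stand-in for pdf_page.write(tmp)
tmp.read()  # simulate an earlier consumer leaving the position at the end
page = snapshot_stream(tmp)
```

Here `page` can be read from the beginning even though `tmp` had already been consumed.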
This code works without any issue with formrecognizer.
Describe the bug
I'm trying to migrate from formrecognizer to DocumentIntelligence. I could successfully update my environment with the new package, but I get a 400 error (invalid argument) when running begin_analyze_document with the prebuilt-receipt model. The same code and the same documents worked fine with the previous library. I'm using a new resource created in the West Europe region to be able to use the v4.0 model, and I updated the API endpoint and key accordingly. I tried an endpoint URL with and without a trailing '/', since I noticed that the POST URL contains a double '//', but the result is the same in both cases. With a trailing '/' I see the following POST in the logs:
https://REDACTED_DOMAIN.cognitiveservices.azure.com//documentintelligence/documentModels/prebuilt-receipt:analyze?api-version=REDACTED&locale=REDACTED'
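The double '//' in the logged URL can be avoided by stripping the trailing slash before the endpoint is handed to the client. As noted above, the 400 occurs either way, so this sketch only tidies the URL and is not presented as a fix; `normalize_endpoint` is a hypothetical helper.

```python
def normalize_endpoint(endpoint: str) -> str:
    """Strip surrounding whitespace and any trailing '/' from an endpoint URL."""
    return endpoint.strip().rstrip("/")
```

For example, `normalize_endpoint("https://example.cognitiveservices.azure.com/")` returns the same URL without the final slash (the host name here is made up).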
To Reproduce
Steps to reproduce the behavior:
pip install azure-ai-documentintelligence==1.0.0b2 (or b3)
Expected behavior
The Azure API processes my document, as it did with form-recognizer.
Screenshots
Request method: 'POST'
Request headers:
    'content-type': 'application/json'
    'x-ms-client-request-id': '82483726-4a65-11ef-ad80-0dd1140f25ff'
    'User-Agent': 'azsdk-python-ai-documentintelligence/1.0.0b3 Python/3.10.12 (Linux-6.5.0-44-generic-x86_64-with-glibc2.35)'
    'Ocp-Apim-Subscription-Key': 'REDACTED'
A body is sent with the request
07/25/2024 11:08:51 AM Response status: 400
Response headers:
    'Content-Length': '172'
    'Content-Type': 'application/json; charset=utf-8'
    'ms-azure-ai-errorcode': 'REDACTED'
    'x-ms-error-code': 'InvalidArgument'
    'apim-request-id': 'REDACTED'
    'Strict-Transport-Security': 'REDACTED'
    'x-content-type-options': 'REDACTED'
    'x-ms-region': 'REDACTED'
    'Date': 'Thu, 25 Jul 2024 09:08:51 GMT'
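One detail in the log that may be worth checking: the request goes out with 'content-type': 'application/json' even though the body is a PDF stream. The sketch below passes the stream with an explicit binary content type. It assumes the beta client accepts the document as `analyze_request` together with a `content_type` keyword, which should be verified against the installed version; `analyze_pdf_stream` is a hypothetical wrapper and `client` is whatever DocumentIntelligenceClient instance the application already holds.

```python
from io import BytesIO
from typing import Any


def analyze_pdf_stream(client: Any, model_id: str, page: BytesIO) -> Any:
    """Start analysis of an in-memory PDF, declaring the body as raw bytes.

    Assumption: client.begin_analyze_document takes the document as
    `analyze_request` and accepts a `content_type` keyword, as in the
    1.0.0b2/b3 betas. Check the signature of the installed package.
    """
    page.seek(0)  # make sure the whole document is sent
    return client.begin_analyze_document(
        model_id,
        analyze_request=page,
        content_type="application/octet-stream",
    )
```

If the assumption holds, the logged request header should then show 'application/octet-stream' instead of 'application/json'.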