Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

Document Intelligence: support for multiple languages at the same time during extraction #36173

Open galvangoh opened 1 week ago

galvangoh commented 1 week ago

Hello all, certain prebuilt models use an OCR engine to retrieve text from documents. I am currently in the testing phase of my project, using the prebuilt invoice model.

I am interested to know:

  1. Is there, or will there be, multilingual support for those models?
  2. Does being explicit with the locale parameter (e.g. specifically telling the model that the document contents are in Mandarin) of DocumentIntelligenceClient.begin_classify_document improve OCR performance?
  3. Is any image preprocessing applied behind the scenes before OCR is performed on submitted documents, so that OCR performance can be improved?

Thank you.

github-actions[bot] commented 1 week ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

github-actions[bot] commented 1 week ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @ctstone @vkurpad.

github-actions[bot] commented 6 days ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @bojunehsu @vkurpad.

bojunehsu commented 6 days ago

  1. Yes, many of our models support multiple locales/languages. Please see here for a list of locales supported by prebuilt invoice.
  2. Being explicit with locale may help in situations where the text is highly confusable and we are absolutely certain that the document contains only content in the specified locale. In the vast majority of cases, specifying locale will not make a difference.
  3. The OCR model is fairly robust. However, if there is specific image preprocessing in your scenario that would help a human better understand the text, the model will likely benefit from it as well.

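For readers who want to try the locale hint from point 2, here is a minimal sketch, not an official sample. It assumes the `azure-ai-documentintelligence` package, where `locale` is a keyword argument of `begin_analyze_document` (the call used for prebuilt models such as `prebuilt-invoice`). The helper names `build_analyze_kwargs` and `analyze_invoice` are hypothetical, as are the endpoint, key, and document URL passed in. The hint is sent only when the caller supplies one, since in most cases it makes no difference.

```python
from typing import Any, Dict, Optional


def build_analyze_kwargs(locale: Optional[str] = None) -> Dict[str, Any]:
    """Build optional keyword arguments for begin_analyze_document.

    The locale hint is included only when the caller provides one, i.e. when
    they are certain the document contains only content in that locale.
    """
    kwargs: Dict[str, Any] = {}
    if locale:
        kwargs["locale"] = locale
    return kwargs


def analyze_invoice(endpoint: str, key: str, document_url: str,
                    locale: Optional[str] = None):
    """Analyze an invoice at a URL with the prebuilt invoice model (hypothetical helper)."""
    # Imported lazily so build_analyze_kwargs works without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))
    poller = client.begin_analyze_document(
        "prebuilt-invoice",
        AnalyzeDocumentRequest(url_source=document_url),
        **build_analyze_kwargs(locale),
    )
    return poller.result()
```

Calling `analyze_invoice(endpoint, key, url)` leaves language detection to the service, while `analyze_invoice(endpoint, key, url, locale="en-US")` passes the hint. Whether a given locale is accepted for the invoice model depends on the supported-locales list mentioned in point 1.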
github-actions[bot] commented 5 days ago

Hi @galvangoh. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.