Azure-Samples / azure-ai-vision-sdk

SDK for Microsoft's Azure AI Vision
MIT License

OCR to MS Word Document #44

Closed: amehmood-pls closed this issue 9 months ago

amehmood-pls commented 11 months ago

Hello, I'm not sure whether this is the right repository for this question. I've been experimenting with the Azure OCR API (https://eastus.api.cognitive.microsoft.com/computervision/imageanalysis:analyze), which returns JSON output. I've also come across another Azure skill that can export an hOCR file. This made me wonder: is there a ready-made skill or API that converts OCR JSON into a Word document? Thank you!
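For reference, a do-it-yourself conversion can be quite small. The sketch below is a minimal, standard-library-only illustration, not an official converter. It assumes the response follows the 2023-10-01 GA shape of the Image Analysis `read` feature (`readResult.blocks[].lines[].text`; earlier preview API versions return a different structure) and writes the three parts a bare .docx package needs. The function names `ocr_lines` and `write_docx` are hypothetical, and the output is one paragraph per OCR line, so the original page layout is not preserved.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Minimal package parts: content types, root relationship, and the main document.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def ocr_lines(result):
    """Collect line texts, in the order returned, from an Image Analysis result.

    ASSUMPTION: 2023-10-01 GA shape readResult.blocks[].lines[].text;
    preview API versions return a different structure.
    """
    lines = []
    for block in result.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            lines.append(line["text"])
    return lines


def write_docx(lines, path):
    """Write one Word paragraph per OCR line; layout is NOT preserved."""
    # escape() handles &, < and >; xml:space="preserve" keeps edge spaces.
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(t)}</w:t></w:r></w:p>' for t in lines
    )
    document = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
                f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>')
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", ROOT_RELS)
        z.writestr("word/document.xml", document)
```

With a saved response, something like `write_docx(ocr_lines(json.load(open("response.json"))), "ocr.docx")` should produce a file Word can open; both file names are placeholders.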

luzhang06 commented 11 months ago

Hi @amehmood-pls, thanks for raising this question. May I ask why you need to convert the OCR JSON output to Word? Could you share a bit more about your use case?

amehmood-pls commented 11 months ago

Hello @luzhang06, thanks for asking. My goal is to replicate the original layout so that a reviewer can both assess and correct the OCR results. This emulates the review interface in a human-in-the-loop (HITL) setup; the only difference is that I want to give the reviewer a Word document that precisely mirrors the original layout.

luzhang06 commented 9 months ago

Thanks @amehmood-pls! We don't have this capability available yet. Since your scenario is document-focused, you could consider the Azure AI Document Intelligence Layout model: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-layout?view=doc-intel-3.1.0. Besides the text, it returns structural elements such as paragraphs, tables, selection marks, and titles. You could then write code that generates a .docx file from that structure.
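As a rough illustration of that last step, here is a standard-library-only Python sketch that turns a Layout `analyzeResult` into a .docx. It assumes the v3.1 REST JSON shape: `paragraphs[]` with an optional `role`, `tables[]` with `rowCount`, `columnCount` and `cells[]`, and `spans` offsets on both. The helper names, the bold/size mapping for `title` and `sectionHeading`, and the choice to drop page headers, footers and page numbers are all assumptions; merged cells (`rowSpan`/`columnSpan`) are ignored. Paragraphs whose first span falls inside a table are skipped so cell text is not written twice, and elements are emitted in order of their offset in the extracted content as an approximation of reading order.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)
# Hypothetical styling: bold text at these sizes (in half-points) for heading roles.
HEADING_SIZES = {"title": 32, "sectionHeading": 28}
# Assumed choice: drop page furniture so the reviewer sees only body content.
SKIP_ROLES = {"pageHeader", "pageFooter", "pageNumber"}


def para(text, size=None):
    """One Word paragraph; bold with a font size when `size` is given."""
    props = f'<w:rPr><w:b/><w:sz w:val="{size}"/></w:rPr>' if size else ""
    return f'<w:p><w:r>{props}<w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'


def table_xml(table):
    """Render a Layout table as a Word table (merged cells are ignored)."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    cols = '<w:gridCol w:w="2000"/>' * table["columnCount"]
    rows = "".join(
        "<w:tr>" + "".join(f"<w:tc>{para(text)}</w:tc>" for text in row) + "</w:tr>"
        for row in grid
    )
    # An empty paragraph after each table keeps consecutive tables apart.
    return (f'<w:tbl><w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr>'
            f"<w:tblGrid>{cols}</w:tblGrid>{rows}</w:tbl><w:p/>")


def first_offset(item):
    """Character offset of an element's first span in analyzeResult.content."""
    return item["spans"][0]["offset"] if item.get("spans") else 0


def inside_table(offset, tables):
    return any(s["offset"] <= offset < s["offset"] + s["length"]
               for t in tables for s in t.get("spans", []))


def layout_to_body(result):
    """Build the <w:body> content from a Layout analyzeResult dict."""
    tables = result.get("tables", [])
    items = []
    for p in result.get("paragraphs", []):
        offset = first_offset(p)
        # Skip page furniture and paragraphs that duplicate table cell text.
        if p.get("role") in SKIP_ROLES or inside_table(offset, tables):
            continue
        items.append((offset, para(p["content"], HEADING_SIZES.get(p.get("role")))))
    items.extend((first_offset(t), table_xml(t)) for t in tables)
    items.sort(key=lambda item: item[0])  # approximate reading order
    return "".join(xml for _, xml in items)


def write_docx(body, path):
    document = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
                f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>')
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", ROOT_RELS)
        z.writestr("word/document.xml", document)
```

Given the JSON returned when polling the analyze operation, a call along the lines of `write_docx(layout_to_body(response["analyzeResult"]), "review.docx")` would produce the draft for the reviewer. Positioning text exactly as on the page would additionally require using the `boundingRegions` polygons, which this sketch does not attempt.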