TungstenTransformation / MicrosoftOCR

Integration of Microsoft Computer Vision v4.0 as an OCR Engine into Kofax Transformation

Microsoft Azure Document Intelligence for Tungsten TotalAgility

NOTE Versions 1.0.7 onwards support only Document Intelligence 4.0 preview.

This documentation is for Microsoft Azure Document Intelligence.
Click here for documentation for Microsoft Computer Vision (for photos and images).
This repository supports several different Microsoft OCR engines.

NOTE The correct model name for US W-2 tax forms is prebuilt-tax.us.w2 and not prebuilt-tax.us.W-2. The Microsoft website is incorrect.
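
To illustrate where the model name from the note above ends up, here is a minimal Python sketch that builds an analyze request URL. It is not code from this repository. The path layout follows the public Document Intelligence v4.0 REST API as far as we know it; the helper name analyze_url, the example endpoint and the API version string passed in are assumptions to be checked against your own Azure resource.

```python
from urllib.parse import quote

# Model name taken from the note above: lowercase "w2", no hyphen.
W2_MODEL_ID = "prebuilt-tax.us.w2"


def analyze_url(endpoint: str, model_id: str, api_version: str) -> str:
    """Build the analyze URL for a model (hypothetical helper).

    Assumption: the v4.0 path is
    {endpoint}/documentintelligence/documentModels/{modelId}:analyze
    and the API version is supplied by the caller.
    """
    base = endpoint.rstrip("/")  # avoid a double slash after the host
    return (
        f"{base}/documentintelligence/documentModels/"
        f"{quote(model_id, safe='')}:analyze"
        f"?api-version={quote(api_version, safe='')}"
    )


if __name__ == "__main__":
    # The endpoint and API version here are placeholders, not real values.
    print(analyze_url("https://example.cognitiveservices.azure.com/",
                      W2_MODEL_ID, "2024-02-29-preview"))
```

A model name spelled prebuilt-tax.us.W-2 would simply produce a different URL, so a misspelling only shows up when the service rejects the request.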

Install Microsoft OCR and DI on premises.

Downloads

Details

Configure Microsoft Azure

Configure Tungsten Transformation

How to use Microsoft OCR with Advanced Zone Locator

How it works

At runtime in Tungsten Transformation and TotalAgility, OCR is performed on demand, either when text classification is required or when a locator needs text. This script runs in the event Document_BeforeClassify, which occurs before Transformation ever tries to OCR the document. The script checks whether you have named a profile "Microsoft OCR". If so, it sends each page of the document to Microsoft and copies the words and coordinates into the XDocument. The XDocument now has an OCR layer called "Microsoft OCR", which is used by the classifiers and locators, so OCR is not run a second time on that document.

In Project Builder or Transformation Designer, pressing F4 performs OCR with the built-in engines. To force the use of Microsoft OCR, press F5 (Classify) to send the document to Microsoft, or, if you have an Extraction Group only project, select the correct class in the class tree and press F6 (Extract).
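
The control flow described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the script shipped in this repository, which runs inside Transformation's own scripting environment. Word, Document, before_classify and recognize_page are hypothetical names, the Document class is a stand-in for the XDocument, and the check that skips a document which already carries the layer is our assumption about how "OCR is not run again" could be enforced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

PROFILE_NAME = "Microsoft OCR"  # profile name the script looks for

# (text, left, top, width, height) as returned by the OCR call
RawWord = Tuple[str, int, int, int, int]


@dataclass
class Word:
    text: str
    left: int
    top: int
    width: int
    height: int
    page: int


@dataclass
class Document:
    """Hypothetical stand-in for an XDocument: page images plus OCR layers."""
    pages: List[bytes]
    ocr_layers: Dict[str, List[Word]] = field(default_factory=dict)


def before_classify(
    doc: Document,
    profile_names: Iterable[str],
    recognize_page: Callable[[bytes], List[RawWord]],
) -> bool:
    """Add a "Microsoft OCR" layer to doc; return True if OCR was performed."""
    if PROFILE_NAME not in set(profile_names):
        return False  # no such profile: leave OCR to the built-in engines
    if PROFILE_NAME in doc.ocr_layers:
        return False  # assumption: an existing layer is reused, not rebuilt
    words: List[Word] = []
    for page_index, image in enumerate(doc.pages):
        # One request per page; recognize_page stands in for the Microsoft call.
        for text, left, top, width, height in recognize_page(image):
            words.append(Word(text, left, top, width, height, page_index))
    doc.ocr_layers[PROFILE_NAME] = words
    return True
```

Passing the OCR call in as a function keeps the sketch testable without network access: a stub that returns fixed words is enough to exercise the per-page loop.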

How to use Microsoft Document Intelligence with Tables

Open an issue if you find a bug or need a feature implemented.

Useful links