CDCgov / IDWA

Intelligent Data Workflow Automation
Apache License 2.0
1 stars 1 forks source link

Metrics for commercial OCR solutions #114

Closed bora-skylight closed 6 hours ago

bora-skylight commented 4 weeks ago

Acceptance Criteria

How long does an e2e form run take Measure the accuracy/precision/recall OCR Model: Measure the accuracy/precision/recall/time of the following

Scanned typed/handwritten text Scanned typed/handwritten numbers Scanned typed/handwritten phone numbers different formats 555-555-5555 (555) 456-4567 Scanned typed/handwritten addresses

Steps

  1. Create AWS Tesseract and Azure AI vision instance(API key) and run locally
  2. Ensure formatting of ground truth matches formatting of ocr result
  3. Add workflow that compares the results based on the above metrics