CDCgov / IDWA

Intelligent Data Workflow Automation
Apache License 2.0

Implement the findings from the benchmarking spike into the OCR pipeline #103

Open zdeveloper opened 2 months ago

zdeveloper commented 2 months ago

Implement the findings from the benchmarking spike as an automated benchmark.

Acceptance Criteria

Measure E2E metrics for two identical scanned forms (one digitally filled, one hand-filled):

  1. How long does an E2E form run take?
  2. Measure the accuracy/precision/recall
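
As a rough sketch of the two E2E measurements above: the helper below times a single run and scores the extracted fields against a ground-truth dictionary. The names `timed_run` and `field_metrics` are hypothetical, and so is the scoring convention (precision = correct fields / extracted fields, recall = correct fields / expected fields, accuracy = correct fields / all distinct field names seen). The real pipeline entry point and the definitions the team wants may differ.

```python
import time
from typing import Callable, Dict, Tuple


def timed_run(
    pipeline: Callable[[str], Dict[str, str]], form_path: str
) -> Tuple[Dict[str, str], float]:
    """Run a pipeline callable once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = pipeline(form_path)
    return result, time.perf_counter() - start


def field_metrics(
    expected: Dict[str, str], extracted: Dict[str, str]
) -> Dict[str, float]:
    """Score extracted fields against ground truth (assumed convention)."""
    # A field is correct only if it was expected and the values match exactly.
    correct = sum(
        1 for name, value in extracted.items()
        if name in expected and expected[name] == value
    )
    all_names = set(expected) | set(extracted)
    return {
        "precision": correct / len(extracted) if extracted else 0.0,
        "recall": correct / len(expected) if expected else 0.0,
        "accuracy": correct / len(all_names) if all_names else 0.0,
    }
```

Running the same ground truth against the digitally filled and the hand-filled form would give two directly comparable sets of numbers.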

OCR Model: Measure the accuracy/precision/recall/time of the following

  1. Scanned typed/handwritten text
  2. Scanned typed/handwritten numbers
  3. Scanned typed/handwritten phone numbers in different formats
    1. 555-555-5555
    2. (555) 456-4567
  4. Scanned typed/handwritten addresses
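
For the OCR-level categories above, one possible way to score a single prediction is sketched below. It assumes that phone numbers should be compared on digits only, so that `555-555-5555` and `(555) 456-4567` style formatting does not affect the result, and it uses `difflib.SequenceMatcher` as a stand-in character-level similarity for text and addresses. Both choices, and the helper names, are assumptions rather than decisions recorded in this issue.

```python
import re
from difflib import SequenceMatcher


def normalize_phone(text: str) -> str:
    """Keep only the digits so that formatting differences are ignored."""
    return re.sub(r"\D", "", text)


def phone_matches(expected: str, actual: str) -> bool:
    """Compare two phone numbers on their digits only."""
    return normalize_phone(expected) == normalize_phone(actual)


def char_similarity(expected: str, actual: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, expected, actual).ratio()
```

If the benchmark should instead check that the OCR output preserves the original formatting, an exact string comparison would replace `phone_matches`.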

Additional context

Please rewrite the tests so that it is easier to add more cases in the future. This story covers only the initial computation of the metrics. It does not include persisting them; that will be handled in a follow-up ticket.
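
One way to keep the tests easy to extend is to make them data-driven, so that adding a case means adding one entry to a list. The sketch below is only an illustration: `BenchmarkCase`, `CASES`, `run_cases`, and the image paths are all hypothetical, and `ocr` stands for whatever callable wraps the real OCR model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BenchmarkCase:
    category: str    # e.g. "text", "number", "phone", "address"
    source: str      # "typed" or "handwritten"
    image_path: str  # hypothetical path to the scanned sample
    expected: str    # ground-truth transcription


# Adding a new case only requires appending an entry here.
CASES: List[BenchmarkCase] = [
    BenchmarkCase("phone", "typed", "samples/phone_typed_1.png", "555-555-5555"),
    BenchmarkCase("phone", "handwritten", "samples/phone_hand_1.png", "(555) 456-4567"),
]


def run_cases(
    cases: List[BenchmarkCase], ocr: Callable[[str], str]
) -> Dict[str, float]:
    """Return the exact-match rate per "category/source" group."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for case in cases:
        key = f"{case.category}/{case.source}"
        totals[key] += 1
        if ocr(case.image_path) == case.expected:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

Because the function returns plain numbers and writes nothing, persistence can be added later without changing the cases.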

bora-skylight commented 3 weeks ago

@zdeveloper could you give a brief overview of where this stands so @arinkulshi could potentially move it forward?