CDCgov / ReportVision

Intelligent Data Workflow Automation
Apache License 2.0

[Benchmarking Framework] Create a standardized form testing dataset #254

Open zdeveloper opened 6 days ago

zdeveloper commented 6 days ago

Create a standardized form testing dataset based on a readily available dataset. It will be used to build a standard benchmark for ReportVision.

Acceptance Criteria

Additional context

Example datasets could come from NIST or Hugging Face. Please be very communicative; the AC can be adjusted.

zdeveloper commented 5 days ago

Potential datasets:
medical reports: https://huggingface.co/datasets/AnubhutiBhardwaj/medical-reports-demo
medical prescriptions: https://huggingface.co/datasets/Technoculture/medical-prescriptions
fake tax forms: https://huggingface.co/datasets/singhsays/fake-w2-us-tax-form-dataset

schreiaj commented 2 days ago

Per discussion: the goal of this is to build a dataset that tests the OCR, not the alignment.
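
For illustration only, here is a minimal sketch of how a dataset like this might be used to score OCR output in isolation, assuming each form comes with ground-truth text per field. Nothing here is taken from the ReportVision codebase: the function names (`levenshtein`, `character_error_rate`, `score_form`), the per-field dictionary layout, and the choice of character error rate as the metric are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insert, delete, or substitute costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # drop a character from a
                current[j - 1] + 1,            # add a character from b
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(expected: str, predicted: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not expected:
        # Hypothetical convention: an empty expected field only matches empty output.
        return 0.0 if not predicted else 1.0
    return levenshtein(expected, predicted) / len(expected)


def score_form(expected_fields: dict, predicted_fields: dict) -> dict:
    """Score each ground-truth field; a missing prediction counts as empty text."""
    return {
        name: character_error_rate(text, predicted_fields.get(name, ""))
        for name, text in expected_fields.items()
    }
```

Because the score compares text field by field, it does not depend on where the field was located on the page, which keeps alignment out of the measurement.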