CDCgov / IDWA

Intelligent Data Workflow Automation
Apache License 2.0

Generate disease form dataset #4

Closed: zdeveloper closed this issue 4 months ago

zdeveloper commented 6 months ago

The goal of this ticket is to build a better disease form dataset for testing and evaluating our solution.

Acceptance Criteria: Please use the PDFs already shared with us in the drive for Pertussis, Hepatitis (A/B/Viral), and Mumps.

Each form should have the following versions

Additional context: Texas case investigation forms

Texas Notifiable Conditions: https://www.dshs.texas.gov/sites/default/files/IDCU/investigation/Reporting-forms/Notifiable-Conditions-2024Color.pdf

derekadombek commented 5 months ago

@zdeveloper Was the initial plan to manually fill out each of these forms with dummy data, in both "typed" and "handwritten" formats, and then save them in the OCR dir for the code/tests to point to? If so, how many of each were you thinking? Are we using these datasets for model training or for unit testing?

zdeveloper commented 5 months ago


The idea was to have five forms:

  1. Pertussis
  2. Hepatitis (A)
  3. Hepatitis (B)
  4. Hepatitis (Viral)
  5. Mumps

and, for each form, the following versions:

  1. Empty form
  2. Scanned Handwritten form
  3. Scanned Computer filled
  4. Empty fillable form
  5. Filled out fillable form

I would say the datasets would be used for both unit testing and model training (providing examples for the AI to learn from).
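The plan above amounts to a 5 × 5 matrix (five forms, five versions each), i.e. 25 files in total. A minimal Python sketch of how that matrix could be enumerated and checked for completeness follows. The slugs and the `<form>__<version>.pdf` naming scheme are hypothetical, chosen only for illustration; they are not a convention from the IDWA repo or the shared Drive folders.

```python
from itertools import product

# Forms from the comment above; slug spellings are hypothetical.
FORMS = ["pertussis", "hepatitis_a", "hepatitis_b", "hepatitis_viral", "mumps"]

# Versions requested for every form; slug spellings are hypothetical.
VERSIONS = [
    "empty",
    "scanned_handwritten",
    "scanned_computer_filled",
    "empty_fillable",
    "filled_fillable",
]


def expected_files(ext: str = ".pdf") -> list[str]:
    """Return one hypothetical file name per (form, version) pair."""
    return [f"{form}__{version}{ext}" for form, version in product(FORMS, VERSIONS)]


def missing_files(present: set[str], ext: str = ".pdf") -> list[str]:
    """List expected file names not found in `present`, in matrix order."""
    return [name for name in expected_files(ext) if name not in present]


if __name__ == "__main__":
    expected = expected_files()
    print(len(expected))  # 5 forms x 5 versions = 25
    # Pretend only the first three files have been produced so far.
    print(missing_files(set(expected[:3]))[:2])
```

A check like this could back a unit test that fails whenever a form/version combination is missing from the test fixtures, regardless of how the files are eventually named.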

derekadombek commented 4 months ago

complete_fillable: https://drive.google.com/drive/folders/1cZHn5JM0V9RnVnIz2qZz_e8Nq9FDmJOe
scanned_complete_fillable: https://drive.google.com/drive/folders/15KyB8Lxz7Vun_3J2zoRuLDD4dLBX-vuc
scanned_complete_handwritten: https://drive.google.com/drive/folders/1Ss_vpv3f3NVv7Ja0jq-J5l4yGv8_gATh

Blanks: https://drive.google.com/drive/folders/118Npqkf5mW2sm8zd3ohQ9a7lDe7JDAIo & https://drive.google.com/drive/folders/1tEbH_xfCAPPf-mqk-YqC1hv9DXeU59FV