Update OCR to Handle the new CR3 form for 2023

patrickm02L commented 2 years ago

A new CR3 form will be implemented on 1/1/23, as mandated by TxDOT. It adds new fields, as outlined in the Functional Requirements Specification document, including:

Intersecting Road Speed Limit
Direction of Traffic
Secondary Crash
Responder Struck
Date/Time Roadway Cleared
Date/Time Scene Cleared
Date Notified
Date Arrived
Autonomous Unit
Autonomous Level Engaged

Updated Code sheet v0.5

To manage this change, we will need to update the OCR to capture data from the new fields. @frankhereford has outlined the following minor changes needed to restore normal operation:

Define a new constellation of 10 pixels that we expect to be 100% black (#000000). This constellation is used to determine whether the PDF we’re working from is a scan or has been digital since it was created.
Define the new X,Y coordinates that mark the extent of the diagram and of the box around the narrative.
Plug those ~20 coordinates into the correct arrays in the Python script that the ETL runs.
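A minimal sketch of the first two steps in Python, the language of the ETL script. It assumes the page has already been rasterized to an RGB image that exposes a `getpixel((x, y))` lookup (Pillow's `Image.getpixel` behaves this way). The names `DIGITAL_CONSTELLATION_2023`, `DIAGRAM_BOX_2023`, `NARRATIVE_BOX_2023`, and `is_digital` are hypothetical, and every coordinate is a placeholder, not a measured value from the 2023 CR3.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[int, int]
# Assumed interface: returns the pixel value at (x, y) for an RGB(A) page image,
# e.g. the bound method Image.getpixel of a Pillow image converted to "RGB".
GetPixel = Callable[[Point], Tuple[int, ...]]

BLACK = (0, 0, 0)  # #000000

# PLACEHOLDER: 10 points that should be solid black on a digitally generated
# 2023 CR3 page. Real values must be measured from an actual digital form.
DIGITAL_CONSTELLATION_2023: Sequence[Point] = [(100 + 10 * i, 50) for i in range(10)]

# PLACEHOLDER crop boxes as (left, upper, right, lower), the order that
# Pillow's Image.crop expects. Real extents must be measured from the form.
DIAGRAM_BOX_2023 = (0, 0, 800, 600)
NARRATIVE_BOX_2023 = (0, 600, 800, 1000)


def is_digital(getpixel: GetPixel, constellation: Sequence[Point]) -> bool:
    """Return True only if every constellation pixel is exactly #000000.

    A scan rarely reproduces pure black at every sampled point, so a single
    miss is treated as "this page is a scan".
    """
    # Slice to 3 channels so RGBA pixels compare the same way as RGB ones.
    return all(tuple(getpixel(point))[:3] == BLACK for point in constellation)
```

With Pillow this would look roughly like `page = Image.open(path).convert("RGB")`, then `is_digital(page.getpixel, DIGITAL_CONSTELLATION_2023)`, and `page.crop(DIAGRAM_BOX_2023)` to cut out the diagram. Requiring an exact #000000 match is strict on purpose: any scanner noise flips the result to "scan".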

Additionally, to detect whether a CR3 uses the old or the new layout, we'll need to extend the constellation test so it tells us both whether the PDF is digital end-to-end and which CR3 form style we're looking at.
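One way to extend the constellation test, sketched in Python under the same assumptions as above (an RGB `getpixel((x, y))` lookup, hypothetical names, placeholder coordinates): keep one constellation per form version and report which one, if any, is entirely pure black. This only works if each version's constellation sits on spots that are solid black on that version and not on the other.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Point = Tuple[int, int]
# Assumed interface: returns the pixel value at (x, y), e.g. Pillow's Image.getpixel.
GetPixel = Callable[[Point], Tuple[int, ...]]

# PLACEHOLDER constellations, one per hypothetical form-version label.
# Each must be pure black only on a digital copy of its own version.
CONSTELLATIONS: Dict[str, Sequence[Point]] = {
    "pre-2023": [(50 + 7 * i, 40) for i in range(10)],
    "2023": [(60 + 7 * i, 80) for i in range(10)],
}


def classify_cr3(
    getpixel: GetPixel,
    constellations: Dict[str, Sequence[Point]] = CONSTELLATIONS,
) -> Tuple[Optional[str], bool]:
    """Return (form_version, is_digital) for one rasterized CR3 page.

    Checks constellations in dict insertion order and returns the first
    version whose points are all exactly #000000. If none match, the page is
    treated as a scan and the version is reported as unknown (None).
    """
    for version, points in constellations.items():
        if all(tuple(getpixel(p))[:3] == (0, 0, 0) for p in points):
            return version, True
    return None, False
```

A page that matches neither constellation comes back as `(None, False)`: it is treated as a scan, and this exact-pixel test alone cannot tell which form style it is.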

patrickm02L commented 2 years ago

2023 CR-3 form v.0.10

2023 CR-3 form v0.10-1.jpg 2023 CR-3 form v0.10-2.jpg 2023 CR-3 form v0.10-3.jpg 2023 CR-3 form v0.10-4.jpg

patrickm02L commented 2 years ago

Discussed in Product Sync on 7/20/22.

In late August or early September, help figure out how to scope any changes.
Get a test extract of the data and a sample CR3 form so we can test.
Budget time in the first week of January to troubleshoot.

frankhereford commented 1 year ago

Here’s the OCR/image extraction script: https://github.com/cityofaustin/atd-airflow/blob/master/dags/python_scripts/cr3_extract_diagram_ocr_narrative.py. It lives in the Airflow repo, which is included as a submodule in the Prefect repo, and it is called from here: https://github.com/cityofaustin/atd-prefect/blob/main/flows/vision-zero/cr3_ocr_narrative_extract_diagram/cr3_ocr_narrative_extract_diagram.py

cityofaustin/atd-data-tech #9704: Update OCR to Handle the new CR3 form for 2023