You will need a copy of the database that includes the change_log_crashes_cris table data. To include just this one change log table in your replica, comment out change_log_crashes_cris right here in the helper script.
Now replicate as normal. This will download ~4 GB of data:
./vision-zero replicate-db
Apply migrations and metadata:
hasura migrate apply
hasura metadata apply
Now you will use the change log to backfill narratives that have been erased in recent weeks. Locate the commented-out update statement at the bottom of the 1727451510064_preserve_crash_narrative up migration (here). Manually execute this update command in your SQL client.
At this point there should be ~800 remaining crashes that are missing a narrative and need to be OCR'd. You can check this by querying the new SQL view:
select count(*) from view_crash_narratives_ocr_todo;
It's time to test the OCR script. Head to the ./etl/cris_import directory and rebuild the ETL Docker image:
docker compose build
We need to set our environment to use the prod S3 bucket, because it contains all of the CR3 PDFs that we will process. The best way to do this is to save a copy of your existing .env file as prod.local.env, and set the BUCKET_ENV value to prod in this new file (this PR renames the ENV var to BUCKET_ENV). Make sure your Hasura endpoint is set to your localhost.
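The copy-and-edit step can be scripted. Below is a minimal sketch, assuming your .env file uses plain `NAME=value` lines; the function name `make_prod_env` is hypothetical and not part of this repo. It copies the source file, drops any existing BUCKET_ENV line, and appends `BUCKET_ENV=prod`. It does not touch the Hasura endpoint setting, so check that by hand.

```shell
# Hypothetical helper (not part of this repo): write a copy of an env file
# with BUCKET_ENV forced to "prod".
# Usage: make_prod_env [source] [destination]
make_prod_env() {
    src="${1:-.env}"
    dest="${2:-prod.local.env}"

    if [ ! -f "$src" ]; then
        echo "make_prod_env: $src not found" >&2
        return 1
    fi

    # Keep every line except an existing BUCKET_ENV assignment,
    # then append the prod value as the last line.
    awk '!/^BUCKET_ENV=/ { print } END { print "BUCKET_ENV=prod" }' "$src" > "$dest"
}
```

Running `make_prod_env .env prod.local.env` and then `grep BUCKET_ENV prod.local.env` should show a single `BUCKET_ENV=prod` line. The original .env is left unchanged.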
Now it really is time to run the OCR script. Use this command to run the OCR process with your new env file:
Nice! You can inspect the new narratives by querying the crashes_edits table:
select
id,
updated_at,
updated_by,
investigator_narrative
from
crashes_edits
where
investigator_narrative is not null;
Now we will test the CRIS import and make sure that our narrative-preserving trigger is working as expected. First, let's inspect a batch of crashes that should have investigator narratives, observing that the narrative is populated for all records:
I have placed a CRIS extract in the dev S3 inbox for testing (extract_2023_20240823135638635_99726_20240920_HAYSTRAVISWILLIAMSON), and this extract is missing a large number of crash narratives, including for those crashes listed above. Let's make sure those narratives are not going to be overwritten, thanks to the new DB trigger.
Make sure your BUCKET_ENV is set to dev in your .env file, and run the CRIS import like so:
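Before running the import, it may help to confirm which bucket your .env file points at. A minimal sketch, assuming the variable is written as an unquoted `BUCKET_ENV=value` line; the function name `bucket_env_is` is hypothetical and not part of this repo:

```shell
# Hypothetical helper (not part of this repo): succeed only if the env file
# sets BUCKET_ENV to the expected value.
# Usage: bucket_env_is <expected> [file]
bucket_env_is() {
    expected="$1"
    file="${2:-.env}"

    # If BUCKET_ENV appears more than once, the last assignment is used here.
    actual=$(awk -F= '/^BUCKET_ENV=/ { v = $2 } END { print v }' "$file")

    [ "$actual" = "$expected" ]
}
```

For example, `bucket_env_is dev .env && echo "BUCKET_ENV is dev"` prints the message only when the file points at the dev bucket.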
Associated issues
Testing
That's it. Thank you for testing this PR 👍
Ship list
main
branch