Closed johnclary closed 3 weeks ago
thanks again for the feedback. i am re-requesting review after having made the following changes:
- `--csv` or `--pdf`. running `$ cris_import.py` without any other args will throw an error 👍
- `created_by` and `updated_by` columns were being removed and therefore not being set to cris

i'm going to leave the `_column_metadata` improvements for a separate PR.
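for reference, here is a minimal sketch of how a "require at least one of `--csv` / `--pdf`" check can be done with `argparse`. this is not the actual `cris_import.py` code: treating both flags as boolean switches, and the helper names `build_parser` and `parse_args`, are assumptions for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Assumption: both flags are simple on/off switches.
    parser = argparse.ArgumentParser(prog="cris_import.py")
    parser.add_argument("--csv", action="store_true", help="process CSV crash records")
    parser.add_argument("--pdf", action="store_true", help="process CR3 PDFs")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.csv or args.pdf):
        # parser.error() prints the usage message to stderr and exits with status 2
        parser.error("at least one of --csv or --pdf is required")
    return args


if __name__ == "__main__":
    print(parse_args())
```

with this shape, running the script with no args exits with a usage error, while `--csv`, `--pdf`, or both are accepted.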
i notice when you restore from archive the extracts get copied over to the inbox but stay in the archived folder too. i guess that makes sense as expected behavior but just wanted to point it out in case y'all think it makes more sense to move them?
i did this partially out of laziness, but was also thinking it's probably nice to have that breadcrumb of seeing the extract in the `./archive` and knowing it was processed. happy to change this though 🤷
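to make the copy-versus-move question concrete, here is a small sketch that uses a plain dict as a stand-in for the bucket. the `restore_extracts` helper, its `move` option, and the `archive/` and `inbox/` key prefixes are hypothetical and are not taken from the helper script.

```python
ARCHIVE_PREFIX = "archive/"  # assumed key prefix for processed extracts
INBOX_PREFIX = "inbox/"      # assumed key prefix for the work queue


def restore_extracts(bucket: dict, move: bool = False) -> list:
    """Copy every archived object back to the inbox.

    `bucket` maps object keys to contents and stands in for real storage.
    With move=True the archived copy is deleted after it is copied.
    """
    restored = []
    # sorted() materializes the keys first, so mutating the dict in the loop is safe
    for key in sorted(k for k in bucket if k.startswith(ARCHIVE_PREFIX)):
        inbox_key = INBOX_PREFIX + key[len(ARCHIVE_PREFIX):]
        bucket[inbox_key] = bucket[key]
        if move:
            del bucket[key]
        restored.append(inbox_key)
    return restored
```

with `move=False` the archived copy stays behind as a breadcrumb, which matches the behavior described above; `move=True` shows what the alternative would look like.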
@roseeichelmann i am tracking a few follow-up todos for the `cris_import_log`. will not forget them! 🙏
Associated issues
This is the new CRIS import, complete with CR3 PDF processing. This is ready for review, but please keep in mind these follow-up todos which I intend to address in follow-up issues (pending your feedback + approval):
- `cris_import_log` table, and probably rename that table to `_cris_import_log`
Testing
Setup
1. Start your local Vision Zero stack (database + Hasura + editor) using a recent copy of production
2. From the `./atd-vzd` directory, apply migrations and metadata:
3. Grab a copy of the environment file from our 1pass dev vault. The item is named `Env file for the Vision Zero new data model CRIS import ETL`. Save it as `.env` in the `./atd-etl/data_model` directory
4. Build the docker image (you only need to do this once)
End-to-end CRIS import
This will download each extract available in the S3 `./inbox`, unzip it, load the CSV crash records into the database, crop crash diagrams out of the CR3 PDFs, and upload the CR3 PDFs and crash diagrams to the S3 bucket.
- `cr3_processed_at` and `cr3_stored_fl` fields
- `vision-zero-new-data-model-dev/dev/cr3s`. Observe that the crash diagrams and PDFs have Last modified timestamps that track with when you ran the import script.
- `cris_import_log` table. Verify that there are new entries for each extract you processed
Local import
This will process the extract zips that were downloaded to your `./extracts` directory during the previous step. CSVs will be loaded into the db, and crash diagrams will be extracted but not uploaded to S3.
Archive and un-archive zips
The script can archive the extract zips by moving them from `./inbox` to `./archive` once they have been processed. This is intended for the production deployment, where the `./inbox` functions as a work queue.
Restore the zips to the `./inbox` using the helper script
Other flags to test