cityofaustin / atd-vz-data

The technology that powers the City of Austin's Vision Zero program
https://visionzero.austin.gov/viewer/

Refactor CRIS import script to use S3 and log activities #1445

Closed frankhereford closed 3 months ago

frankhereford commented 3 months ago

Associated issues

This PR aims to advance the following issue: https://github.com/cityofaustin/atd-data-tech/issues/16953.

It completes the second bullet point and, once merged, makes the final two points possible.

Mechanics

There is an S3 bucket named vision-zero-cris-exports. It has versioning turned on, so we can always recover a lost or changed file if we need to, at least for 180 days. After 180 days, things start to get removed by the lifecycle rules in place.

In the bucket, you'll find three folders, one each for development, staging, and production. Within each of those, you'll see an inbox folder, where CRIS will upload extracts, and a processed folder, where we will keep our, well, processed extracts.
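
As a quick illustration of that layout, here is a minimal sketch that prints the six prefixes it implies. The function name list_cris_prefixes is hypothetical and not part of this PR. The folder names "staging" and "production" are assumed to follow the same pattern as "development", which is the only one that appears in a command in this description.

```shell
#!/usr/bin/env bash
# Sketch only: prints the prefixes implied by the bucket layout described above.
# The bucket name comes from this PR. The literal folder names "staging" and
# "production" are an assumption based on the "development" folder.
BUCKET="vision-zero-cris-exports"

# Hypothetical helper: one line per environment/folder combination.
list_cris_prefixes() {
  local env folder
  for env in development staging production; do
    for folder in inbox processed; do
      printf 's3://%s/%s/%s/\n' "$BUCKET" "$env" "$folder"
    done
  done
}

list_cris_prefixes
```

Each printed prefix could be passed to aws s3 ls to check what is actually sitting in that folder.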

Additionally, you'll see a new table in the DB called cris_import_log, which will keep some logs of the import events, one record per processing attempt per file.

Testing

This command has been really useful in development, and I imagine it may be useful in testing. It assumes you have the aws command set up with your credentials.

aws s3 mv s3://vision-zero-cris-exports/development/processed/extract_2023_20240417155400400_99726_20240401-20240405_HAYSTRAVISWILLIAMSON.zip s3://vision-zero-cris-exports/development/inbox/
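
If you end up repeating that move a lot, it could be wrapped in a small helper along these lines. This is only a sketch: the function name reset_extract and the DRY_RUN switch are hypothetical and not part of this PR, and the script assumes the inbox and processed folders sit directly under the environment folder, as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR.
BUCKET="vision-zero-cris-exports"

# reset_extract <environment> <zip file name>
# Moves an extract from processed/ back to inbox/ so the import can be re-run.
# Set DRY_RUN=1 to print the aws command instead of executing it.
reset_extract() {
  local env="${1:-}" file="${2:-}"
  if [ -z "$env" ] || [ -z "$file" ]; then
    echo "usage: reset_extract <environment> <file>" >&2
    return 1
  fi
  local src="s3://$BUCKET/$env/processed/$file"
  local dst="s3://$BUCKET/$env/inbox/"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print only, so the paths can be checked before touching the bucket.
    echo "aws s3 mv $src $dst"
  else
    aws s3 mv "$src" "$dst"
  fi
}
```

Running DRY_RUN=1 reset_extract development followed by the zip file name prints the command without touching the bucket, which is a cheap way to double-check the paths first.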

Local testing for this one. To get going, spin up your local stack to where you have at least a DB and a graphql-engine endpoint running. ⚠️ Be sure to apply the new migration that comes along with this PR.

cd atd-etl/cris_import;
docker compose build --no-cache;

There's a file in the processed folder for development. You can use the aws command above to move it into the inbox, or the S3 console works just fine too.

docker compose run cris-import;

Ship list

frankhereford commented 3 months ago

@johnclary, thank you /so much/ for all of your thoughtful feedback; I appreciate your ability to step back and see the big picture.

No rush on any sort of review here. For the time being, though, I am going to baby any CRIS extracts we get (or have to get via downloading) until we're ready to roll this out. It's really no trouble at all, especially after working on this -- for a hot second, I have all of the moving parts under control.