cityofaustin / atd-vz-data

The technology that powers the City of Austin's Vision Zero program
https://visionzero.austin.gov/viewer/
11 stars 2 forks source link

Populate VZ edits via SQL queries #1486

Closed johnclary closed 1 week ago

johnclary commented 1 month ago

Associated issues

This is a new take on getting the VZ edits populated. I've requested review despite this not really being totally complete. In addition to testing this code, I'm looking for feedback on the best way to execute this work when we launch. Python script? SQL script? something else?

Testing

URL to test: Local

Steps to test:

  1. Start your local stack with a recent copy of prod
  2. Run the CRIS import with a big ole chunk of CRIS data
  3. For people, units, and crashes, run the commands in each of their corresponding .sql files in order

I don't know how necessary it is at this stage to exhaustively check these results. You can browse the VZE and cross-check what you're seeing versus prod. I've also included debug.sql, which contains the queries I'm drafting to start exhaustively comparing the new data model against the old one.

I have been testing this by running these commands on our test stack on AWS, which has the full 10-year extract and isis even with the Socrata Export PR (#1485). We also have https://github.com/cityofaustin/atd-data-tech/issues/18326 as a separate issue dedicated to data quality testing.

Side note, no matter how many times you run these commands, you are going to end up with some residual records in the people_diffs table because we have duplicate records in old DB. For example, here are two dupe person records:

select * from atd_txdot_primaryperson where crash_id = 19340564 and unit_nbr = 2 and prsn_nbr = 1;

This is a known issue, and we're going to have to clean up these records in atd_txdot_primaryperson and atd_txdot_person. I believe there are less than 30 of these cases for serious and fatal injury crashes.


Ship list

johnclary commented 2 weeks ago

i'm going to leave this unmerged since i'm continuing to rework and add to the various debug/audit queries.