Closed johnclary closed 3 days ago
👋 Thanks for reviewing this PR. Heads-up that I will be working on down migrations right away, since this wasn't necessary up until just now, when I moved everything into the public
schema.
For time being, please focus your review attention on the conventions, assumptions, and functionality being put forward by this branch that may be at odds with our vision for the data model, or will be difficult to undo down the road. That said, no detail is too small :)
also—sorry for more noise—I do not think there is any need to review the files in ./atd-toolbox
at this time. the atd-vzd
and atd-vze
changes are what matters :)
This is suchhhh amazing work!!! I still have some more to dig into but here are some notes I have so far:
peh
in the cris_import
script and dropping the names if not a fatality. What about for updates, where inj_sev is updated to a fatality, then do we want to carry over the name? Or is that getting too complicated?crashes_list
view by summing up the counts of injury severities from the person tables instead of the vz team having to keep everything up to date in multiple places. Now they only have to QA in one place, under the injury severity column of the persons table.vz_
if they are custom so we can quickly know if something is not from cris? That would help sooo much imo and i know we talked about doing something like that thanks @roseeichelmann for your review!
- I like your handling of
peh
in thecris_import
script and dropping the names if not a fatality. What about for updates, where inj_sev is updated to a fatality, then do we want to carry over the name? Or is that getting too complicated?
My thinking was that if a VZ user edits a person injury severity to a fatality, they will need to input the person's name by manually copying it from the CR3 pdf. If the injury severity change comes via CRIS import, then the name will be picked up in the ETL and correctly added to the person record ✅
- So great that you cleaned up the injury counts and now they are handled automatically in the
crashes_list
view by summing up the counts of injury severities from the person tables instead of the vz team having to keep everything up to date in multiple places. Now they only have to QA in one place, under the injury severity column of the persons table.
Yeah! There are still some open questions about the best way to approach some of the metrics. And for that we have this WIP PR.
- What about prefixing column names with
vz_
if they are custom so we can quickly know if something is not from cris? That would help sooo much imo and i know we talked about doing something like that
Yes, I haven't forgotten about it this idea. I think I'm a little on the fence about because I'd like a more foolproof way for us to maintain our list of CRIS vs VZ columns. Maybe i'll open a little side branch just so we can see what this looks like.
The alternative i think still prefer is having a column_metadata
table in the db where we track this info, plus other CRIS import and VZE-related metadata (e.g. we could track which columns are in the public vs private CRIS extract). This table would basically replace the function of the current google sheet 🤔
@johnclary I definitely think a column_metadata
table is something we should have, but when im just scrolling through the database or a table i think i would find it helpful to see a column and know that its custom based off the name; im defs a fan of descriptive column names. curious what everyone else thinks! yeah maybe a spin off branch to test out how it feels could be good
Something else ive run into that i wanna mention - i still feel like the crash_id
should be carried into the edits tables. i was looking for a unit by the id in the units_edits
table and i found it unhelpful that the crash_id
column was null because i was trying to validate that i was editing the right unit in this specific crash.. idk maybe im still thinking about all this wrong and in a crash id-centric way?
Thanks y'all, here we go....
This PR holds the main collector branch for the new data model.
Get it running
./atd-vzd
directory, apply migrations and metadata:./atd-etl/cris_import
and run the CRIS import following the instruction in the ETL readme. There is an.env
file available in 1password under Env file for the Vision Zero new data model CRIS import ETLThis command will process the extracts available in the
./dev
directory in S3:To populate the VZ edits, navigate to
./atd-toolbox/data_model/python/populate_edits_sql
and run the commands in each of their corresponding crashes, units, and people.sql
files in orderStart the VZE and check it out
To run the Socrata export, locate the 1pass item called
Env file for the Vision Zero new data model Socrata export ETL
and save it in the./atd-etl/socrata_export
directory as.env
.Build and run the Socrata export. This may take up to 20 minutes.