cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
17 stars 2 forks source link

Populate tables with VZ-edited data #17207

Closed johnclary closed 3 days ago

johnclary commented 3 months ago

Write a script which populates the _edits tables in the new data model schema. The script should populate the difference between records in a fresh CRIS extract and the current production records in our database.

We only want to identify edits to a subset of fields that are considered "no override" fields in our current database. I have created this worksheet (tab name: populate VZ edits columns) which lists the fields of concern along with some additional important context.

The output of this work should be a script that we can execute as we continue to do development work on the new data model, and it will also need to be run against our staging and production databases when we launch the data model.

johnclary commented 3 months ago

@frankhereford this is ready when you are! reminder that you'll want to branch from data-model-v2.

johnclary commented 1 month ago

@frankhereford i'm sharing a query i've put together that attempts to do all edit detecting and updating via SQL. it needs some more work, but i'm seeing really solid performance when i run this locally against the full DB.

if you've got some time later in the sprintt, would you mind kicking the tires on this and let me know what you think?

--
-- union person + primary person tables
--
with unioned_people_edits as (
    select
        crash_id,
        unit_nbr,
        prsn_nbr,
        prsn_type_id,
        prsn_occpnt_pos_id,
        prsn_injry_sev_id,
        prsn_age,
        prsn_last_name,
        prsn_first_name,
        prsn_mid_name,
        prsn_gndr_id,
        prsn_ethnicity_id,
        peh_fl as prsn_exp_homelessness
    from atd_txdot_primaryperson
    union all
    select
        crash_id,
        unit_nbr,
        prsn_nbr,
        prsn_type_id,
        prsn_occpnt_pos_id,
        prsn_injry_sev_id,
        prsn_age,
        prsn_last_name,
        prsn_first_name,
        prsn_mid_name,
        prsn_gndr_id,
        prsn_ethnicity_id,
        peh_fl as prsn_exp_homelessness
    from atd_txdot_person

),

--
-- join "old" people records to new records from cris, where each record is one row with a column for the old value and cris value
--
joined_people as (
    select
        people_cris.id,
        people_edit.crash_id,
        people_cris.unit_nbr,
        people_cris.prsn_nbr,
        people_edit.prsn_type_id as prsn_type_id_edit,
        people_edit.prsn_occpnt_pos_id as prsn_occpnt_pos_id_edit,
        people_edit.prsn_injry_sev_id as prsn_injry_sev_id_edit,
        people_edit.prsn_age as prsn_age_edit,
        people_edit.prsn_last_name as prsn_last_name_edit,
        people_edit.prsn_first_name as prsn_first_name_edit,
        people_edit.prsn_mid_name as prsn_mid_name_edit,
        people_edit.prsn_gndr_id as prsn_gndr_id_edit,
        people_edit.prsn_ethnicity_id as prsn_ethnicity_id_edit,
        people_edit.prsn_exp_homelessness as prsn_exp_homelessness_edit,
        people_cris.prsn_type_id as prsn_type_id_cris,
        people_cris.prsn_occpnt_pos_id as prsn_occpnt_pos_id_cris,
        people_cris.prsn_injry_sev_id as prsn_injry_sev_id_cris,
        people_cris.prsn_age as prsn_age_cris,
        people_cris.prsn_last_name as prsn_last_name_cris,
        people_cris.prsn_first_name as prsn_first_name_cris,
        people_cris.prsn_mid_name as prsn_mid_name_cris,
        people_cris.prsn_gndr_id as prsn_gndr_id_cris,
        people_cris.prsn_ethnicity_id as prsn_ethnicity_id_cris,
        people_cris.prsn_exp_homelessness as prsn_exp_homelessness_cris
    from people_cris
    left join
        unioned_people_edits as people_edit
        on
            people_cris.cris_crash_id = people_edit.crash_id
            and people_cris.unit_nbr = people_edit.unit_nbr
            and people_cris.prsn_nbr = people_edit.prsn_nbr
),

--
-- construct a table that looks a lot like people_edits,
-- where there is a value in each column if it's different from the cris value
--
people_diffs as (
    select
        id,
        crash_id,
        unit_nbr,
        prsn_nbr,
        case
            when
                prsn_type_id_edit is not null
                and prsn_type_id_edit != prsn_type_id_cris
                then prsn_type_id_edit
        end as prsn_type_id,
        case
            when
                prsn_occpnt_pos_id_edit is not null
                and prsn_occpnt_pos_id_edit != prsn_occpnt_pos_id_cris
                then prsn_occpnt_pos_id_edit
        end as prsn_occpnt_pos_id,
        case
            when
                prsn_injry_sev_id_edit is not null
                and prsn_injry_sev_id_edit != prsn_injry_sev_id_cris
                then prsn_injry_sev_id_edit
        end as prsn_injry_sev_id,
        case
            when
                prsn_age_edit is not null and prsn_age_edit != prsn_age_cris
                then prsn_age_edit
        end as prsn_age,
        case
            when
                prsn_last_name_edit is not null
                and prsn_last_name_edit != prsn_last_name_cris
                then prsn_last_name_edit
        end as prsn_last_name,
        case
            when
                prsn_first_name_edit is not null
                and prsn_first_name_edit != prsn_first_name_cris
                then prsn_first_name_edit
        end as prsn_first_name,
        case
            when
                prsn_mid_name_edit is not null
                and prsn_mid_name_edit != prsn_mid_name_cris
                then prsn_mid_name_edit
        end as prsn_mid_name,
        case
            when
                prsn_gndr_id_edit is not null
                and prsn_gndr_id_edit != prsn_gndr_id_cris
                then prsn_gndr_id_edit
        end as prsn_gndr_id,
        case
            when
                prsn_ethnicity_id_edit is not null
                and prsn_ethnicity_id_edit != prsn_ethnicity_id_cris
                then prsn_ethnicity_id_edit
        end as prsn_ethnicity_id,
        case
            when
                prsn_exp_homelessness_edit is not null
                and prsn_exp_homelessness_edit != prsn_exp_homelessness_cris
                then prsn_exp_homelessness_edit
            else false
        end as prsn_exp_homelessness
    from joined_people
)

--
-- update the people_edits records with the diffs!
--
update people_edits
set
    prsn_type_id = diffs.prsn_type_id,
    prsn_occpnt_pos_id = diffs.prsn_occpnt_pos_id,
    prsn_injry_sev_id = diffs.prsn_injry_sev_id,
    prsn_age = diffs.prsn_age,
    prsn_last_name = diffs.prsn_last_name,
    prsn_first_name = diffs.prsn_first_name,
    prsn_mid_name = diffs.prsn_mid_name,
    prsn_gndr_id = diffs.prsn_gndr_id,
    prsn_ethnicity_id = diffs.prsn_ethnicity_id,
    prsn_exp_homelessness = diffs.prsn_exp_homelessness
from (
    select * from people_diffs
    where
        prsn_type_id is not null
        or prsn_occpnt_pos_id is not null
        or prsn_injry_sev_id is not null
        or prsn_age is not null
        or prsn_last_name is not null
        or prsn_first_name is not null
        or prsn_mid_name is not null
        or prsn_gndr_id is not null
        or prsn_ethnicity_id is not null
        -- we have an errant constraint in our new schema that requires `people_edits.prsn_exp_homelessness` not be null 👎
        or prsn_exp_homelessness is true
) as diffs
where people_edits.id = diffs.id;