cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
17 stars 2 forks source link

[Bug] There are some fatalities counts by mode that do not match what is in the Vision Zero Editor #15509

Closed atdservicebot closed 7 months ago

atdservicebot commented 8 months ago

What application are you using?

Vision Zero Viewer

Describe the problem.

There are some fatalities counts by mode that do not match what is in the Vision Zero Editor.

Is there anything else we should know?

-

Website Address

https://visionzero.austin.gov/viewer/

Internet Browser: Edge

Requested By Xavier A.

Request ID: DTS24-112320

xavierapostol commented 8 months ago

The fatal counts should be as follows:

patrickm02L commented 8 months ago

1/31 Refinement

Approach to resolving count

This is the column in the crash data: atd_mode_category_metadata

Note

xavierapostol commented 8 months ago

Please keep in mind that one of the fatalities has the person killed in the wrong unit. I don't know how that happened, but the data is off and causing a miscount.

chiaberry commented 8 months ago

For reference here are the queries from last year's discrepancy: https://github.com/cityofaustin/atd-data-tech/issues/11340

patrickm02L commented 8 months ago

Evaluate this Issue to see if there's a simple fix, if not we can take last year's approach to hard code the right values and spin off another to find the fix. Refer to Issue above from last year to see if the queries point to the problem with the inaccurate counts.

chiaberry commented 8 months ago

maybe look at this issue too? https://github.com/cityofaustin/atd-data-tech/issues/2413

chiaberry commented 8 months ago

@xavierapostol could you share your queries with us to aid in untangling the mismatch? aka how you got the "correct" fatality numbers for 2023? thanks

Charlie-Henry commented 8 months ago

Found an example of a crash where the atd_mode_category_metadata field has desynced with the units.

https://visionzero.austin.gov/editor/#/crashes/19609765

atd_mode_category_metadata shows a "Other/Unknown" mode with a fatality.

"mode_id":9,"mode_desc":"Other/Unknown","unit_id":4767930,"death_cnt":1,"sus_serious_injry_cnt":0,"nonincap_injry_cnt":0,"poss_injry_cnt":0,"non_injry_cnt":0,"unkn_injry_cnt":0,"tot_injry_cnt":0}

meanwhile the crash demographics dataset shows a mode of pedestrian:

Screenshot 2024-02-07 at 5 01 16 PM
xavierapostol commented 8 months ago

Here is my analysis of some of the Vision Zero Editor "VZE" (I don't know how the VZV is extracting data) mismatches. I've also included the query that looks at every fatality. Fatalities VZE vs VZdb - 20150101 to 20240208.xlsx

------------------------------------------ (Each Row Represents a Person) WITH up AS ( SELECT atp.person_id AS "prsn_id", atp.crash_id, atp.unit_nbr, atp.prsn_nbr, atisl.injry_sev_desc, atu.unit_id, CASE WHEN atu.unit_desc_id IS NULL THEN 'MOTOR VEHICLE' ELSE atvudl.veh_unit_desc_desc END AS "unit_desc_desc", CASE WHEN atu.veh_body_styl_id IS NULL THEN 'UNKNOWN' ELSE atvbsl.veh_body_styl_desc END AS "veh_body_styl_desc", CASE WHEN atp.prsn_injry_sev_id = 4 THEN 1 ELSE 0 END AS "_k", CASE WHEN atp.prsn_injry_sev_id = 1 THEN 1 ELSE 0 END AS "_a", CASE WHEN atp.prsn_injry_sev_id = 2 THEN 1 ELSE 0 END AS "_b", CASE WHEN atp.prsn_injry_sev_id = 3 THEN 1 ELSE 0 END AS "_c", CASE WHEN atp.prsn_injry_sev_id = 5 THEN 1 ELSE 0 END AS "_n", CASE WHEN atp.prsn_injry_sev_id NOT IN ( 1, 2, 3, 4, 5 ) OR atp.prsn_injry_sev_id IS NULL THEN 1 ELSE 0 END AS "_u" FROM atd_txdot_person AS atp LEFT JOIN atd_txdot_units AS atu ON atu.crash_id = atp.crash_id AND atu.unit_nbr = atp.unit_nbr LEFT JOIN atd_txdotveh_unit_desc_lkp AS atvudl ON atvudl.veh_unit_desc_id = atu.unit_desc_id LEFT JOIN atd_txdotveh_body_styl_lkp AS atvbsl ON atvbsl.veh_body_styl_id = atu.veh_body_styl_id LEFT JOIN atd_txdotinjry_sev_lkp AS atisl ON atisl.injry_sev_id = atp.prsn_injry_sev_id UNION ALL SELECT atp2.primaryperson_id AS "prsn_id", atp2.crash_id, atp2.unit_nbr, atp2.prsn_nbr, atisl2.injry_sev_desc, atu2.unit_id, CASE WHEN atu2.unit_desc_id IS NULL THEN 'MOTOR VEHICLE' ELSE atvudl2.veh_unit_desc_desc END AS "unit_desc_desc", CASE WHEN atu2.veh_body_styl_id IS NULL THEN 'UNKNOWN' ELSE atvbsl2.veh_body_styl_desc END AS "veh_body_styl_desc", CASE WHEN atp2.prsn_injry_sev_id = 4 THEN 1 ELSE 0 END AS "_k", CASE WHEN atp2.prsn_injry_sev_id = 1 THEN 1 ELSE 0 END AS "_a", CASE WHEN atp2.prsn_injry_sev_id = 2 THEN 1 ELSE 0 END AS "_b", CASE WHEN atp2.prsn_injry_sev_id = 3 THEN 1 ELSE 0 END AS "_c", CASE WHEN atp2.prsn_injry_sev_id = 5 THEN 1 ELSE 0 END AS "_n", CASE WHEN atp2.prsn_injry_sev_id NOT IN ( 1, 2, 3, 4, 5 ) OR atp2.prsn_injry_sev_id IS NULL THEN 1 ELSE 0 END AS "_u" FROM atd_txdot_primaryperson AS atp2 LEFT JOIN atd_txdot_units AS atu2 ON atu2.crash_id = atp2.crash_id AND atu2.unit_nbr = atp2.unit_nbr LEFT JOIN atd_txdotveh_unit_desc_lkp AS atvudl2 ON atvudl2.veh_unit_desc_id = atu2.unit_desc_id LEFT JOIN atd_txdotveh_body_styl_lkp AS atvbsl2 ON atvbsl2.veh_body_styl_id = atu2.veh_body_styl_id LEFT JOIN atd_txdotinjry_sev_lkp AS atisl2 ON atisl2.injry_sev_id = atp2.prsn_injry_sev_id ), p_agg AS ( SELECT agg.crash_id, atisl2.injry_sev_desc AS "crash_sev" FROM ( SELECT up.crash_id, SUM ( up._k ) AS "k_cnt", SUM ( up._a ) AS "a_cnt", SUM ( up._b ) AS "b_cnt", SUM ( up._c ) AS "c_cnt", SUM ( up._n ) AS "n_cnt", SUM ( up._u ) AS "u_cnt" FROM up GROUP BY up.crash_id ) AS agg LEFT JOIN atd_txdot__injry_sev_lkp AS atisl2 ON atisl2.injry_sev_id = CASE WHEN agg.k_cnt > 0 THEN 4 WHEN agg.a_cnt > 0 THEN 1 WHEN agg.b_cnt > 0 THEN 2 WHEN agg.c_cnt > 0 THEN 3 WHEN agg.n_cnt > 0 THEN 5 ELSE 0 END ) SELECT atc.crash_date, atc.crash_time, p_agg.crash_sev, up.unit_nbr, up.prsn_nbr, up.injry_sev_desc, up.unit_desc_desc, up.veh_body_styl_desc, up.unit_id, atc.crash_id, up.prsn_id FROM atd_txdot_crashes AS atc LEFT JOIN up ON up.crash_id = atc.crash_id LEFT JOIN p_agg ON p_agg.crash_id = atc.crash_id WHERE atc.private_dr_fl = 'N' AND atc.in_austin_full_purpose = TRUE AND atc.crash_date BETWEEN '2015-01-01' AND '2024-02-08' AND up.injry_sev_desc = 'FATAL INJURY'
;

xavierapostol commented 8 months ago

Also, this fatality is showing up as "Other" when it should be a "Pedestrian". https://visionzero.austin.gov/editor/#/crashes/19609765

Charlie-Henry commented 8 months ago

Summarizing 2023 fatality differences:

mddilley commented 8 months ago

Update on this, I stubbed out a query that I feel good about using to see where these mismatches are coming from. Some adjustments are needed to make the right comparisons, but, so far, the total fatalities match with Xavier's query: 117 in 2022 (1 less than VZV) and 91 in 2023 (same as VZV). Next steps are to drill down into the crash IDs that Charlie and Xavier noted and then see if there is an issue with a trigger, if the numbers can be fixed through VZE edits in the meantime, and if there is further work needed.

Query mode fatalities using the people tables

WITH people_crash_modes AS (
    -- combine primary person and person fatalities
    WITH pf AS (
        SELECT
            p.crash_id,
            p.unit_nbr,
            p.prsn_nbr,
            p.prsn_injry_sev_id,
            p.primaryperson_id AS person_id
        FROM
            atd_txdot_primaryperson AS p
        WHERE
            prsn_injry_sev_id = 4
        UNION ALL
        SELECT
            pp.crash_id,
            pp.unit_nbr,
            pp.prsn_nbr,
            pp.prsn_injry_sev_id,
            pp.person_id
        FROM
            atd_txdot_person AS pp
        WHERE
            prsn_injry_sev_id = 4
)
    SELECT
        pf.person_id,
        pf.crash_id,
        pf.unit_nbr,
        pf.prsn_nbr,
        pf.prsn_injry_sev_id,
        c.atd_fatality_count,
        c.death_cnt AS crash_death_cnt,
        c.atd_mode_category_metadata,
        c.crash_date,
        u.unit_id,
        u.death_cnt as unit_death_cnt,
        u.atd_mode_category AS unit_mode_category,
        -- unpack the values from the crashes atd_mode_category_metadata column
        (WITH mode_metadata_units (
                elem
) AS (
                SELECT
                    json_array_elements(atd_mode_category_metadata)
                FROM
                    atd_txdot_crashes
                WHERE
                    crash_id = c.crash_id),
                mode_metadata_unit_ids AS (
                    SELECT
                        (elem -> 'unit_id')::text::int AS unit_id,
                        (elem -> 'mode_id')::text::int AS mode_id
                    FROM
                        mode_metadata_units
)
                SELECT
                    mode_id
                FROM
                    mode_metadata_unit_ids
                WHERE
                    unit_id = u.unit_id) AS metadata_mode_id,
                (WITH mode_metadata_units (
                        elem
) AS (
                        SELECT
                            json_array_elements(atd_mode_category_metadata)
                        FROM
                            atd_txdot_crashes
                        WHERE
                            crash_id = c.crash_id),
                        mode_metadata_unit_ids AS (
                            SELECT
                                (elem -> 'unit_id')::text::int AS unit_id,
                                (elem -> 'death_cnt')::text::int AS death_cnt
                            FROM
                                mode_metadata_units
)
                        SELECT
                            death_cnt
                        FROM
                            mode_metadata_unit_ids
                        WHERE
                            unit_id = u.unit_id) AS metadata_death_cnt
                    FROM
                        pf
                        -- join crashes and units to persons so we can get related data
                    LEFT JOIN atd_txdot_crashes AS c ON pf.crash_id = c.crash_id
                    LEFT JOIN atd_txdot_units AS u ON u.crash_id = pf.crash_id
                        AND u.unit_nbr = pf.unit_nbr
                WHERE
                    -- apply date range and match filters in Xavier's query
                    TRUE
                    AND c.private_dr_fl = 'N'
                    AND c.in_austin_full_purpose = TRUE
                    AND c.crash_date BETWEEN '2022-01-01'
                    AND '2022-12-31'
)
SELECT
    *
FROM
    people_crash_modes
WHERE
    TRUE
    --  AND atd_mode_category != metadata_unit_id
--      AND crash_id = 19113648
    -- pedestrian mode filter     
--  AND atd_fatality_count > 0
--  AND metadata_death_cnt > 0
-- motor vehicle
-- AND metadata_mode_id IN (1,2,4)
-- pedestrian
-- AND metadata_mode_id IN (7)
-- e-scooter/micromobility
-- AND metadata_mode_id IN (10,11)
-- bicycle
-- AND metadata_mode_id IN (5)
-- motorcycle
-- AND metadata_mode_id IN (3)
-- other
-- AND metadata_mode_id IN (6,8,9)

Query mode fatalities in the atd_mode_category_metadata crash column

WITH crash_modes AS (
    SELECT
        crash_id,
        (metadata ->> 'unit_id')::int AS unit_id,
        (metadata ->> 'death_cnt')::int AS death_cnt,
        (metadata ->> 'mode_id')::int AS metadata_mode_id,
        atd_fatality_count,
        crash_date
    FROM (
        SELECT
            crash_id,
            json_array_elements(atd_mode_category_metadata) AS metadata,
            atd_fatality_count,
            crash_date
        FROM
            atd_txdot_crashes
        WHERE
            TRUE
            AND private_dr_fl = 'N'
            AND in_austin_full_purpose = TRUE) m
)
SELECT
    crash_id, unit_id, death_cnt, metadata_mode_id, atd_fatality_count, crash_date
FROM
    crash_modes
WHERE
    TRUE
--  AND crash_id = 19113648
-- motor vehicle
-- AND metadata_mode_id IN (1,2,4)
-- pedestrian
AND metadata_mode_id IN (7)
-- e-scooter/micromobility
-- AND metadata_mode_id IN (10,11)
-- bicycle
-- AND metadata_mode_id IN (5)
-- motorcycle
-- AND metadata_mode_id IN (3)
-- other
-- AND metadata_mode_id IN (6,8,9)
    AND death_cnt > 0
--  AND atd_fatality_count > 0
    AND crash_date BETWEEN '2022-01-01'
    AND '2022-12-31'
mddilley commented 8 months ago

@xavierapostol For 2022, it looks like the extra pedestrian is coming from https://visionzero.austin.gov/editor/#/crashes/19113648. Could you check out that record when you can? I see two units with fatalities (one pedestrian) in the units accordion and only one person fatality in the people accordion (no pedestrian). I think an edit should fix that one. Edit: Just like i noted in my next reply. An edit to the crash data on this page will have to happen in order for the unit data to recalculate. Thanks! Still looking at 2023.

I found this record by comparing a version of the people-based query in my last comment (returned 46 pedestrian fatalities like your query) against the crash-unit based query that feeds VZV from Socrata (returns 47 records).

mddilley commented 8 months ago

For 2023:

Xavier - after making these changes in my local test environment, both your query above (with added 2023 date range + unit_desc_desc = PEDESTRIAN filters) and my people query (with same filters) both produce 37 pedestrian fatalities for 2023. I think this covers the mismatches, but happy to work on this more if I've missed something.

xavierapostol commented 8 months ago

@xavierapostol For 2022, it looks like the extra pedestrian is coming from https://visionzero.austin.gov/editor/#/crashes/19113648. Could you check out that record when you can? I see two units with fatalities (one pedestrian) in the units accordion and only one person fatality in the people accordion (no pedestrian). I think an edit should fix that one. Edit: Just like i noted in my next reply. An edit to the crash data on this page will have to happen in order for the unit data to recalculate. Thanks! Still looking at 2023.

I found this record by comparing a version of the people-based query in my last comment (returned 46 pedestrian fatalities like your query) against the crash-unit based query that feeds VZV from Socrata (returns 47 records).

This is a strange one and because I cannot edit the details of the person type, this fatality is populating strangely. What is on the CR3 is 3 units with a total of 8 persons.

What it should be is:

mddilley commented 8 months ago

@xavierapostol I see exactly what you are saying. Thanks for highlighting the connection between the data and the narrative. 🙏 I think the main issue we are facing is that unit fatalities are what ultimately show up on VZV in the crash mode visualization by way of a crash row's atd_mode_category_metadata column. What I'm going to suggest is a holdover until we can update VZE to allow you to edit the people records type field.

So, I propose:

  1. Look at the atd_mode_category_metadata column for crash row with crash ID 19113648 in the production read replica, and you will see two objects in the json array with death_cnt = 1 (one Pedestrian and one Large passenger vehicle). Then, update https://visionzero.austin.gov/editor/#/crashes/19113648 to have one fatality on Unit 4 and no fatalities on Unit 2. Then, change and save a crash field like the Suspected Serious Injury Count to a new value and then change it back and save (to trigger the unit metadata recalculation). Look at the atd_mode_category_metadata column for this crash again, you will see only one Pedestrian fatality.
    • The ETL will pick up this change overnight and show 1 less Motorist fatality and one more pedestrian fatality for 2022 on VZV. Our people queries will produce matching numbers with the changes in the next step.
  2. I create an issue for us to make the person type field editable in the People accordion
  3. I create an issue for us to fix the trigger that calculates atd_mode_category_metadata so that it reacts to crash updates and unit updates. Or, we can look at other ways to address the de-sync issue. cc @johnclary

Let me know if this sounds good to you, and I'm attaching a spreadsheet that I used to evaluate the modes produced by all of our queries.

vz-numbers-02212024.xlsx

Edit: forgot to mention that I updated my SQL queries in this comment above

xavierapostol commented 8 months ago

I tried my best to match the persons to the unit, but the VZE does not allow me to change certain attributes. I hope this worked (fingers crossed)

mddilley commented 8 months ago

Looks great to me, Xavier! I'm working on an update so that edits in the units accordion make their way into the crash unit metadata that powers the VZV By Mode viz. I'll keep you in the loop on that.

johnclary commented 7 months ago

Since Mike fixed the underlying issue with crash-mode data desync, i was able to make minor edits (which i immediately undid) to these two crashes, and that triggered the crash-level mode metadata to rebuild:

For this 2022 crash, it looks like Xavier's edit per Mike's instruction fixed the metadata desync:

I re-ran the VZD > Data Portal export, but unfortunately we still have an extra ped fatality in 2022. i'm going to have to dig into this next week.

johnclary commented 7 months ago

@xavierapostol can you re-confirm the expected pedestrian fatality total for 2022? i just reran your query and i'm now seeing the following numbers, which do now match VZV:

So I think we cleared all this up by fixing the mismatched units and re-syncing the mode data. I am ready to close this if you are, Xavier!

xavierapostol commented 7 months ago

@johnclary, the total count for 2022 should be as follows:

The numbers you have equal 118.

This crash still has a fatal_crash_fl = 'N' This crash is a multi-fatality crash with 1 pedestrian and 1 motor vehicle passenger

johnclary commented 7 months ago

My bad, our numbers are aligned at 117 total fatalities in 2022. I'm closing this issue.

Screenshot 2024-03-04 at 2 03 20 PM