cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
18 stars 2 forks source link

Analyze and fix crash listing filter bugs #13102

Closed frankhereford closed 1 year ago

frankhereford commented 1 year ago

Recently, Tilly asked me if I would help look into the discrepancies between that counts of crashes found using the modality toggles on the crash listing page. Ultimately, this led to a discovery that our graphql snippets which we feed into our DataGrid system are flawed in such a way that they can return crashes which do not belong in the result sets. This behavior contributes to crash-count discrepancies.

I did this during the investigation:

GraphQL:

{
  atd_txdot_crashes_aggregate(where: {
    crash_date: {_gte: "2017-01-01", _lte: "2023-07-28"},
    in_austin_full_purpose: {_eq: true}, 
    private_dr_fl: {_neq: "Y"}, 
    _or: [
      {_and: [
        {units: {unit_desc_id: {_eq: 1}}},
        {_and: [
          {units: {veh_body_styl_id: {_neq: 71}}},
          {units: {veh_body_styl_id: {_neq: 90}}}
        ]}
      ]}
    ]}) {
    aggregate {
      count
      __typename
    }
    __typename
  }
}

SQL:

select crashes.crash_id, crashes.crash_date, crashes.private_dr_fl, crashes.in_austin_full_purpose, units.unit_desc_id , units.veh_body_styl_id 
from atd_txdot_crashes crashes
  join atd_txdot_units units on (crashes.crash_id = units.crash_id)
where true 
  and crash_date >= '2013-07-28' and crash_date <= '2023-07-28'
  and crashes.private_dr_fl != 'Y'
  and crashes.in_austin_full_purpose is true
  and units.unit_desc_id = 1
  and units.veh_body_styl_id != 71
  and units.veh_body_styl_id != 90
  order by crashes.crash_id desc

It looks like the way the graphql is compiled to mean something slightly different than what is intended. The query finds crashes which have units, where at least one unit is of unit_desc_id 1, and where at least one unit is of veh_body_styl_id in the set of [71,90]. It will return crashes where different units each meet those conditions, including when there is no single unit that meets both of them at the same time. This is returning false positives.

This particular query needs to be re-written to have its where clause as follows:

    where: {
            crash_date: {_gte: "2013-07-28", _lte: "2023-07-28"},
            in_austin_full_purpose: {_eq: true},
            private_dr_fl: {_neq: "Y"},
            units: {
                    unit_desc_id: {_eq: 1}, 
                    veh_body_styl_id: {_nin: [71, 90]}
                    }
            }

This issue is to track fixing this query and making sure similar mis-queries are not being used for the other toggles.

patrickm02L commented 1 year ago

8/3/23

We discussed this in Refinement

Discussion