Analyze and fix crash listing filter bugs

Recently, Tilly asked me if I would help look into the discrepancies between that counts of crashes found using the modality toggles on the crash listing page. Ultimately, this led to a discovery that our graphql snippets which we feed into our DataGrid system are flawed in such a way that they can return crashes which do not belong in the result sets. This behavior contributes to crash-count discrepancies.

I did this during the investigation:

Intercepting the graphql query being sent to the endpoint when enabling the "Motor Vehicle" Unit Type filter.
Transcribing this query into the intended, corollary SQL
Observing that these two queries return different sets of crashes
Using this python program to determine which crashes show up in one set and not the other.

GraphQL:

{
  atd_txdot_crashes_aggregate(where: {
    crash_date: {_gte: "2017-01-01", _lte: "2023-07-28"},
    in_austin_full_purpose: {_eq: true}, 
    private_dr_fl: {_neq: "Y"}, 
    _or: [
      {_and: [
        {units: {unit_desc_id: {_eq: 1}}},
        {_and: [
          {units: {veh_body_styl_id: {_neq: 71}}},
          {units: {veh_body_styl_id: {_neq: 90}}}
        ]}
      ]}
    ]}) {
    aggregate {
      count
      __typename
    }
    __typename
  }
}

SQL:

select crashes.crash_id, crashes.crash_date, crashes.private_dr_fl, crashes.in_austin_full_purpose, units.unit_desc_id , units.veh_body_styl_id 
from atd_txdot_crashes crashes
  join atd_txdot_units units on (crashes.crash_id = units.crash_id)
where true 
  and crash_date >= '2013-07-28' and crash_date <= '2023-07-28'
  and crashes.private_dr_fl != 'Y'
  and crashes.in_austin_full_purpose is true
  and units.unit_desc_id = 1
  and units.veh_body_styl_id != 71
  and units.veh_body_styl_id != 90
  order by crashes.crash_id desc

It looks like the way the graphql is compiled to mean something slightly different than what is intended. The query finds crashes which have units, where at least one unit is of unit_desc_id 1, and where at least one unit is of veh_body_styl_id in the set of [71,90]. It will return crashes where different units each meet those conditions, including when there is no single unit that meets both of them at the same time. This is returning false positives.

This particular query needs to be re-written to have its where clause as follows:

    where: {
            crash_date: {_gte: "2013-07-28", _lte: "2023-07-28"},
            in_austin_full_purpose: {_eq: true},
            private_dr_fl: {_neq: "Y"},
            units: {
                    unit_desc_id: {_eq: 1}, 
                    veh_body_styl_id: {_nin: [71, 90]}
                    }
            }

This issue is to track fixing this query and making sure similar mis-queries are not being used for the other toggles.

cityofaustin / atd-data-tech

Analyze and fix crash listing filter bugs #13102

8/3/23

Discussion