cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
17 stars 2 forks source link

ETL process that stores monthly crash data #2642

Closed JaceDeloney closed 4 years ago

JaceDeloney commented 4 years ago
sergiogcx commented 4 years ago

So far I have been able to create different views in Hasura that can feed correct data

To be implemented:

sergiogcx commented 4 years ago

Currently, I am able to query the data for all those charts on a single query; however, the payload is still pretty large (130kb) given it's JSON format which is inherently repetitive. This is without the By Mode data, so I would expect it to be around 150kb when that one is done too.

This is how the GraphQL query is looking like:

query getVZVData {
  view_vzv_header_totals {
    death_cnt
    sus_serious_injry_cnt
    total_crashes
    year
    years_of_life_lost
  }
  view_vzv_by_month_year {
    death_cnt
    month
    sus_serious_injry_cnt
    year
  }
  view_vzv_by_time_of_day {
    year
    sus_serious_injry_cnt
    hour
    dow
    death_cnt
  }
  view_vzv_demographics_age_sex_eth {
    year
    type
    under_18
    from_18_to_44
    from_45_to_64
    from_65
    unknown
    gender_male
    gender_female
    gender_unknown
    eth_unknown
    ethn_white
    ethn_hispanic
    ethn_black
    ethn_asian
    ethn_other
    ethn_amer_ind_nat
    total
  }
  view_vzv_by_mode {
    year
    unit_desc
    death_cnt
    sus_serious_injry_cnt
  }
}
sergiogcx commented 4 years ago

As another comment, I am planning on summarizing the JSON data into a hybrid JSON/CSV format. That should reduce it to something that is a lot smaller.

Or also possibly use CSV format instead, and split the output into 5 separate files.

sergiogcx commented 4 years ago

The ETL process has been created, it's live and running as of last night.