ETL process that stores monthly crash data

JaceDeloney commented 4 years ago

[x] Understand what endpoints we're hitting now
[x] Look for ways to optimize/use the same queries
[x] Create an ELT process that pulls data from Socrata once a month so that VZV stores the data. This would improve the mobile performance of VZV. We want to push this data into the defined queries

sergiogcx commented 4 years ago

So far I have been able to create different views in Hasura that can feed correct data

view_vzv_header_totals : This view generate data for the four top chars with total counts (fatalities, years of life lost, serious injuries and total crashes)
view_vzv_by_month_year : Generates data for the "By Month/year" chart
view_vzv_by_time_of_day : Generates data for "By time of day" chart
view_vzv_demographics_age_sex_eth : Generates data for the "Demographics" chart.

To be implemented:

By Mode

sergiogcx commented 4 years ago

Currently, I am able to query the data for all those charts on a single query; however, the payload is still pretty large (130kb) given it's JSON format which is inherently repetitive. This is without the By Mode data, so I would expect it to be around 150kb when that one is done too.

This is how the GraphQL query is looking like:

query getVZVData {
  view_vzv_header_totals {
    death_cnt
    sus_serious_injry_cnt
    total_crashes
    year
    years_of_life_lost
  }
  view_vzv_by_month_year {
    death_cnt
    month
    sus_serious_injry_cnt
    year
  }
  view_vzv_by_time_of_day {
    year
    sus_serious_injry_cnt
    hour
    dow
    death_cnt
  }
  view_vzv_demographics_age_sex_eth {
    year
    type
    under_18
    from_18_to_44
    from_45_to_64
    from_65
    unknown
    gender_male
    gender_female
    gender_unknown
    eth_unknown
    ethn_white
    ethn_hispanic
    ethn_black
    ethn_asian
    ethn_other
    ethn_amer_ind_nat
    total
  }
  view_vzv_by_mode {
    year
    unit_desc
    death_cnt
    sus_serious_injry_cnt
  }
}

sergiogcx commented 4 years ago

As another comment, I am planning on summarizing the JSON data into a hybrid JSON/CSV format. That should reduce it to something that is a lot smaller.

Or also possibly use CSV format instead, and split the output into 5 separate files.

sergiogcx commented 4 years ago

The ETL process has been created, it's live and running as of last night.

cityofaustin / atd-data-tech

ETL process that stores monthly crash data #2642