NIHR-BI / Health_Indicators

0 stars 0 forks source link

Health indicator data sources

OHID Fingertips, indicator values

https://fingertips.phe.org.uk/profile/guidance/supporting-information/api

import fingertips_py as ftp

data_for_multiple_ind_ids_for_one_area = ftp.retrieve_data.get_data_by_indicator_ids(indicator_ids=ids_as_str, # [Maximum 100]
                                                    area_type_id=area_type_id, # can be found in the documentation
                                                    include_sortable_time_periods=True, # includes an int format column for time period
                                                    is_test=False)

ONS Open Geography, shape values

https://www.api.gov.uk/ons/open-geography-portal/#open-geography-portal

An overview of how the code works to save the data

OHID Fingertips, indicator values

See the pipeline script.

  1. A table of indicator_ids available at area_type_ids is downloaded from the API.
  2. For user chosen area_type_ids, save all indicator_id values that are available at these area_type_ids. These are in chunks of 100 indicators per file as this is the APIs limit.
  3. Concatenate together all of the separate value files into one file.
  4. Clean the concatenated values file to be in the structure that we need. Save a dataset_ref file which is metadata.
  5. Save metadata for area_type_id.

ONS Open Geography, shape values

See the script which is currently not a module.

  1. For each of the shapes that you need, find the name of it in the ONS directory.
  2. Download the shape files.
  3. Concatenate the shape files together and save them as one large file.
  4. Clean the concatenated shape file to be in the structure that we need.

Brief explanation of how the data is shown on QlikSense

QlikSense dashboards are currently internal.

Data for an indicator is available for:

A unique indicator_dataset_id is made up of indicator_id, time_period_sortable, sex, age, and area_type_id. Only a single indicator_dataset_id can be mapped at a time.

For an indicator_id, we only want the latest area_type_year_configuration for each area_type_class and then we only want the latest time_period.

  1. For each unique combination of indicator_id and area_type_class, only keep the latest (max) area_type_year_configuration.
  2. Keeping 1) above, for each unique combination of indicator_id and area_type_class, only keep the latest (max) time_period_sortable. Now for each unique indicator_id and area_type_class combination, there will only be 1 area_type_year_configuration and 1 time_period_sortable left.
  3. Then the rest of the data can be loaded in with WHERE EXISTS statements to only include what is left in previous steps.

image