britishredcrosssociety / local-lockdown

R Shiny dashboard for visualising information related to Covid-19 Test & Trace and local lockdowns
Other
1 stars 1 forks source link

Shielding data #38

Open MikeJohnPage opened 3 years ago

MikeJohnPage commented 3 years ago

The shielding data contains aggregated LA's which are not being correctly joined in preprocess data.R.

These LA's need disaggregating and the shielding counts multiplied by population weighted scores. The following adjustments fix this issue, but just need wrapping up into a PR:

# ---- Load libraries ----
library(tidyverse)
library(httr)
library(readxl)
library(lubridate)

# ---- Population counts ----
# Needed in shielding below to estimate counts in aggregated areas
# source: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/middlesuperoutputareamidyearpopulationestimates
GET("https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fpopulationandmigration%2fpopulationestimates%2fdatasets%2fmiddlesuperoutputareamidyearpopulationestimates%2fmid2018sape21dt3a/sape21dt3amid2018msoaon2019lasyoaestimatesformatted.zip",
    write_disk(tf <- tempfile(fileext = ".zip")))

unzip(tf, exdir = "data/population")
unlink(tf); rm(tf)

pop_eng <- read_excel("data/population/SAPE21DT3a-mid-2018-msoa-on-2019-LA-syoa-estimates-formatted.xlsx", sheet = "Mid-2018 Persons", skip = 4)

pop_eng <-
  pop_eng %>% 
  select(code = `Area Codes`,
         la_name = `LA (2019 boundaries)`,
         population = `All Ages`) %>% 
  filter(!is.na(la_name)) %>% 
  rename(la_code = code)

# ---- Shielding ----
# Coronavirus Shielded Patient List, England - Local Authority: https://digital.nhs.uk/data-and-information/publications/statistical/mi-english-coronavirus-covid-19-shielded-patient-list-summary-totals/latest
shielding <- read_csv("https://files.digital.nhs.uk/BC/85E39A/Coronavirus%20Shielded%20Patient%20List%2C%20England%20-%20Open%20Data%20with%20CMO%20DG%20-%20LA%20-%202020-09-09.csv")

shielding <- shielding %>% 
  # keep only latest values (if more than one extraction happens to be in this file)
  mutate(`Extract Date` = dmy(`Extract Date`)) %>% 
  filter(`Extract Date` == max(`Extract Date`)) %>% 

  filter(`LA Code` != "ENG") %>%  # don't need England-wide figures
  filter(`Breakdown Field` == "ALL") %>%  #don't need age/gender splits

  select(la_code = `LA Code`,  la_name = `LA Name`, shielding = `Patient Count`)

# Realign LA's to VI
shielding <-
  shielding %>% 
  # Replace Buckinghamshire with South Bucks
  mutate(la_code = if_else(la_code == "E06000060",
                           "E07000006",
                           la_code)) %>% 
  # Split Cornwall and Isles of Scilly and Hackney and City of London.
  separate_rows(la_code) %>% 
  # Multiple Cornwall, Isles of Scilly, Hackney and City of London
  # shielding counts by population weighted score to work out estimated counts
  mutate(shielding = case_when(la_code == "E09000012" ~ shielding * (279665 / (279665 + 8706)),
                               la_code == "E09000001" ~ shielding * (8706 / (279665 + 8706)),
                               la_code == "E06000052" ~ shielding * (565968 / (565968 + 2242)),
                               la_code == "E06000053" ~ shielding * (2242 / (565968 + 2242)),
                               TRUE ~ shielding)) %>% 
  select(-la_name) %>% 
  mutate(shielding = round(shielding)) %>%
  select(LAD19CD = la_code, `Clinically extremely vulnerable` = shielding)