The shielding data contains aggregated LA's which are not being correctly joined in preprocess data.R.
These LA's need disaggregating and the shielding counts multiplied by population weighted scores. The following adjustments fix this issue, but just need wrapping up into a PR:
# ---- Load libraries ----
library(tidyverse)
library(httr)
library(readxl)
library(lubridate)
# ---- Population counts ----
# Needed in shielding below to estimate counts in aggregated areas
# source: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/middlesuperoutputareamidyearpopulationestimates
GET("https://www.ons.gov.uk/file?uri=%2fpeoplepopulationandcommunity%2fpopulationandmigration%2fpopulationestimates%2fdatasets%2fmiddlesuperoutputareamidyearpopulationestimates%2fmid2018sape21dt3a/sape21dt3amid2018msoaon2019lasyoaestimatesformatted.zip",
write_disk(tf <- tempfile(fileext = ".zip")))
unzip(tf, exdir = "data/population")
unlink(tf); rm(tf)
pop_eng <- read_excel("data/population/SAPE21DT3a-mid-2018-msoa-on-2019-LA-syoa-estimates-formatted.xlsx", sheet = "Mid-2018 Persons", skip = 4)
pop_eng <-
pop_eng %>%
select(code = `Area Codes`,
la_name = `LA (2019 boundaries)`,
population = `All Ages`) %>%
filter(!is.na(la_name)) %>%
rename(la_code = code)
# ---- Shielding ----
# Coronavirus Shielded Patient List, England - Local Authority: https://digital.nhs.uk/data-and-information/publications/statistical/mi-english-coronavirus-covid-19-shielded-patient-list-summary-totals/latest
shielding <- read_csv("https://files.digital.nhs.uk/BC/85E39A/Coronavirus%20Shielded%20Patient%20List%2C%20England%20-%20Open%20Data%20with%20CMO%20DG%20-%20LA%20-%202020-09-09.csv")
shielding <- shielding %>%
# keep only latest values (if more than one extraction happens to be in this file)
mutate(`Extract Date` = dmy(`Extract Date`)) %>%
filter(`Extract Date` == max(`Extract Date`)) %>%
filter(`LA Code` != "ENG") %>% # don't need England-wide figures
filter(`Breakdown Field` == "ALL") %>% #don't need age/gender splits
select(la_code = `LA Code`, la_name = `LA Name`, shielding = `Patient Count`)
# Realign LA's to VI
shielding <-
shielding %>%
# Replace Buckinghamshire with South Bucks
mutate(la_code = if_else(la_code == "E06000060",
"E07000006",
la_code)) %>%
# Split Cornwall and Isles of Scilly and Hackney and City of London.
separate_rows(la_code) %>%
# Multiple Cornwall, Isles of Scilly, Hackney and City of London
# shielding counts by population weighted score to work out estimated counts
mutate(shielding = case_when(la_code == "E09000012" ~ shielding * (279665 / (279665 + 8706)),
la_code == "E09000001" ~ shielding * (8706 / (279665 + 8706)),
la_code == "E06000052" ~ shielding * (565968 / (565968 + 2242)),
la_code == "E06000053" ~ shielding * (2242 / (565968 + 2242)),
TRUE ~ shielding)) %>%
select(-la_name) %>%
mutate(shielding = round(shielding)) %>%
select(LAD19CD = la_code, `Clinically extremely vulnerable` = shielding)
The shielding data contains aggregated LA's which are not being correctly joined in
preprocess data.R
.These LA's need disaggregating and the shielding counts multiplied by population weighted scores. The following adjustments fix this issue, but just need wrapping up into a PR: