Public-Health-Scotland / phslookups

PHS Lookups
https://public-health-scotland.github.io/phslookups/

Scottish postcode converter (function) #3

Open Nic-Chr opened 3 years ago

Nic-Chr commented 3 years ago

Proposing a function to convert postcodes to variables found in the Scottish Postcode Directory (SPD).

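postcode_match <- function(x, group, factor = FALSE, ...){
  # `group` must name exactly one column of the postcode directory
  if (length(group) != 1) stop("Please supply a group of length 1")
  # Normalise input: strip spaces and convert to upper case
  y <- gsub(" ", "", x, fixed = TRUE)
  y <- toupper(y)
  # Load the directory into this function's environment
  utils::data("spd_2021_1", envir = environment(), package = "phsmethods")
  postcodes <- spd_2021_1[["postcode"]]
  group_names <- spd_2021_1[[group]]
  # Look up each postcode; unmatched postcodes become NA
  y <- group_names[match(y, postcodes)]
  if (factor) {
    x_levels <- sort(unique(group_names))
    # factor() drops NA from `levels` unless exclude = NULL is passed via `...`
    if (anyNA(y)) x_levels <- c(x_levels, NA_character_)
    y <- factor(y, levels = x_levels, ...)
  }
  return(y)
}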

It could work as shown below:

> postcode_match("G2 1AL", group = "ca2019name")
[1] "Glasgow City"
> postcode_match("G2 1AL", group = "hb2019name")
[1] "Greater Glasgow and Clyde"
> postcode_match("G2 1AL", group = "date_of_introduction")
[1] "2011-05-03"
> postcode_match("G2 1AL", group = "hb2019name", factor = TRUE)
[1] Greater Glasgow and Clyde
14 Levels: Ayrshire and Arran Borders Dumfries and Galloway Fife Forth Valley Grampian Greater Glasgow and Clyde Highland Lanarkshire Lothian ... Western Isles

Moohan commented 1 year ago
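
To try the matching logic without the full SPD, here is a minimal sketch that applies the same `match()`-based lookup to a tiny made-up table. `toy_spd` and `toy_postcode_match()` are hypothetical names used only for illustration, and the postcodes and area names are invented rather than taken from the directory.

```r
# Toy stand-in for the postcode directory (made-up values, not real SPD rows)
toy_spd <- data.frame(
  postcode = c("AA11AA", "BB22BB", "CC33CC"),
  area     = c("Area A", "Area B", "Area A"),
  stringsAsFactors = FALSE
)

# Same approach as the proposal: normalise the input, then look up with match()
toy_postcode_match <- function(x, group, lookup = toy_spd) {
  stopifnot(length(group) == 1, group %in% names(lookup))
  key <- toupper(gsub(" ", "", x, fixed = TRUE))
  lookup[[group]][match(key, lookup[["postcode"]])]
}

# The last postcode is not in the toy table, so it comes back as NA
res <- toy_postcode_match(c("aa1 1aa", "CC3 3CC", "ZZ9 9ZZ"), group = "area")
print(res)
```

Unmatched postcodes are returned as `NA` rather than raising an error, so callers may want to check the result with `anyNA()`.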

I think this type of function would be super useful and I would love to see it added to the package. However, I think it has some serious issues that I don't have an immediate solution to.

  1. The data would need to be added to phsmethods, making the package very large. This isn't desirable and, in particular, is something CRAN would likely object to.
  2. The data would go out of date. This is the same issue as with match_area (see Public-Health-Scotland/phsmethods#71). For that function, there is some code in the package to update the data but that relies on 1) package maintainers remembering to regularly update the data and 2) users updating to the latest version to ensure they have the latest data.

Some ideas I had to work around this would be:

Nic-Chr commented 8 months ago

Thanks @Moohan,

  1. I agree. I have an internal package that uses a cut-down, compressed .rda version of the SPD and it's 3MB, which is still quite large considering it's a standalone dataset.
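
As a rough illustration of what "cut down and compressed" could look like, the sketch below keeps only the columns a lookup needs and saves them with xz compression. The data frame is made up and the column names are assumptions; it is not the internal package's actual code.

```r
# Made-up stand-in for the directory, with one column the lookup does not need
toy_spd <- data.frame(
  postcode = sprintf("AA%04d", 1:5000),
  area     = rep(c("Area A", "Area B"), length.out = 5000),
  unused   = seq_len(5000),
  stringsAsFactors = FALSE
)

# Keep only the columns the lookup needs, then save with xz compression
spd_small <- toy_spd[, c("postcode", "area")]
path <- tempfile(fileext = ".rda")
save(spd_small, file = path, compress = "xz")

# Size on disk in bytes
size_bytes <- file.size(path)
print(size_bytes)

# Round trip: load into a fresh environment to confirm nothing was lost
env <- new.env()
load(path, envir = env)
```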

I agree that a good solution to this would be to create a separate package containing the postcode directory that could sit within phsverse.

Alternatively we could create a function within phsmethods to allow the user to download the SPD on-demand, which would then get loaded into their R environment for use for the rest of the session. This would require a dependency on an API and some code to make sure it works as expected.
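
A minimal sketch of the on-demand idea, assuming the download itself is wrapped in a function: the result is cached in an environment so it is fetched at most once per session. `get_spd()`, `.spd_cache` and the `loader` argument are hypothetical names; the stub loader below returns made-up rows in place of a real API call.

```r
# Session-level cache: lives for as long as the R session does
.spd_cache <- new.env(parent = emptyenv())

# `loader` is a hypothetical function that fetches and returns the SPD as a
# data frame; a real one would call whichever download endpoint is chosen.
get_spd <- function(loader, refresh = FALSE) {
  if (refresh || is.null(.spd_cache$spd)) {
    .spd_cache$spd <- loader()
  }
  .spd_cache$spd
}

# Stub loader with made-up rows that counts how often it is called
n_calls <- 0
stub_loader <- function() {
  n_calls <<- n_calls + 1
  data.frame(postcode = c("AA11AA", "BB22BB"), area = c("Area A", "Area B"))
}

spd_first  <- get_spd(stub_loader)
spd_second <- get_spd(stub_loader)  # served from the cache, no second call
```

Passing `refresh = TRUE` forces a new download, which would give users a way to pick up a newer release of the directory within the same session.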

I would lean towards creating a package for the directory (and maybe other common Scottish lookup files, though not sure if this already exists).

Nic-Chr commented 8 months ago

  2. If we do decide to go with a separate package, updating it every six months or so should be fairly manageable, I would think.