Metropolitan-Council / councilR

A curated collection of commonly used templates, color palettes, functions, and more!

Add function to generate/evaluate code for variables ending in ranges of numbers #79

Open schroeder-matt opened 2 months ago

schroeder-matt commented 2 months ago

One of the few things that's easier to do in SAS than in R is summing across ranges of variables by number. In SAS, sum(of v1-v4) is interpreted as v1 + v2 + v3 + v4 -- but R has nothing similar that I'm aware of. This is unfortunate because so many Census Bureau datasets have this naming scheme, and many times I need to add up several variables at once.
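For illustration, here is a minimal base R sketch of what the SAS shorthand stands for. The data frame `df`, its column names, and its values are made up for this example; they are not from any Census Bureau table.

```r
# Made-up data frame with SAS-style numbered columns (illustrative values only)
df <- data.frame(v1 = c(1, 10), v2 = c(2, 20), v3 = c(3, 30),
                 v4 = c(4, 40), v5 = c(5, 50))

# Spelling out the sum by hand, which is what sum(of v1-v4) abbreviates
total_by_hand <- with(df, v1 + v2 + v3 + v4)

# Building the column names from the numeric range instead of typing each one
cols <- paste0("v", 1:4)
total_by_range <- rowSums(df[cols])
```

The second form avoids typing every name, but the names still have to be assembled by hand, and this gets more awkward once zero-padding or non-unit intervals are involved.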

I created the attached function (building on the framework developed by former Research staff Nicole Sullivan), but I store it in multiple repositories. I figured it would be good to have this in councilR so that it lives in only one place and can potentially help others.

sumRange.txt

Is this something that would be a good fit for councilR? If so, please feel free to make edits to anything and everything, because I am not an expert in designing functions. I also made this with Census Bureau data in mind, so you may find opportunities to generalize the code for other uses.

eroten commented 2 months ago

Well done with the function! Excellent use of :: and parameter documentation. I'm pasting it here for quicker reference.

Do you think across() could work for your needs? It is a newer feature in the tidyverse. I've used it in some code here.

If across() doesn't work, can you help me understand how this function works differently?

#' @title Generate code to sum across multiple cells
#'
#' @param .table character, prefix for table
#' @param .start first cell in range to be summed
#' @param .end last cell in range to be summed
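#' @param ... not used by the function; because the remaining arguments come
#'     after it, they must be passed by name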
#' @param .int numeric, interval between cells. Default is `1`
#' @param .width numeric, pad cell numbers with 0s to this length. Default is `1`.
#' @param repeatTimes numeric, number of times to repeat the sequence. Default
#'     is `1`.
#' @param repeatOffset numeric, jump by this number of cells each time the
#'     sequence repeats. Default is `0`.
#'
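#' @return An unevaluated expression (from `rlang::parse_expr()`) that adds the
#'     requested cells, e.g. `B01234e2 + B01234e3`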
#' @export
#'
#' @examples
#' sumRange("B01234e", 2, 6)
#' # B01234e2 + B01234e3 + B01234e4 + B01234e5 + B01234e6
#'
#' sumRange("B01234e", 2, 6, .int = 2)
#' # B01234e2 + B01234e4 + B01234e6
#'
#' sumRange("B01234e", 2, 6, .int = 2, .width = 3)
#' # B01234e002 + B01234e004 + B01234e006
#'
#' sumRange("B01234e", 2, 6, .int = 2, .width = 3, repeatTimes = 2, repeatOffset = 10)
#' # B01234e002 + B01234e004 + B01234e006 + B01234e012 + B01234e014 + B01234e016
#'
#' # The result can be unquoted inside dplyr::mutate(), e.g.
#' # dplyr::mutate(df, total = !!sumRange("B01234e", 2, 6))

sumRange <- function(.table,
                     .start,
                     .end,
                     ...,
                     .int = 1,
                     .width = 1,
                     repeatTimes = 1,
                     repeatOffset = 0) {

  if (repeatTimes == 1) { # no repetition needed: build a single sequence
    a <- paste0(.table,
                stringr::str_pad(seq(from = .start,
                                     to = .end,
                                     by = .int), width = .width, pad = "0"),
                collapse = " + ") # join each table/cell number combination with " + "
    rlang::parse_expr(a)
  } else { # otherwise, build one sequence per repetition and collect them in a list
    a <- purrr::map(1:repeatTimes, ~ paste0(.table,
                                            stringr::str_pad(seq(from = .start + (repeatOffset * (.x - 1)),
                                                                 to = .end + (repeatOffset * (.x - 1)),
                                                                 by = .int), width = .width, pad = "0"),
                                            collapse = " + ") # join each table/cell number combination with " + "
    )
    # then combine the list elements (one per repetition) into a single expression
    rlang::parse_expr(paste0(a, collapse = " + "))
  }
}

schroeder-matt commented 2 months ago

Sure! across() is a simpler way to transform/create multiple variables at once; this function is a simpler way to add up multiple variables at once (in order to transform/create a single variable). An example is here.
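
The distinction can be sketched with made-up data as follows. The data frame `df`, its values, and the helper `sum_range_sketch()` are assumptions introduced for illustration only; the helper is a stripped-down stand-in for sumRange() (no padding, interval, or repetition) built on base R's `str2lang()`, not the function from the attachment.

```r
library(dplyr)

# Made-up data with Census-style numbered columns (illustrative values only)
df <- data.frame(B01234e1 = c(1, 10), B01234e2 = c(2, 20), B01234e3 = c(3, 30))

# across(): apply the same transformation to several columns, which stay separate
doubled <- mutate(df, across(B01234e1:B01234e3, ~ .x * 2))

# Hypothetical stand-in for sumRange(): build "prefix<start> + ... + prefix<end>"
# as text, then parse it into an unevaluated expression
sum_range_sketch <- function(prefix, start, end) {
  str2lang(paste0(prefix, seq(start, end), collapse = " + "))
}

# Unquote the expression with !! to create one new column holding the sum
summed <- mutate(df, total = !!sum_range_sketch("B01234e", 1, 3))
```

Here `doubled` still has three columns, each transformed in place, while `summed` gains a single `total` column that adds the three cells row by row.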