TysonStanley / tidyfast

Fast and efficient alternatives to tidyr functions built on data.table #rdatatable #rstats
https://tysonbarrett.com/tidyfast/

Utilize rlang? #23

Closed by markfairbanks 4 years ago

markfairbanks commented 4 years ago

I've been wondering whether we should use rlang so that users can write their own functions around tidyfast, much like they would with dplyr.

Nested function calls are difficult to handle with eval()/substitute(). I couldn't figure out how to recreate the example below in base R.
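To make the difficulty concrete, here is a minimal base R sketch. It is not tidyfast code: pull_col() and pull_wrapper() are hypothetical names made up for illustration. It shows how substitute() captures the wrapper's argument name instead of the column the user typed.

```r
# Innermost function: capture the argument unevaluated, then evaluate it
# with the data frame's columns in scope.
pull_col <- function(df, col) {
  col <- substitute(col)
  eval(col, df)
}

# Hypothetical user-defined wrapper that simply forwards its argument.
pull_wrapper <- function(df, column) {
  pull_col(df, column)
}

df <- data.frame(x = 1:3, y = 4:6)

# Called directly, substitute() sees the symbol `x` and df$x is returned.
direct <- pull_col(df, x)

# Through the wrapper, substitute() sees the symbol `column`, not `x`,
# so the lookup fails with an "object not found" error.
wrapped <- tryCatch(
  pull_wrapper(df, x),
  error = function(e) conditionMessage(e)
)
```

Every extra layer of wrapping has to work around this by hand, which is the part rlang's `!!` is meant to take care of.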

Thoughts?

library(data.table)
library(rlang)
library(magrittr) # provides %>%, which rlang and data.table do not export

test_df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

new_dt_mutate <- function(.data, ..., by = NULL) {
  if (!is.data.frame(.data)) stop(".data must be a data.frame or data.table")
  if (!is.data.table(.data)) .data <- as.data.table(.data)

  dots <- enexprs(...)
  by <- enexpr(by)

  eval_tidy(expr(
    .data[, ':='(!!!dots), !!by][]
  ))
}

add_one <- function(.data, add_col) {
  add_col <- enexpr(add_col)

  .data %>%
    new_dt_mutate(plus_one = !!add_col + 1)
}

add_select <- function(.data, selected_col) {
  selected_col <- enexpr(selected_col)

  .data %>%
    add_one(!!selected_col)
}

test_df %>%
  add_select(x)
markfairbanks commented 4 years ago

A more relevant example I ran into trying to implement this in more functions:

dt_slice <- function(.data, rows = 1:5, by = NULL) {
  if (!is.data.frame(.data)) stop(".data must be a data.frame or data.table")
  if (!is.data.table(.data)) .data <- as.data.table(.data)

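  # Capture before validating: forcing `rows` first would evaluate any `!!`
  # supplied by a caller as base R's double negation.
  rows <- enexpr(rows)
  by <- enexpr(by)

  if (!is.numeric(eval(rows, parent.frame()))) stop("rows must be a numeric vector")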

  eval_tidy(expr(
    .data[, .SD[!!rows], !!by]
  ))
}

dt_arrange <- function(.data, ...) {
  if (!is.data.frame(.data)) stop(".data must be a data.frame or data.table")
  if (!is.data.table(.data)) .data <- as.data.table(.data)

  dots <- enexprs(...)

  eval_tidy(expr(
    .data[order(!!!dots)]
  ))
}

dt_top_n <- function(.data, n = 5, wt = NULL, by = NULL) {
  if (!is.data.frame(.data)) stop(".data must be a data.frame or data.table")
  if (!is.data.table(.data)) .data <- as.data.table(.data)

  if (!is.numeric(n) || length(n) != 1) stop("n must be a single number")

  wt <- enexpr(wt)
  by <- enexpr(by)

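  # Inject the value of n with !! so that dt_slice() receives e.g. `1:5`
  # rather than the symbol `n`, which is not defined inside dt_slice().
  if (is.null(wt)) {
    .data %>%
      dt_slice(1:!!n, !!by)
  } else {
    .data %>%
      dt_arrange(-!!wt) %>%
      dt_slice(1:!!n, !!by)
  }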
}

Using dt_slice() inside dt_top_n() was doable in base R. However, it would then be impossible to use dt_top_n() inside a user-defined function, because the substitute()/eval() workflow evaluates the user's arguments too early, before they have been passed through to dt_slice().
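A rough base R sketch of that situation, using plain data frames and hypothetical helpers (sort_desc(), top_n_base(), my_top() and my_top_fixed() are made up here and are not the functions above). One level of nesting can be patched by splicing the captured symbol into the inner call with bquote(), but a further user-defined wrapper breaks again unless it repeats the same trick.

```r
# Innermost verb: sort descending by a column given as a bare name.
sort_desc <- function(df, col) {
  col <- substitute(col)
  df[order(-eval(col, df)), , drop = FALSE]
}

# One level of nesting can be patched: capture the symbol, splice it into
# the inner call with bquote(), then evaluate the rebuilt call.
top_n_base <- function(df, n, wt) {
  wt <- substitute(wt)
  sorted <- eval(bquote(sort_desc(df, .(wt))))
  head(sorted, n)
}

df <- data.frame(x = 1:3, y = c(4, 6, 5))

# Works: the symbol `y` reaches sort_desc() intact.
ok <- top_n_base(df, 2, y)

# A plain user-defined wrapper breaks again: top_n_base() now captures
# the symbol `column` instead of `y`.
my_top <- function(df, column) top_n_base(df, 2, column)
broken <- tryCatch(my_top(df, y), error = function(e) conditionMessage(e))

# The user has to repeat the splice in their own function.
my_top_fixed <- function(df, column) {
  eval(bquote(top_n_base(df, 2, .(substitute(column)))))
}
fixed <- my_top_fixed(df, y)
```

With rlang the same forwarding is written as `!!` at each call site, which is what the dt_top_n() example above relies on.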