Closed RWParsons closed 1 week ago
Thank you for the reprex and my apologies for the late reply. I looked into it and I think this might be possible to fix but I'm not sure yet.
Here is a modified version of summarize_quantile that works on your example. However I mostly just removed checks and I'm not sure if this will cause problems for other cases.
summarise_quantile <- function(.data, x = NULL, probs, name_suffix = "value") {
checkmate::assertClass(.data, "tbl_sql")
checkmate::assert_double(probs, min.len = 1, lower = 0, upper = 1)
checkmate::assert_character(name_suffix, null.ok = TRUE)
selection_context <- .data$lazy_query$select_operation
# if (!is.null(selection_context) && selection_context == 'summarise') {
# rlang::abort("Cannot estimate quantiles in summarise context.
# Try using `mutate()` function instead of `summarise()`")
# }
vars_context <- NULL
x_context <- NULL
x_arg <- rlang::enexpr(x)
if (!is.null(selection_context)) {
vars_context <- .data$lazy_query$select %>%
dplyr::filter(unlist(purrr::map(.data$expr, rlang::is_quosure)))
if (nrow(vars_context) > 0) {
vars_context <- vars_context %>%
# dplyr::mutate(x_var = purrr::map(purrr::map(.data$expr, rlang::get_expr), ~ if (length(.x) >= 2) {.x[[2]]} else {NULL}))
dplyr::mutate(x_var = purrr::map(.data$expr, ~if(length(rlang::get_expr(.x)) >= 2) {rlang::get_expr(.x)[[2]]} else {NULL}))
x_context <- unique(vars_context$x_var)[[1]]
}
}
# if (!is.null(x_context) && !is.null(x_arg) && x_context != x_arg) {
# msg <- paste0("Confilicting quantile variables: `", x_context, "` (from context) and `", x_arg, "` (passed argument)")
# rlang::abort(msg)
# }
if (is.null(x_context) & is.null(x_arg)) {
msg = "Quantile variable is not specified"
rlang::abort(msg)
}
# x <- x_context %||% x_arg
x <- x_arg
group_by_vars <- .data$lazy_query$group_vars
group_1 <- rlang::syms(c(group_by_vars, x))
funs = list()
if (!is.null(selection_context)) {
funs <- purrr::map(vars_context$name, ~ rlang::expr(max(!!rlang::sym(.x), na.rm = TRUE)))
names(funs) <- vars_context$name
}
group_2 <- rlang::syms(c(group_by_vars, names(funs)))
probs = sort(unique(probs))
quant_expr <- purrr::map(probs, ~ rlang::expr(min(ifelse(accumulated >= !!.x * total, !!x, NA), na.rm = TRUE)))
names(quant_expr) <- paste0('p', as.character(probs * 100), '_', name_suffix)
query <- rlang::expr(
.data %>%
dplyr::group_by(!!!group_1) %>%
dplyr::summarise(..n = dplyr::n(), !!!funs, .groups = "drop") %>%
dplyr::group_by(!!!group_2) %>%
dbplyr::window_order(!!x) %>%
dplyr::mutate(accumulated = cumsum(.data$..n),
total = sum(.data$..n, na.rm = TRUE)) %>%
dplyr::summarize(!!!quant_expr, .groups = "drop")
)
eval(query)
}
Awesome - thanks Adam!
@ablack3 if this is something to prioritise please add in our backlog. Closing this here since @RWParsons seems to be helped by this provided solution. Please reopen if necessary
In my analyses process, I have a couple aggregation steps and want the
summarize_quantile
to be used later on in the pipeline. Currently, I get an error produced when trying to get quantiles:It works if I include a
compute()
after theungroup()
that comes beforehand but I don't want to have to include this step if I can avoid it.Below is a reprex including the code which produces the error and another version with the
compute()
step.