Open wibeasley opened 1 year ago
@genevamarshall, @yutiantang, and others are using sjstats::mean_n(). It doesn't support nonuniform weights, and (at least currently) it uses a slow approach that involves casting the data.frame to a matrix.
I've been working on something that meets all these requirements except for the nonuniform weights. https://github.com/LiveOak/vasquez-border-reentry-1
row_sum <- function(
  d,
  columns_to_average   = character(0),
  pattern,
  new_column_name      = "row_sum",
  threshold_proportion = .75,
  verbose              = FALSE
) {
  # If no columns are named explicitly, select them with the regex.
  if (length(columns_to_average) == 0L) {
    columns_to_average <-
      d |>
      colnames() |>
      grep(
        x       = _,
        pattern = pattern,
        value   = TRUE,
        perl    = TRUE
      )

    if (verbose) {
      message(
        "The following columns will be summed:\n- ",
        paste(columns_to_average, collapse = "\n- ")
      )
    }
  }

  # The temporary columns are dot-prefixed so that dropping them at the end
  # cannot also drop the output column when it keeps its default name, "row_sum".
  d |>
    dplyr::mutate(
      .row_sum = # Finding the sum (used by m4)
        rowSums(
          dplyr::across(dplyr::all_of(columns_to_average)),
          na.rm = TRUE
        ),
      .nonmissing_count =
        rowSums(
          dplyr::across(
            dplyr::all_of(columns_to_average),
            .fns = \(x) { !is.na(x) }
          )
        ),
      .nonmissing_proportion = .nonmissing_count / length(columns_to_average),
      !!new_column_name :=
        dplyr::if_else(
          threshold_proportion <= .nonmissing_proportion,
          .row_sum,
          # .row_sum / .nonmissing_count,
          NA_real_
        )
    ) |>
    dplyr::select(
      -.row_sum,
      -.nonmissing_count,
      -.nonmissing_proportion
    )
  # Alternatively, return just the new columns
  # dplyr::pull({{new_column_name}})
}
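The threshold rule inside the function can be checked by hand on a toy example. The following is a minimal base-R sketch, not the code from the repository: the data frame `d_toy`, its `item_*` columns, and the intermediate variable names are made up for illustration, and only the count, proportion, and threshold steps are reproduced.

```r
# Hypothetical toy data: four items, with 0, 1, 1, and 2 missing values per row.
d_toy <- data.frame(
  item_1 = c(1, 2, NA, NA),
  item_2 = c(1, 2, 3, NA),
  item_3 = c(1, NA, 3, 4),
  item_4 = c(1, 2, 3, 4)
)
items <- c("item_1", "item_2", "item_3", "item_4")
threshold_proportion <- 0.75 # Same value as the function's default.

# Sum of the nonmissing values in each row.
row_total <- rowSums(d_toy[items], na.rm = TRUE)

# Proportion of the items that are nonmissing in each row.
nonmissing_proportion <- rowSums(!is.na(d_toy[items])) / length(items)

# Keep the sum only when enough items are present; otherwise return NA.
d_toy$row_sum <- ifelse(
  threshold_proportion <= nonmissing_proportion,
  row_total,
  NA_real_
)
```

With the threshold at .75, rows with at most one of the four items missing keep their sum, and the last row (half of its items missing) is set to NA.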
@wibeasley Feature request and questions:
FR: It would be nice to have a row_mean function as well, which averages across all nonmissing items.
Q1: For row_sum, should the 'columns_to_average' argument be 'columns_to_sum' instead?
Q2: Can you provide an example of how this function might be used inside a dplyr::mutate statement?
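One possible shape for the requested row_mean is sketched below. It is an assumption about the intended behavior, not code from the repository: the name `row_mean_sketch`, its arguments, and the toy data are hypothetical, and it uses base R only. It divides the row sum by the nonmissing count (the step that is commented out in row_sum) and applies the same threshold.

```r
# Hypothetical helper: mean of the nonmissing values in each row,
# returned as a plain numeric vector.
row_mean_sketch <- function(d, columns, threshold_proportion = 0.75) {
  items                 <- d[columns]
  nonmissing_count      <- rowSums(!is.na(items))
  nonmissing_proportion <- nonmissing_count / length(columns)
  row_total             <- rowSums(items, na.rm = TRUE)

  # Rows whose nonmissing proportion falls below the threshold become NA.
  out <- ifelse(
    threshold_proportion <= nonmissing_proportion,
    row_total / nonmissing_count,
    NA_real_
  )
  unname(out)
}

# Hypothetical toy data: the third row has only half of its items.
d_toy <- data.frame(
  id     = 1:3,
  item_1 = c(2, 4, NA),
  item_2 = c(4, NA, NA),
  item_3 = c(6, 4, 5),
  item_4 = c(8, 4, 5)
)
d_toy$item_mean <- row_mean_sketch(d_toy, paste0("item_", 1:4))
```

Because the helper returns a plain vector rather than a modified data frame, a function of this shape could presumably also be called inside dplyr::mutate(), for instance with dplyr::pick() (dplyr 1.1.0 or later) supplying the columns.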
@DavidBard,
inputs: