Closed GabiPedra closed 4 years ago
A work around, at this time, would be use interaction
within the group_by
call. The summary_table
table code assumes that there is only one variable
being used for the grouping. Here is an example:
library(dplyr)
library(qwraps2)
options(qwraps2_markup = "markdown")
our_summary <-
list(
"mpg" = list( "min" = ~ min(.data$mpg),
"mean (95% CI)" = ~ frmtci(mean_ci(.data$mpg))),
"hp" = list("min" = ~ min(.data$hp),
"max" = ~ max(.data$hp),
"mean (95% CI)" = ~ frmtci(mean_ci(.data$hp)))
)
by_color <- summary_table(dplyr::group_by(mtcars, cyl), summaries = our_summary)
by_cut <- summary_table(dplyr::group_by(mtcars, am), summaries = our_summary)
summary_table(dplyr::group_by(mtcars, interaction(cyl, am)), summaries = our_summary)
interaction(cyl, am): 4.0 (N = 3) | interaction(cyl, am): 6.0 (N = 4) | interaction(cyl, am): 8.0 (N = 12) | interaction(cyl, am): 4.1 (N = 8) | interaction(cyl, am): 6.1 (N = 3) | interaction(cyl, am): 8.1 (N = 2) | |
---|---|---|---|---|---|---|
mpg | ||||||
min | 21.5 | 17.8 | 10.4 | 21.4 | 19.7 | 15.0 |
mean (95% CI) | 22.90 (21.26, 24.54) | 19.12 (17.53, 20.72) | 15.05 (13.48, 16.62) | 28.07 (24.97, 31.18) | 20.57 (19.72, 21.42) | 15.40 (14.62, 16.18) |
hp | ||||||
min | 62 | 105 | 150 | 52 | 110 | 264 |
max | 97 | 123 | 245 | 113 | 175 | 335 |
mean (95% CI) | 84.67 (62.42, 106.91) | 115.25 (106.25, 124.25) | 194.17 (175.29, 213.04) | 81.88 (66.18, 97.57) | 131.67 (89.20, 174.13) | 299.50 (229.92, 369.08) |
To help with the default output it might be helpful to define a variable in
the data.frame
for the grouping. This is not dissimilar to how group_by
in dplyr v0.8.0 works.
mtcars$Engine <-
case_when(
mtcars$cyl == 4 & mtcars$am == 0 ~ "Four Cylinders, Manual",
mtcars$cyl == 6 & mtcars$am == 0 ~ "Six Cylinders, Manual",
mtcars$cyl == 8 & mtcars$am == 0 ~ "Eight Cylinders, Manual",
mtcars$cyl == 4 & mtcars$am == 1 ~ "Four Cylinders, Automatic",
mtcars$cyl == 6 & mtcars$am == 1 ~ "Six Cylinders, Automatic",
mtcars$cyl == 8 & mtcars$am == 1 ~ "Eight Cylinders, Automatic")
summary_table(group_by(mtcars, Engine), our_summary)
Engine: Eight Cylinders, Automatic (N = 2) | Engine: Eight Cylinders, Manual (N = 12) | Engine: Four Cylinders, Automatic (N = 8) | Engine: Four Cylinders, Manual (N = 3) | Engine: Six Cylinders, Automatic (N = 3) | Engine: Six Cylinders, Manual (N = 4) | |
---|---|---|---|---|---|---|
mpg | ||||||
min | 15.0 | 10.4 | 21.4 | 21.5 | 19.7 | 17.8 |
mean (95% CI) | 15.40 (14.62, 16.18) | 15.05 (13.48, 16.62) | 28.07 (24.97, 31.18) | 22.90 (21.26, 24.54) | 20.57 (19.72, 21.42) | 19.12 (17.53, 20.72) |
hp | ||||||
min | 264 | 150 | 52 | 62 | 110 | 105 |
max | 335 | 245 | 113 | 97 | 175 | 123 |
mean (95% CI) | 299.50 (229.92, 369.08) | 194.17 (175.29, 213.04) | 81.88 (66.18, 97.57) | 84.67 (62.42, 106.91) | 131.67 (89.20, 174.13) | 115.25 (106.25, 124.25) |
Adding a feature to build the interaction variable automatically might be something I look into doing. I'll keep this issue open as a reminder.
Hello,
Thank you very much for that. That was a good idea. I actually changed your function last night to work around that. But is not fully validated yet. I got your function from R folder and made a slight change. I created a variable to remove the columns related to my groups in your list (lapply). Look for nvar.groups and lapply(function(y) [
(y, nvar.groups, )). But again, I am not sure how this is applicable to other conditions of your function. For me, worked so far.
summary_table.grouped_df <- function(x, summaries = qsummary(x)) {
# A workaround needs to be made while dplyr transition form version 0.7.8 to
# 0.8.0, see issue #67
if (!is.null(attr(x, "vars"))) {
ngrps <- length(attr(x, "vars")) # for dplyr version 0.7.8
lbs <- attr(x, "labels")
grpsz <- frmt(attr(x, "group_sizes"))
lbs <- apply(cbind(matrix(paste(rep(names(lbs), each = nrow(lbs)), as.matrix(lbs), sep= ": "), nrow = nrow(lbs)), paste0("(N = ", grpsz, ")")), 1, paste, collapse = " ")
} else {
ngrps <- nrow(attr(x, "groups")) # for dplyr version 0.7.99.9000 and beyond
nvar.groups<--(1:(ncol(attr(x, "groups"))-1)) #correction for number of groups
lbs <- attr(x, "groups")
lbs <- lbs[-length(lbs)]
grpsz <- frmt(sapply(attr(x, "groups")[[".rows"]], length))
lbs <- apply(cbind(matrix(paste(rep(names(lbs), each = nrow(lbs)), as.matrix(lbs), sep= ": "), nrow = nrow(lbs)), paste0("(N = ", grpsz, ")")), 1, paste, collapse = " ")
}
out <-
lapply(summaries, function(s) { lapply(s, function(y) { rlang::f_rhs(y) }) }) %>%
lapply(function(dots) { dplyr::summarize(x, !!!(dots)) }) %>%
lapply(t) %>%
lapply(function(y) `[`(y, nvar.groups, )) %>%
do.call(rbind, .)
#browser()
colnames(out) <- lbs
rownames(out) <- unlist(lapply(summaries, names), use.names = FALSE)
attr(out, "rgroups") <- sapply(summaries, length)
class(out) <- c("qwraps2_summary_table", class(out))
out
}
I will try your suggestion too.
All best, Gabi
Version 0.5.0 of qwraps2, which will be published soon, refactored how
summary_table
was implmented. The changes allow for grouping by multiple
variables and does not require use of dplyr.
NOTE use of the data pronoun .data
in the summary is no longer needed
or recommended.
library(qwraps2)
options(qwraps2_markup = "markdown")
packageVersion("qwraps2")
#> [1] '0.4.2.9005'
our_summary <-
list(
"mpg" = list( "min" = ~ min(mpg),
"mean (95% CI)" = ~ frmtci(mean_ci(mpg))),
"hp" = list("min" = ~ min(hp),
"max" = ~ max(hp),
"mean (95% CI)" = ~ frmtci(mean_ci(hp)))
)
Using the refactored summary_table
by_cyl <- summary_table(mtcars, our_summary, by = "cyl")
by_am <- summary_table(mtcars, our_summary, by = "am")
by_cyl_am <- summary_table(mtcars, our_summary, by = c("cyl", "am"))
by_cyl
4 (N = 11) | 6 (N = 7) | 8 (N = 14) | |
---|---|---|---|
mpg | |||
min | 21.4 | 17.8 | 10.4 |
mean (95% CI) | 26.66 (24.00, 29.33) | 19.74 (18.67, 20.82) | 15.10 (13.76, 16.44) |
hp | |||
min | 52 | 105 | 150 |
max | 113 | 175 | 335 |
mean (95% CI) | 82.64 (70.27, 95.01) | 122.29 (104.31, 140.26) | 209.21 (182.51, 235.92) |
by_am
0 (N = 19) | 1 (N = 13) | |
---|---|---|
mpg | ||
min | 10.4 | 15 |
mean (95% CI) | 17.15 (15.42, 18.87) | 24.39 (21.04, 27.74) |
hp | ||
min | 62 | 52 |
max | 245 | 335 |
mean (95% CI) | 160.26 (136.02, 184.50) | 126.85 (81.15, 172.54) |
by_cyl_am
4.0 (N = 3) | 6.0 (N = 4) | 8.0 (N = 12) | 4.1 (N = 8) | 6.1 (N = 3) | 8.1 (N = 2) | |
---|---|---|---|---|---|---|
mpg | ||||||
min | 21.5 | 17.8 | 10.4 | 21.4 | 19.7 | 15 |
mean (95% CI) | 22.90 (21.26, 24.54) | 19.12 (17.53, 20.72) | 15.05 (13.48, 16.62) | 28.07 (24.97, 31.18) | 20.57 (19.72, 21.42) | 15.40 (14.62, 16.18) |
hp | ||||||
min | 62 | 105 | 150 | 52 | 110 | 264 |
max | 97 | 123 | 245 | 113 | 175 | 335 |
mean (95% CI) | 84.67 (62.42, 106.91) | 115.25 (106.25, 124.25) | 194.17 (175.29, 213.04) | 81.88 (66.18, 97.57) | 131.67 (89.20, 174.13) | 299.50 (229.92, 369.08) |
Producing the same tables using dplyr::group_by
grpby_cyl <- summary_table(dplyr::group_by(mtcars, cyl), our_summary)
grpby_am <- summary_table(dplyr::group_by(mtcars, am), our_summary)
grpby_cyl_am <- summary_table(dplyr::group_by(mtcars, cyl, am), our_summary)
grpby_cyl
4 (N = 11) | 6 (N = 7) | 8 (N = 14) | |
---|---|---|---|
mpg | |||
min | 21.4 | 17.8 | 10.4 |
mean (95% CI) | 26.66 (24.00, 29.33) | 19.74 (18.67, 20.82) | 15.10 (13.76, 16.44) |
hp | |||
min | 52 | 105 | 150 |
max | 113 | 175 | 335 |
mean (95% CI) | 82.64 (70.27, 95.01) | 122.29 (104.31, 140.26) | 209.21 (182.51, 235.92) |
grpby_am
0 (N = 19) | 1 (N = 13) | |
---|---|---|
mpg | ||
min | 10.4 | 15 |
mean (95% CI) | 17.15 (15.42, 18.87) | 24.39 (21.04, 27.74) |
hp | ||
min | 62 | 52 |
max | 245 | 335 |
mean (95% CI) | 160.26 (136.02, 184.50) | 126.85 (81.15, 172.54) |
grpby_cyl_am
4.0 (N = 3) | 6.0 (N = 4) | 8.0 (N = 12) | 4.1 (N = 8) | 6.1 (N = 3) | 8.1 (N = 2) | |
---|---|---|---|---|---|---|
mpg | ||||||
min | 21.5 | 17.8 | 10.4 | 21.4 | 19.7 | 15 |
mean (95% CI) | 22.90 (21.26, 24.54) | 19.12 (17.53, 20.72) | 15.05 (13.48, 16.62) | 28.07 (24.97, 31.18) | 20.57 (19.72, 21.42) | 15.40 (14.62, 16.18) |
hp | ||||||
min | 62 | 105 | 150 | 52 | 110 | 264 |
max | 97 | 123 | 245 | 113 | 175 | 335 |
mean (95% CI) | 84.67 (62.42, 106.91) | 115.25 (106.25, 124.25) | 194.17 (175.29, 213.04) | 81.88 (66.18, 97.57) | 131.67 (89.20, 174.13) | 299.50 (229.92, 369.08) |
Created on 2020-08-24 by the reprex package (v0.3.0)
Hello,
I was using a function to produce a table based on grouping my data. I wanted to presented a summary table by Country / severity of a disease. A couple of months ago, it was working fine and I could produce a html report with my tables. However, last week when I tried to run my code again, it is not working. If I want to group by Country or Severity separately, the function works fine. I am sorry if might be an easier solution.
The error I've got is: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent
I think is an issue with 'summary_table' which might not recognise the updated version of dplyr. Does anyone know how to fix it?
Thank you, Gabi