summary_table multiple grouping

GabiPedra commented 5 years ago

Hello,

I was using a function to produce a table based on grouping my data. I wanted to presented a summary table by Country / severity of a disease. A couple of months ago, it was working fine and I could produce a html report with my tables. However, last week when I tried to run my code again, it is not working. If I want to group by Country or Severity separately, the function works fine. I am sorry if might be an easier solution.

data.cons<-select(data.pat,starts_with("C16"))
data.cons<-data.frame(Country=data.pat$Country.name,data.cons)
data.cons.join<-data.cons      #rename
data.cons.join$severity<-as.factor(data.pat$severity)
our_summary2 <-
       list("haemo scheduled visits" = c("min"  = ~ min(.data$C16a_1),
                                            "max"  = ~ max(.data$C16a_1),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16a_1),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16a_1,na_rm = TRUE))),
            "haemo unscheduled visits" = c("min"  = ~ min(.data$C16a_2),
                                            "max"  = ~ max(.data$C16a_2),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16a_2),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16a_2,na_rm = TRUE))),
            "nurse specialist scheduled visits" = c("min"  = ~ min(.data$C16b_1),
                                            "max"  = ~ max(.data$C16b_1),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16b_1),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16b_1,na_rm = TRUE))),
            "nurse specialist unscheduled visits" = c("min"  = ~ min(.data$C16b_2),
                                            "max"  = ~ max(.data$C16b_2),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16b_2),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16b_2,na_rm = TRUE)))
  )
all<-summary_table(dplyr::group_by(data.cons.join,Country,severity), our_summary2)

The error I've got is: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

I think is an issue with 'summary_table' which might not recognise the updated version of dplyr. Does anyone know how to fix it?

Thank you, Gabi

dewittpe commented 5 years ago

A work around, at this time, would be use interaction within the group_by call. The summary_table table code assumes that there is only one variable being used for the grouping. Here is an example:

library(dplyr)
library(qwraps2)
options(qwraps2_markup = "markdown")

our_summary <-
  list(
       "mpg" = list( "min" = ~ min(.data$mpg),
                    "mean (95% CI)" = ~ frmtci(mean_ci(.data$mpg))),
       "hp" = list("min" = ~ min(.data$hp),
                   "max" = ~ max(.data$hp),
                   "mean (95% CI)" = ~ frmtci(mean_ci(.data$hp)))
  )

by_color <- summary_table(dplyr::group_by(mtcars, cyl), summaries = our_summary)
by_cut   <- summary_table(dplyr::group_by(mtcars, am), summaries = our_summary)

summary_table(dplyr::group_by(mtcars, interaction(cyl, am)), summaries = our_summary)

	interaction(cyl, am): 4.0 (N = 3)	interaction(cyl, am): 6.0 (N = 4)	interaction(cyl, am): 8.0 (N = 12)	interaction(cyl, am): 4.1 (N = 8)	interaction(cyl, am): 6.1 (N = 3)	interaction(cyl, am): 8.1 (N = 2)
mpg
min	21.5	17.8	10.4	21.4	19.7	15.0
mean (95% CI)	22.90 (21.26, 24.54)	19.12 (17.53, 20.72)	15.05 (13.48, 16.62)	28.07 (24.97, 31.18)	20.57 (19.72, 21.42)	15.40 (14.62, 16.18)
hp
min	62	105	150	52	110	264
max	97	123	245	113	175	335
mean (95% CI)	84.67 (62.42, 106.91)	115.25 (106.25, 124.25)	194.17 (175.29, 213.04)	81.88 (66.18, 97.57)	131.67 (89.20, 174.13)	299.50 (229.92, 369.08)

To help with the default output it might be helpful to define a variable in the data.frame for the grouping. This is not dissimilar to how group_by in dplyr v0.8.0 works.

mtcars$Engine <-
  case_when(
            mtcars$cyl == 4 & mtcars$am == 0 ~ "Four Cylinders, Manual",
            mtcars$cyl == 6 & mtcars$am == 0 ~ "Six Cylinders, Manual",
            mtcars$cyl == 8 & mtcars$am == 0 ~ "Eight Cylinders, Manual",
            mtcars$cyl == 4 & mtcars$am == 1 ~ "Four Cylinders, Automatic",
            mtcars$cyl == 6 & mtcars$am == 1 ~ "Six Cylinders, Automatic",
            mtcars$cyl == 8 & mtcars$am == 1 ~ "Eight Cylinders, Automatic")

summary_table(group_by(mtcars, Engine), our_summary)

	Engine: Eight Cylinders, Automatic (N = 2)	Engine: Eight Cylinders, Manual (N = 12)	Engine: Four Cylinders, Automatic (N = 8)	Engine: Four Cylinders, Manual (N = 3)	Engine: Six Cylinders, Automatic (N = 3)	Engine: Six Cylinders, Manual (N = 4)
mpg
min	15.0	10.4	21.4	21.5	19.7	17.8
mean (95% CI)	15.40 (14.62, 16.18)	15.05 (13.48, 16.62)	28.07 (24.97, 31.18)	22.90 (21.26, 24.54)	20.57 (19.72, 21.42)	19.12 (17.53, 20.72)
hp
min	264	150	52	62	110	105
max	335	245	113	97	175	123
mean (95% CI)	299.50 (229.92, 369.08)	194.17 (175.29, 213.04)	81.88 (66.18, 97.57)	84.67 (62.42, 106.91)	131.67 (89.20, 174.13)	115.25 (106.25, 124.25)

Adding a feature to build the interaction variable automatically might be something I look into doing. I'll keep this issue open as a reminder.

GabiPedra commented 5 years ago

Hello,

Thank you very much for that. That was a good idea. I actually changed your function last night to work around that. But is not fully validated yet. I got your function from R folder and made a slight change. I created a variable to remove the columns related to my groups in your list (lapply). Look for nvar.groups and lapply(function(y) [(y, nvar.groups, )). But again, I am not sure how this is applicable to other conditions of your function. For me, worked so far.

summary_table.grouped_df <- function(x, summaries = qsummary(x)) {

  # A workaround needs to be made while dplyr transition form version 0.7.8 to
  # 0.8.0, see issue #67

  if (!is.null(attr(x, "vars"))) {
    ngrps <- length(attr(x, "vars"))  # for dplyr version 0.7.8
    lbs <- attr(x, "labels")
    grpsz <- frmt(attr(x, "group_sizes"))
    lbs <- apply(cbind(matrix(paste(rep(names(lbs), each = nrow(lbs)), as.matrix(lbs), sep= ": "), nrow = nrow(lbs)), paste0("(N = ", grpsz, ")")), 1, paste, collapse = " ")
  } else {
    ngrps <- nrow(attr(x, "groups"))  # for dplyr version 0.7.99.9000 and beyond
    nvar.groups<--(1:(ncol(attr(x, "groups"))-1)) #correction for number of groups
    lbs <- attr(x, "groups")
    lbs <- lbs[-length(lbs)]
    grpsz <- frmt(sapply(attr(x, "groups")[[".rows"]], length))
    lbs <- apply(cbind(matrix(paste(rep(names(lbs), each = nrow(lbs)), as.matrix(lbs), sep= ": "), nrow = nrow(lbs)), paste0("(N = ", grpsz, ")")), 1, paste, collapse = " ")
  }

  out <-
    lapply(summaries, function(s) { lapply(s, function(y) { rlang::f_rhs(y) }) }) %>%
    lapply(function(dots) { dplyr::summarize(x, !!!(dots)) }) %>%
    lapply(t) %>%
    lapply(function(y) `[`(y, nvar.groups, )) %>%
    do.call(rbind, .)
  #browser()
  colnames(out) <- lbs
  rownames(out) <- unlist(lapply(summaries, names), use.names = FALSE)

  attr(out, "rgroups") <- sapply(summaries, length)
  class(out) <- c("qwraps2_summary_table", class(out))

  out
}

I will try your suggestion too.

All best, Gabi

dewittpe commented 4 years ago

Version 0.5.0 of qwraps2, which will be published soon, refactored how summary_table was implmented. The changes allow for grouping by multiple variables and does not require use of dplyr.

NOTE use of the data pronoun .data in the summary is no longer needed or recommended.

library(qwraps2)
options(qwraps2_markup = "markdown")
packageVersion("qwraps2")
#> [1] '0.4.2.9005'

our_summary <-
  list(
       "mpg" = list( "min" = ~ min(mpg),
                    "mean (95% CI)" = ~ frmtci(mean_ci(mpg))),
       "hp" = list("min" = ~ min(hp),
                   "max" = ~ max(hp),
                   "mean (95% CI)" = ~ frmtci(mean_ci(hp)))
  )

Using the refactored summary_table

by_cyl    <- summary_table(mtcars, our_summary, by = "cyl")
by_am     <- summary_table(mtcars, our_summary, by = "am")
by_cyl_am <- summary_table(mtcars, our_summary, by = c("cyl", "am"))

by_cyl

	4 (N = 11)	6 (N = 7)	8 (N = 14)
mpg
min	21.4	17.8	10.4
mean (95% CI)	26.66 (24.00, 29.33)	19.74 (18.67, 20.82)	15.10 (13.76, 16.44)
hp
min	52	105	150
max	113	175	335
mean (95% CI)	82.64 (70.27, 95.01)	122.29 (104.31, 140.26)	209.21 (182.51, 235.92)

by_am

	0 (N = 19)	1 (N = 13)
mpg
min	10.4	15
mean (95% CI)	17.15 (15.42, 18.87)	24.39 (21.04, 27.74)
hp
min	62	52
max	245	335
mean (95% CI)	160.26 (136.02, 184.50)	126.85 (81.15, 172.54)

by_cyl_am

	4.0 (N = 3)	6.0 (N = 4)	8.0 (N = 12)	4.1 (N = 8)	6.1 (N = 3)	8.1 (N = 2)
mpg
min	21.5	17.8	10.4	21.4	19.7	15
mean (95% CI)	22.90 (21.26, 24.54)	19.12 (17.53, 20.72)	15.05 (13.48, 16.62)	28.07 (24.97, 31.18)	20.57 (19.72, 21.42)	15.40 (14.62, 16.18)
hp
min	62	105	150	52	110	264
max	97	123	245	113	175	335
mean (95% CI)	84.67 (62.42, 106.91)	115.25 (106.25, 124.25)	194.17 (175.29, 213.04)	81.88 (66.18, 97.57)	131.67 (89.20, 174.13)	299.50 (229.92, 369.08)

Producing the same tables using dplyr::group_by

grpby_cyl    <- summary_table(dplyr::group_by(mtcars, cyl),     our_summary)
grpby_am     <- summary_table(dplyr::group_by(mtcars, am),      our_summary)
grpby_cyl_am <- summary_table(dplyr::group_by(mtcars, cyl, am), our_summary)

grpby_cyl

	4 (N = 11)	6 (N = 7)	8 (N = 14)
mpg
min	21.4	17.8	10.4
mean (95% CI)	26.66 (24.00, 29.33)	19.74 (18.67, 20.82)	15.10 (13.76, 16.44)
hp
min	52	105	150
max	113	175	335
mean (95% CI)	82.64 (70.27, 95.01)	122.29 (104.31, 140.26)	209.21 (182.51, 235.92)

grpby_am

	0 (N = 19)	1 (N = 13)
mpg
min	10.4	15
mean (95% CI)	17.15 (15.42, 18.87)	24.39 (21.04, 27.74)
hp
min	62	52
max	245	335
mean (95% CI)	160.26 (136.02, 184.50)	126.85 (81.15, 172.54)

grpby_cyl_am

	4.0 (N = 3)	6.0 (N = 4)	8.0 (N = 12)	4.1 (N = 8)	6.1 (N = 3)	8.1 (N = 2)
mpg
min	21.5	17.8	10.4	21.4	19.7	15
mean (95% CI)	22.90 (21.26, 24.54)	19.12 (17.53, 20.72)	15.05 (13.48, 16.62)	28.07 (24.97, 31.18)	20.57 (19.72, 21.42)	15.40 (14.62, 16.18)
hp
min	62	105	150	52	110	264
max	97	123	245	113	175	335
mean (95% CI)	84.67 (62.42, 106.91)	115.25 (106.25, 124.25)	194.17 (175.29, 213.04)	81.88 (66.18, 97.57)	131.67 (89.20, 174.13)	299.50 (229.92, 369.08)

^{Created on 2020-08-24 by the reprex package (v0.3.0)}

dewittpe / qwraps2

summary_table multiple grouping #71