dewittpe / qwraps2

An updated version of qwraps with a focus on flexibility and general purpose. These functions are helpful for extracting and formatting results from R into .Rnw or .Rmd files. Additional functions for routine work such as extracting results from regression models or finding sensitivity and specificity.
http://www.peteredewitt.com/qwraps2/
GNU General Public License v3.0
37 stars 7 forks source link

summary_table multiple grouping #71

Closed GabiPedra closed 4 years ago

GabiPedra commented 5 years ago

Hello,

I was using a function to produce a table based on grouping my data. I wanted to presented a summary table by Country / severity of a disease. A couple of months ago, it was working fine and I could produce a html report with my tables. However, last week when I tried to run my code again, it is not working. If I want to group by Country or Severity separately, the function works fine. I am sorry if might be an easier solution.

data.cons<-select(data.pat,starts_with("C16"))
data.cons<-data.frame(Country=data.pat$Country.name,data.cons)
data.cons.join<-data.cons      #rename
data.cons.join$severity<-as.factor(data.pat$severity)
our_summary2 <-
       list("haemo scheduled visits" = c("min"  = ~ min(.data$C16a_1),
                                            "max"  = ~ max(.data$C16a_1),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16a_1),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16a_1,na_rm = TRUE))),
            "haemo unscheduled visits" = c("min"  = ~ min(.data$C16a_2),
                                            "max"  = ~ max(.data$C16a_2),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16a_2),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16a_2,na_rm = TRUE))),
            "nurse specialist scheduled visits" = c("min"  = ~ min(.data$C16b_1),
                                            "max"  = ~ max(.data$C16b_1),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16b_1),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16b_1,na_rm = TRUE))),
            "nurse specialist unscheduled visits" = c("min"  = ~ min(.data$C16b_2),
                                            "max"  = ~ max(.data$C16b_2),
                                            "mean &plusmn; sd" = ~ qwraps2::mean_sd(.data$C16b_2),
                                            "mean (95% CI)" = ~ frmtci(qwraps2::mean_ci(.data$C16b_2,na_rm = TRUE)))
  )
all<-summary_table(dplyr::group_by(data.cons.join,Country,severity), our_summary2)

The error I've got is: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

I think is an issue with 'summary_table' which might not recognise the updated version of dplyr. Does anyone know how to fix it?

Thank you, Gabi

dewittpe commented 5 years ago

A work around, at this time, would be use interaction within the group_by call. The summary_table table code assumes that there is only one variable being used for the grouping. Here is an example:

library(dplyr)
library(qwraps2)
options(qwraps2_markup = "markdown")

our_summary <-
  list(
       "mpg" = list( "min" = ~ min(.data$mpg),
                    "mean (95% CI)" = ~ frmtci(mean_ci(.data$mpg))),
       "hp" = list("min" = ~ min(.data$hp),
                   "max" = ~ max(.data$hp),
                   "mean (95% CI)" = ~ frmtci(mean_ci(.data$hp)))
  )

by_color <- summary_table(dplyr::group_by(mtcars, cyl), summaries = our_summary)
by_cut   <- summary_table(dplyr::group_by(mtcars, am), summaries = our_summary)
summary_table(dplyr::group_by(mtcars, interaction(cyl, am)), summaries = our_summary)
interaction(cyl, am): 4.0 (N = 3) interaction(cyl, am): 6.0 (N = 4) interaction(cyl, am): 8.0 (N = 12) interaction(cyl, am): 4.1 (N = 8) interaction(cyl, am): 6.1 (N = 3) interaction(cyl, am): 8.1 (N = 2)
mpg                  
   min 21.5 17.8 10.4 21.4 19.7 15.0
   mean (95% CI) 22.90 (21.26, 24.54) 19.12 (17.53, 20.72) 15.05 (13.48, 16.62) 28.07 (24.97, 31.18) 20.57 (19.72, 21.42) 15.40 (14.62, 16.18)
hp                  
   min 62 105 150 52 110 264
   max 97 123 245 113 175 335
   mean (95% CI) 84.67 (62.42, 106.91) 115.25 (106.25, 124.25) 194.17 (175.29, 213.04) 81.88 (66.18, 97.57) 131.67 (89.20, 174.13) 299.50 (229.92, 369.08)

To help with the default output it might be helpful to define a variable in the data.frame for the grouping. This is not dissimilar to how group_by in dplyr v0.8.0 works.

mtcars$Engine <-
  case_when(
            mtcars$cyl == 4 & mtcars$am == 0 ~ "Four Cylinders, Manual",
            mtcars$cyl == 6 & mtcars$am == 0 ~ "Six Cylinders, Manual",
            mtcars$cyl == 8 & mtcars$am == 0 ~ "Eight Cylinders, Manual",
            mtcars$cyl == 4 & mtcars$am == 1 ~ "Four Cylinders, Automatic",
            mtcars$cyl == 6 & mtcars$am == 1 ~ "Six Cylinders, Automatic",
            mtcars$cyl == 8 & mtcars$am == 1 ~ "Eight Cylinders, Automatic")
summary_table(group_by(mtcars, Engine), our_summary)
Engine: Eight Cylinders, Automatic (N = 2) Engine: Eight Cylinders, Manual (N = 12) Engine: Four Cylinders, Automatic (N = 8) Engine: Four Cylinders, Manual (N = 3) Engine: Six Cylinders, Automatic (N = 3) Engine: Six Cylinders, Manual (N = 4)
mpg                  
   min 15.0 10.4 21.4 21.5 19.7 17.8
   mean (95% CI) 15.40 (14.62, 16.18) 15.05 (13.48, 16.62) 28.07 (24.97, 31.18) 22.90 (21.26, 24.54) 20.57 (19.72, 21.42) 19.12 (17.53, 20.72)
hp                  
   min 264 150 52 62 110 105
   max 335 245 113 97 175 123
   mean (95% CI) 299.50 (229.92, 369.08) 194.17 (175.29, 213.04) 81.88 (66.18, 97.57) 84.67 (62.42, 106.91) 131.67 (89.20, 174.13) 115.25 (106.25, 124.25)

Adding a feature to build the interaction variable automatically might be something I look into doing. I'll keep this issue open as a reminder.

GabiPedra commented 5 years ago

Hello,

Thank you very much for that. That was a good idea. I actually changed your function last night to work around that. But is not fully validated yet. I got your function from R folder and made a slight change. I created a variable to remove the columns related to my groups in your list (lapply). Look for nvar.groups and lapply(function(y) [(y, nvar.groups, )). But again, I am not sure how this is applicable to other conditions of your function. For me, worked so far.

summary_table.grouped_df <- function(x, summaries = qsummary(x)) {

  # A workaround needs to be made while dplyr transition form version 0.7.8 to
  # 0.8.0, see issue #67

  if (!is.null(attr(x, "vars"))) {
    ngrps <- length(attr(x, "vars"))  # for dplyr version 0.7.8
    lbs <- attr(x, "labels")
    grpsz <- frmt(attr(x, "group_sizes"))
    lbs <- apply(cbind(matrix(paste(rep(names(lbs), each = nrow(lbs)), as.matrix(lbs), sep= ": "), nrow = nrow(lbs)), paste0("(N = ", grpsz, ")")), 1, paste, collapse = " ")
  } else {
    ngrps <- nrow(attr(x, "groups"))  # for dplyr version 0.7.99.9000 and beyond
    nvar.groups<--(1:(ncol(attr(x, "groups"))-1)) #correction for number of groups
    lbs <- attr(x, "groups")
    lbs <- lbs[-length(lbs)]
    grpsz <- frmt(sapply(attr(x, "groups")[[".rows"]], length))
    lbs <- apply(cbind(matrix(paste(rep(names(lbs), each = nrow(lbs)), as.matrix(lbs), sep= ": "), nrow = nrow(lbs)), paste0("(N = ", grpsz, ")")), 1, paste, collapse = " ")
  }

  out <-
    lapply(summaries, function(s) { lapply(s, function(y) { rlang::f_rhs(y) }) }) %>%
    lapply(function(dots) { dplyr::summarize(x, !!!(dots)) }) %>%
    lapply(t) %>%
    lapply(function(y) `[`(y, nvar.groups, )) %>%
    do.call(rbind, .)
  #browser()
  colnames(out) <- lbs
  rownames(out) <- unlist(lapply(summaries, names), use.names = FALSE)

  attr(out, "rgroups") <- sapply(summaries, length)
  class(out) <- c("qwraps2_summary_table", class(out))

  out
}

I will try your suggestion too.

All best, Gabi

dewittpe commented 4 years ago

Version 0.5.0 of qwraps2, which will be published soon, refactored how summary_table was implmented. The changes allow for grouping by multiple variables and does not require use of dplyr.

NOTE use of the data pronoun .data in the summary is no longer needed or recommended.

library(qwraps2)
options(qwraps2_markup = "markdown")
packageVersion("qwraps2")
#> [1] '0.4.2.9005'

our_summary <-
  list(
       "mpg" = list( "min" = ~ min(mpg),
                    "mean (95% CI)" = ~ frmtci(mean_ci(mpg))),
       "hp" = list("min" = ~ min(hp),
                   "max" = ~ max(hp),
                   "mean (95% CI)" = ~ frmtci(mean_ci(hp)))
  )

Using the refactored summary_table

by_cyl    <- summary_table(mtcars, our_summary, by = "cyl")
by_am     <- summary_table(mtcars, our_summary, by = "am")
by_cyl_am <- summary_table(mtcars, our_summary, by = c("cyl", "am"))

by_cyl
4 (N = 11) 6 (N = 7) 8 (N = 14)
mpg         
   min 21.4 17.8 10.4
   mean (95% CI) 26.66 (24.00, 29.33) 19.74 (18.67, 20.82) 15.10 (13.76, 16.44)
hp         
   min 52 105 150
   max 113 175 335
   mean (95% CI) 82.64 (70.27, 95.01) 122.29 (104.31, 140.26) 209.21 (182.51, 235.92)
by_am
0 (N = 19) 1 (N = 13)
mpg      
   min 10.4 15
   mean (95% CI) 17.15 (15.42, 18.87) 24.39 (21.04, 27.74)
hp      
   min 62 52
   max 245 335
   mean (95% CI) 160.26 (136.02, 184.50) 126.85 (81.15, 172.54)
by_cyl_am
4.0 (N = 3) 6.0 (N = 4) 8.0 (N = 12) 4.1 (N = 8) 6.1 (N = 3) 8.1 (N = 2)
mpg                  
   min 21.5 17.8 10.4 21.4 19.7 15
   mean (95% CI) 22.90 (21.26, 24.54) 19.12 (17.53, 20.72) 15.05 (13.48, 16.62) 28.07 (24.97, 31.18) 20.57 (19.72, 21.42) 15.40 (14.62, 16.18)
hp                  
   min 62 105 150 52 110 264
   max 97 123 245 113 175 335
   mean (95% CI) 84.67 (62.42, 106.91) 115.25 (106.25, 124.25) 194.17 (175.29, 213.04) 81.88 (66.18, 97.57) 131.67 (89.20, 174.13) 299.50 (229.92, 369.08)

Producing the same tables using dplyr::group_by

grpby_cyl    <- summary_table(dplyr::group_by(mtcars, cyl),     our_summary)
grpby_am     <- summary_table(dplyr::group_by(mtcars, am),      our_summary)
grpby_cyl_am <- summary_table(dplyr::group_by(mtcars, cyl, am), our_summary)

grpby_cyl
4 (N = 11) 6 (N = 7) 8 (N = 14)
mpg         
   min 21.4 17.8 10.4
   mean (95% CI) 26.66 (24.00, 29.33) 19.74 (18.67, 20.82) 15.10 (13.76, 16.44)
hp         
   min 52 105 150
   max 113 175 335
   mean (95% CI) 82.64 (70.27, 95.01) 122.29 (104.31, 140.26) 209.21 (182.51, 235.92)
grpby_am
0 (N = 19) 1 (N = 13)
mpg      
   min 10.4 15
   mean (95% CI) 17.15 (15.42, 18.87) 24.39 (21.04, 27.74)
hp      
   min 62 52
   max 245 335
   mean (95% CI) 160.26 (136.02, 184.50) 126.85 (81.15, 172.54)
grpby_cyl_am
4.0 (N = 3) 6.0 (N = 4) 8.0 (N = 12) 4.1 (N = 8) 6.1 (N = 3) 8.1 (N = 2)
mpg                  
   min 21.5 17.8 10.4 21.4 19.7 15
   mean (95% CI) 22.90 (21.26, 24.54) 19.12 (17.53, 20.72) 15.05 (13.48, 16.62) 28.07 (24.97, 31.18) 20.57 (19.72, 21.42) 15.40 (14.62, 16.18)
hp                  
   min 62 105 150 52 110 264
   max 97 123 245 113 175 335
   mean (95% CI) 84.67 (62.42, 106.91) 115.25 (106.25, 124.25) 194.17 (175.29, 213.04) 81.88 (66.18, 97.57) 131.67 (89.20, 174.13) 299.50 (229.92, 369.08)

Created on 2020-08-24 by the reprex package (v0.3.0)