brad-cannell / freqtables

Quickly make tables of descriptive statistics (i.e., counts, percentages, confidence intervals) for categorical variables. This package is designed to work in a tidyverse pipeline, and consideration has been given to getting results from R into Microsoft Word® with minimal pain.

Creating multiple one or n-way tables #36

Open mbcann01 opened 3 years ago

mbcann01 commented 3 years ago

Currently using purrr::map_df. Here is an example from the L2C quarterly report:

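library(dplyr)      # filter(), select(), %>%
library(rlang)      # quos(), quo_name()
library(tibble)     # add_row()
library(freqtables) # freq_table(), freq_format()

# Loop over all categorical vars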
cat_stats <- purrr::map_df(
  quos(gender_f, race_3cat_f, hispanic_f), 
  function(x) {
    demographics %>%
      filter(screened_in == 1) %>% 
      freq_table({{x}}) %>%
      freq_format(recipe = "n (percent)", digits = 1) %>%
      select(var, cat, formatted_stats) %>%
      # Add a row with the var name only
      add_row(var = quo_name(x), .before = 1) %>% 
      # Add blank row below
      add_row(var = "", cat = "", formatted_stats = "")
  }
)

I should do one of the following:

  1. Create a wrapper function to make this easier to read.
  2. Document the purrr::map_df approach thoroughly.
  3. Both.
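Option 1 could look roughly like the sketch below. `stack_freq_tables()` is a hypothetical wrapper, not a freqtables function; it returns the same stacked layout as the loop above (a header row with the variable name, one formatted row per category, then a blank spacer row). To keep the block self-contained, dplyr::count() plus a sprintf() "n (percent)" string stand in for freq_table() and freq_format(), so the exact formatting is an assumption. Unlike the loop, it takes the data first, so any filter() happens upstream in the pipe.

```r
library(dplyr)

# Hypothetical wrapper (not part of freqtables): stack formatted one-way
# tables for several categorical variables into one report-ready data frame.
# dplyr::count() plus sprintf() stand in for freq_table() + freq_format().
stack_freq_tables <- function(.data, ..., digits = 1) {
  vars <- rlang::enquos(...)
  fmt  <- paste0("%d (%.", digits, "f)")  # e.g. "%d (%.1f)"

  purrr::map(vars, function(v) {
    label <- rlang::as_label(v)
    .data %>%
      dplyr::count(!!v) %>%
      dplyr::mutate(
        var = label,
        cat = as.character(!!v),
        formatted_stats = sprintf(fmt, n, 100 * n / sum(n))
      ) %>%
      dplyr::select(var, cat, formatted_stats) %>%
      # Header row with the variable name only (other columns become NA)
      tibble::add_row(var = label, .before = 1) %>%
      # Blank spacer row below each variable
      tibble::add_row(var = "", cat = "", formatted_stats = "")
  }) %>%
    dplyr::bind_rows()
}

stack_freq_tables(mtcars, cyl, am)
```

With this in place, the L2C loop would reduce to something like `demographics %>% filter(screened_in == 1) %>% stack_freq_tables(gender_f, race_3cat_f, hispanic_f)`.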
mbcann01 commented 2 years ago

Make a wrapper?

freq_tables <- function(.data, ...) {
  purrr::map(
    .x = enquos(...),
    .f = ~ .data %>% freq_table({{ .x }})
  )
}

mtcars %>% 
  group_by(am) %>% 
  freq_tables(cyl, vs)

This returns a list of data frames, one per variable.

mbcann01 commented 2 years ago

Multiple 2-way tables when multiple variables are passed to freq_table()

Previously, passing two variable names to freq_table() created a two-way table. After issue #40, it still does, but not for much longer. In a year or so, we plan to remove ... from freq_table(), which will remove that functionality. At that point, passing two variables to freq_table() could create multiple two-way tables instead. Here is some code demonstrating what that might look like.

cyl is the outcome variable of interest:

mtcars %>% 
  freq_table(cyl)
  var cat  n n_total percent       se   t_crit      lcl      ucl
1 cyl   4 11      32  34.375 8.530513 2.039513 19.49961 53.11130
2 cyl   6  7      32  21.875 7.424859 2.039513 10.34883 40.44691
3 cyl   8 14      32  43.750 8.909831 2.039513 27.09672 61.94211
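The se, t_crit, lcl, and ucl columns above appear consistent with a Wald standard error that divides by n_total - 1, a t critical value on n_total - 1 degrees of freedom, and an interval built on the logit (log-odds) scale and then back-transformed. The base-R sketch below reproduces the cyl == 4 row under that assumption; `logit_ci()` is a hypothetical helper, not the freqtables internals.

```r
# Hypothetical helper: percent, se, t critical value, and logit-transformed
# confidence limits for a single proportion n / n_total (all on the 0-100
# scale except t_crit). Undefined when n is 0 or n_total (p of 0 or 1).
logit_ci <- function(n, n_total, conf = 0.95) {
  p      <- n / n_total
  se     <- sqrt(p * (1 - p) / (n_total - 1))
  t_crit <- qt(1 - (1 - conf) / 2, df = n_total - 1)

  # Build the interval on the log-odds scale, then back-transform
  se_logit <- se / (p * (1 - p))
  bounds   <- qlogis(p) + c(-1, 1) * t_crit * se_logit

  c(percent = 100 * p, se = 100 * se, t_crit = t_crit,
    lcl = 100 * plogis(bounds[1]), ucl = 100 * plogis(bounds[2]))
}

round(logit_ci(11, 32), 5)
```

The same calculation with n_row in place of n_total lines up with the se_row column in the two-way output that follows; for example, logit_ci(12, 19) gives an se of about 11.4 for am == 0, cyl == 8.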

Now, cyl within levels of am

mtcars %>% 
  group_by(am) %>%
  freq_table(cyl)
  row_var row_cat col_var col_cat     n n_row n_total percent_total se_total t_crit_total lcl_total ucl_total percent_row se_row
  <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>        <dbl>     <dbl>     <dbl>       <dbl>  <dbl>
1 am      0       cyl     4           3    19      32          9.38     5.24         2.04      2.86      26.7        15.8   8.59
2 am      0       cyl     6           4    19      32         12.5      5.94         2.04      4.51      30.2        21.1   9.61
3 am      0       cyl     8          12    19      32         37.5      8.70         2.04     22.0       56.1        63.2  11.4 
4 am      1       cyl     4           8    13      32         25        7.78         2.04     12.5       43.7        61.5  14.0 
5 am      1       cyl     6           3    13      32          9.38     5.24         2.04      2.86      26.7        23.1  12.2 
6 am      1       cyl     8           2    13      32          6.25     4.35         2.04      1.45      23.2        15.4  10.4 
# … with 3 more variables: t_crit_row <dbl>, lcl_row <dbl>, ucl_row <dbl>

That is the result we want. However, this works too.

mtcars %>% 
  freq_table(am, cyl)
  row_var row_cat col_var col_cat     n n_row n_total percent_total se_total t_crit_total lcl_total ucl_total percent_row se_row
  <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>        <dbl>     <dbl>     <dbl>       <dbl>  <dbl>
1 am      0       cyl     4           3    19      32          9.38     5.24         2.04      2.86      26.7        15.8   8.59
2 am      0       cyl     6           4    19      32         12.5      5.94         2.04      4.51      30.2        21.1   9.61
3 am      0       cyl     8          12    19      32         37.5      8.70         2.04     22.0       56.1        63.2  11.4 
4 am      1       cyl     4           8    13      32         25        7.78         2.04     12.5       43.7        61.5  14.0 
5 am      1       cyl     6           3    13      32          9.38     5.24         2.04      2.86      26.7        23.1  12.2 
6 am      1       cyl     8           2    13      32          6.25     4.35         2.04      1.45      23.2        15.4  10.4 
# … with 3 more variables: t_crit_row <dbl>, lcl_row <dbl>, ucl_row <dbl>

If freq_table(am, cyl) no longer returned a two-way table, what would we want it to return instead? A list of one-way tables?

I also want to trim some of this output. There are too many columns.
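One way to cut the output down, sketched as a hypothetical helper (not part of freqtables): keep only a chosen subset of the columns shown in the two-way output above. Which columns make the default list is an assumption; `dplyr::any_of()` skips names that are absent rather than raising an error.

```r
# Hypothetical helper (not part of freqtables): keep a reduced set of the
# two-way output columns. The default `keep` list is an assumption.
trim_freq_table <- function(tbl,
                            keep = c("row_var", "row_cat", "col_var", "col_cat",
                                     "n", "n_row", "percent_row",
                                     "lcl_row", "ucl_row")) {
  # any_of() silently skips names that are not present in `tbl`
  dplyr::select(tbl, dplyr::any_of(keep))
}
```

With freqtables loaded, it would go at the end of the pipe: `mtcars %>% group_by(am) %>% freq_table(cyl) %>% trim_freq_table()`.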

# Multiple n-way tables
freq_table2 <- function(.data, .freq_var, drop = FALSE) {

  # ===========================================================================
  # Get within group counts
  # .drop = FALSE creates an explicit n = 0 for unobserved factor levels
  # ===========================================================================
  out <- dplyr::count(.data, {{ .freq_var }}, .drop = drop)

  # Return tibble of results
  out
}
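
# Side note on .drop (illustrative sketch, not part of freqtables):
# .drop only affects factor columns, so it changes nothing for numeric
# columns such as mtcars$cyl. With a made-up factor that has an unobserved
# level ("c"), the difference is visible:
grade_df <- tibble::tibble(
  grade = factor(c("a", "a", "b"), levels = c("a", "b", "c"))
)
dplyr::count(grade_df, grade, .drop = FALSE)  # 3 rows: a = 2, b = 1, c = 0
dplyr::count(grade_df, grade)                 # 2 rows: c is dropped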

# For testing
# mtcars %>% 
#   group_by(am) %>% 
#   freq_table2(cyl)

# And if you want more than one table
purrr::map(
  .x = quos(cyl, vs),
  .f = ~ mtcars %>% group_by(am) %>% freq_table2({{ .x }})
)
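
# Possible extension (a sketch under assumptions, not part of freqtables):
# add a within-group percentage to freq_table2()'s counts. count() returns a
# result with the same grouping as its input, so sum(n) is taken within each
# level of am; for cyl by am this reproduces the percent_row column shown
# earlier (15.8, 21.1, 63.2, 61.5, 23.1, 15.4 after rounding).
library(dplyr)
freq_table2_pct <- function(.data, .freq_var, drop = FALSE) {
  dplyr::count(.data, {{ .freq_var }}, .drop = drop) %>%
    dplyr::mutate(percent = 100 * n / sum(n)) %>%
    dplyr::ungroup()
}

mtcars %>%
  group_by(am) %>%
  freq_table2_pct(cyl)
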
# Make a wrapper?
freq_tables <- function(.data, ...) {
  # Variable names, used to label the elements of the output list
  dot_names <- purrr::map_chr(rlang::ensyms(...), rlang::as_name)
  purrr::map(
    .x = rlang::enquos(...),
    .f = ~ .data %>% freq_table2({{ .x }})
  ) %>% 
    rlang::set_names(dot_names)
}

mtcars %>% 
  group_by(am) %>% 
  freq_tables(cyl, vs)

This creates a named list of frequency tables, one per variable (here, cyl and vs).
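If a single long table is wanted afterwards (closer to the stacked layout of the L2C example at the top of this issue), a named list like this can be flattened. Below is a minimal sketch with a hypothetical helper, `stack_named_tables()`; it assumes each table has exactly one column named after its list element, which holds for dplyr::count() output. To keep the block self-contained, the list is built directly with count() rather than with freq_tables().

```r
library(dplyr)

# Named list of grouped counts, shaped like the freq_tables() output above
# (built directly with count() so this block runs on its own)
tables <- list(
  cyl = mtcars %>% group_by(am) %>% count(cyl),
  vs  = mtcars %>% group_by(am) %>% count(vs)
)

# Hypothetical helper: rename each counted column to a shared `cat` column,
# record the variable name in `var`, and stack the tables by row
stack_named_tables <- function(tables) {
  purrr::imap(tables, function(tbl, nm) {
    tbl %>%
      ungroup() %>%
      rename(cat = all_of(nm)) %>%
      mutate(var = nm, cat = as.character(cat)) %>%
      select(var, everything())
  }) %>%
    bind_rows()
}

stack_named_tables(tables)
```

Converting `cat` to character is what lets tables for variables of different types (numeric, factor, character) share one column.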