Frequency table multiple binary variables

GabiPedra commented 5 years ago

Hello,

Let's say I have a dataset with the following variables:

Type1: categorical with two levels, A and B;
Type2: categorical with three levels, C, D and E;
Var1: binary;
Var2: binary;
Var3: binary;

our_summary<-list("Var1"= ~ qwraps2::n_perc0(Var1 == 1,na_rm = TRUE),
"Var2"= ~ qwraps2::n_perc0(Var2 == 1,na_rm = TRUE),
"Var3"= ~ qwraps2::n_perc0(Var3 == 1,na_rm = TRUE))

all<-summary_table(dplyr::group_by(data,Type1), our_summary)
sev<-summary_table(dplyr::group_by(data,Type2), our_summary)
whole<-cbind(sev,all)
print(whole,rtitle="summary", booktabs = TRUE)

Is there a way of making a frequency table where the proportions of Var1, Var2 and Var3 sum to 100% for the whole column?

I can't seem to work that out. I have over 20 binary variables and would like to build a frequency table in which each column (group) sums to 100%.

I think that would be a great addition to the function, which is already great, by the way.

Thank you very much for your help.

All best, Gabi

dewittpe commented 5 years ago

Yes, there is a way. Consider modifying your data to be in a 'long' format instead of a wide format, and use the row grouping feature of summary_table to help.

Please provide a reproducible example and an outline of the table you are hoping to build so I can answer your specific question. As written, I can think of at least two different ways to build the table, with slightly different summary statistics and interpretations.

Consider using the reprex package to build the reproducible example to post here.
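As a rough sketch of the long-format idea, and of only one possible interpretation, the base R snippet below reshapes a small wide data set and computes, within each group, each variable's share of all positive responses. The data frame `wide` and its values are made up for illustration, and the snippet deliberately uses base R rather than summary_table, so it shows the arithmetic, not the qwraps2 workflow.

```r
# Hypothetical wide data: one grouping variable and three binary variables
wide <- data.frame(
  Type1 = c("A", "A", "A", "B", "B", "B"),
  Var1  = c(1, 0, 1, 0, 1, 1),
  Var2  = c(0, 1, 1, 0, 0, 1),
  Var3  = c(1, 1, 0, 1, 0, 0)
)

# Wide to long: one row per (observation, variable) pair
long <- data.frame(
  Type1    = rep(wide$Type1, times = 3),
  variable = rep(c("Var1", "Var2", "Var3"), each = nrow(wide)),
  value    = c(wide$Var1, wide$Var2, wide$Var3)
)

# Keep only the positive responses and cross-tabulate variable by group
positives <- long[long$value == 1, ]
counts    <- table(positives$variable, positives$Type1)

# Column percentages: within each group the variables now sum to 100
col_pct <- 100 * prop.table(counts, margin = 2)
print(round(col_pct, 1))
```

Note that the denominator here is the number of positive responses in the group, not the number of observations, so the interpretation differs from what n_perc0 reports per variable.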

Thanks.

GabiPedra commented 5 years ago

Thank you for your help and for the introduction to reprex. Below is an example of the data I have. As you can see, the columns of my table do not sum to 100%. I'm sorry for the ugly table; I just used reprex for the first time.

library(plyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:plyr':
#> 
#>     arrange, count, desc, failwith, id, mutate, rename, summarise,
#>     summarize
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(qwraps2)
options(qwraps2_markup="markdown")
library(tidyverse)

## Creating dummy data
group_1<-rep(c("a","b"),5)
group_2<-rep(1:2,each=5)
response_1<-c(1,1,0,0,0,0,0,0,1,0)
response_2<-c(0,1,1,1,1,0,0,0,1,0)
response_3<-c(0,1,0,1,1,1,1,1,1,0)
response_4<-c(0,0,0,0,0,0,0,0,1,0)
data<-data.frame(group_1,group_2,response_1,response_2,response_3,response_4)

our_summary1<-list(
  "Responses 1:2"=list("Response 1"=~ qwraps2::n_perc0(response_1 == 1,na_rm = TRUE),
                       "Response 2"=~ qwraps2::n_perc0(response_2 == 1,na_rm = TRUE)),
  "Responses 3:4"=list("Response 3"=~ qwraps2::n_perc0(response_3 == 1,na_rm = TRUE),
                       "Response 4"=~ qwraps2::n_perc0(response_4 == 1,na_rm = TRUE))
)
all<-summary_table(dplyr::group_by(data,group_1), our_summary1)
sev<-summary_table(dplyr::group_by(data,group_2), our_summary1)
whole<-cbind(sev,all)
#whole
print(whole,rtitle="summary", booktabs = TRUE)
#> 
#> 
#> |summary                 |group_2: 1 (N = 5) |group_2: 2 (N = 5) |group_1: a (N = 5) |group_1: b (N = 5) |
#> |:-----------------------|:------------------|:------------------|:------------------|:------------------|
#> |**Responses 1:2**       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |
#> |&nbsp;&nbsp; Response 1 |2 (40)             |1 (20)             |2 (40)             |1 (20)             |
#> |&nbsp;&nbsp; Response 2 |4 (80)             |1 (20)             |3 (60)             |2 (40)             |
#> |**Responses 3:4**       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |
#> |&nbsp;&nbsp; Response 3 |3 (60)             |4 (80)             |3 (60)             |4 (80)             |
#> |&nbsp;&nbsp; Response 4 |0 (0)              |1 (20)             |1 (20)             |0 (0)              |

Created on 2019-04-17 by the reprex package (v0.2.1)

GabiPedra commented 5 years ago

One more question: how can I also change the column width using the print function in this example?

dewittpe commented 5 years ago

@GabiPedra, I am wondering whether the issue is with the data itself or with the way you are looking to summarize it. The table you provided suggests that the column percentages for Response 1 and Response 2 should sum to 100%. That would require exactly one of Response 1 or Response 2 to be true for every row: not both, and not neither. In the provided data, the other combinations do occur. See the small edit I made to the example below.

If I have misunderstood the example and data set, let me know. Can you specify whether you expect the column percentages to sum to 100% within each row group or over the whole column?
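To make the point concrete, here is a small base R check using the same response_1, response_2 and group_2 values as the example. It is only a sketch of the arithmetic: the `combo` variable and its labels are introduced here for illustration and are not part of qwraps2.

```r
response_1 <- c(1, 1, 0, 0, 0, 0, 0, 0, 1, 0)
response_2 <- c(0, 1, 1, 1, 1, 0, 0, 0, 1, 0)
group_2    <- rep(1:2, each = 5)

# Separate percentages for each response overlap, so they need not sum to 100
naive <- 100 * rbind(
  response_1 = tapply(response_1, group_2, mean),
  response_2 = tapply(response_2, group_2, mean)
)
print(colSums(naive))  # 120 for group 1, 40 for group 2

# Mutually exclusive combinations partition the rows of each group
combo <- ifelse(response_1 == 1 & response_2 == 1, "both",
         ifelse(response_1 == 1, "only 1",
         ifelse(response_2 == 1, "only 2", "neither")))

# Column percentages over the partition sum to 100 in every group
pct <- 100 * prop.table(table(combo, group_2), margin = 2)
print(colSums(pct))
```

Because every row falls into exactly one of the four combinations, the column totals are 100% by construction, whereas the per-variable percentages count rows with both responses twice and rows with neither not at all.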


library(magrittr)
library(dplyr)  
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(qwraps2)
options(qwraps2_markup="markdown")

## Creating dummy data
group_1    <- rep(c("a", "b"), 5)
group_2    <- rep(1:2, each=5)
response_1 <- c(1, 1, 0, 0, 0, 0, 0, 0, 1, 0)
response_2 <- c(0, 1, 1, 1, 1, 0, 0, 0, 1, 0)
response_3 <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 0)
response_4 <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0)

data <- data.frame(group_1, group_2, response_1, response_2, response_3, response_4)

our_summary1 <- list(
  "Responses 1:2" = list(
                         "Response 1 and not Response 2"  = ~ qwraps2::n_perc0(response_1 == 1 & response_2 == 0, na_rm = TRUE), 
                         "Response 2 and not Response 1"  = ~ qwraps2::n_perc0(response_1 == 0 & response_2 == 1, na_rm = TRUE),
                         "Response 1 and Response 2"      = ~ qwraps2::n_perc0(response_1 == 1 & response_2 == 1, na_rm = TRUE),
                         "Neither Response 1 nor Response 2"    = ~ qwraps2::n_perc0(response_1 == 0 & response_2 == 0, na_rm = TRUE)
                         ),
  "Responses 3:4" = list("Response 3"=~ qwraps2::n_perc0(response_3 == 1,na_rm = TRUE),
                         "Response 4"=~ qwraps2::n_perc0(response_4 == 1,na_rm = TRUE))
)

all <- summary_table(dplyr::group_by(data, group_1), our_summary1)
sev <- summary_table(dplyr::group_by(data, group_2), our_summary1)
whole <- cbind(sev, all)
print(whole, rtitle="summary", booktabs = TRUE)
#> 
#> 
#> |summary                                        |group_2: 1 (N = 5) |group_2: 2 (N = 5) |group_1: a (N = 5) |group_1: b (N = 5) |
#> |:----------------------------------------------|:------------------|:------------------|:------------------|:------------------|
#> |**Responses 1:2**                              |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |
#> |&nbsp;&nbsp; Response 1 and not Response 2     |1 (20)             |0 (0)              |1 (20)             |0 (0)              |
#> |&nbsp;&nbsp; Response 2 and not Response 1     |3 (60)             |0 (0)              |2 (40)             |1 (20)             |
#> |&nbsp;&nbsp; Response 1 and Response 2         |1 (20)             |1 (20)             |1 (20)             |1 (20)             |
#> |&nbsp;&nbsp; Neither Response 1 nor Response 2 |0 (0)              |4 (80)             |1 (20)             |3 (60)             |
#> |**Responses 3:4**                              |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |&nbsp;&nbsp;       |
#> |&nbsp;&nbsp; Response 3                        |3 (60)             |4 (80)             |3 (60)             |4 (80)             |
#> |&nbsp;&nbsp; Response 4                        |0 (0)              |1 (20)             |1 (20)             |0 (0)              |

Created on 2019-04-28 by the reprex package (v0.2.1)

dewittpe / qwraps2

Frequency table multiple binary variables #75