atorus-research / Tplyr

https://atorus-research.github.io/Tplyr/

Have separate columns for count and descriptive data #49

Open kcaashish opened 2 years ago

kcaashish commented 2 years ago

Description

Is there a way to have separate columns for the categories of count data and the statistics of desc data when the two are used together? Right now, both the categories and the statistics get populated in the same column.

Looking for something like the table attached here:

[Image: Example of Table]

elimillera commented 2 years ago

@kcaashish Thanks for the request. In the meantime, you can build each table separately, run mutate(Statistic = "N (%)") on the count tables, and then rbind them together.

@mstackhouse I'm thinking this could be an option on count tables, something like:

group_count(SEX) %>%
  include_str(TRUE, col = "Statistics")

but I'm not sure how consistent that is with the rest of the package.

mstackhouse commented 2 years ago

@elimillera I think the missing capability here is control over where row labels are inserted. For example, I can insert the blank column just by passing a blank text by variable:

tplyr_table(adsl, TRT01P) %>% 
  add_layer(
    group_desc(AGE, by = vars("Age (years)", ""))
  ) %>% 
  add_layer(
    group_count(SEX, by = vars("Sex", "N (%)"))
  ) %>% 
  build() %>% 
  select(-starts_with('ord'))
# A tibble: 8 × 6
  row_label1  row_label2 row_label3 var1_Placebo   `var1_Xanomeline High Dose` `var1_Xanomeline Low Dose`
  <chr>       <chr>      <chr>      <chr>          <chr>                       <chr>                     
1 Age (years) ""         n          " 86"          " 84"                       " 84"                     
2 Age (years) ""         Mean (SD)  "75.2 ( 8.59)" "74.4 ( 7.89)"              "75.7 ( 8.29)"            
3 Age (years) ""         Median     "76.0"         "76.0"                      "77.5"                    
4 Age (years) ""         Q1, Q3     "69.2, 81.8"   "70.8, 80.0"                "71.0, 82.0"              
5 Age (years) ""         Min, Max   "52, 89"       "56, 88"                    "51, 88"                  
6 Age (years) ""         Missing    "  0"          "  0"                       "  0"                     
7 Sex         "N (%)"    F          "53 ( 61.6%)"  "40 ( 47.6%)"               "50 ( 59.5%)"             
8 Sex         "N (%)"    M          "33 ( 38.4%)"  "44 ( 52.4%)"               "34 ( 40.5%)"    

But Tplyr makes an assumption that the last row label will be the statistic or the categorical result. So maybe we want to be able to specify where by variables are inserted?

The table above itself is doable by building two separate Tplyr tables, renaming columns, and stacking. But that's not really ideal.
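
For reference, the two-table workaround could look roughly like the sketch below. It is only an illustration under a few assumptions: the small adsl data frame is made-up toy data so the snippet runs on its own, and the column names Variable, Category, and Statistic are hypothetical names chosen here, not anything Tplyr produces. Only tplyr_table(), add_layer(), group_desc(), group_count(), and build() come from Tplyr; the rest is plain dplyr.

```r
library(Tplyr)
library(dplyr)

# Toy stand-in for an ADSL dataset (made-up values, for illustration only)
adsl <- data.frame(
  TRT01P = c("Placebo", "Placebo", "Placebo", "Active", "Active", "Active"),
  AGE    = c(64, 71, 58, 69, 75, 62),
  SEX    = c("F", "M", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Descriptive statistics table: row_label1 holds the statistic names
age_tbl <- tplyr_table(adsl, TRT01P) %>%
  add_layer(group_desc(AGE)) %>%
  build() %>%
  select(-starts_with("ord")) %>%
  rename(Statistic = row_label1) %>%
  mutate(Variable = "Age (years)", Category = "")

# Count table: row_label1 holds the category values
sex_tbl <- tplyr_table(adsl, TRT01P) %>%
  add_layer(group_count(SEX)) %>%
  build() %>%
  select(-starts_with("ord")) %>%
  rename(Category = row_label1) %>%
  mutate(Variable = "Sex", Statistic = "N (%)")

# Stack the two tables; bind_rows() matches columns by name
stacked <- bind_rows(age_tbl, sex_tbl) %>%
  select(Variable, Category, Statistic, starts_with("var1_"))

stacked
```

Dropping the ord columns keeps the sketch short, but it also discards Tplyr's sorting helpers, so the row order here depends entirely on the order in which the tables are stacked.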