atorus-research / Tplyr

https://atorus-research.github.io/Tplyr/

denominator of subgroup #136

Closed haodafa2019 closed 11 months ago

haodafa2019 commented 11 months ago

Prerequisites

Feature request: add an option to select the denominator for subgroup summaries.

Description

In Tplyr 1.1.0, for an AE SOC/PT table by sex, the denominator is the number of subjects in the treatment group rather than the number of subjects of each sex within the treatment group. In the previous version of Tplyr, however, the denominator was the number of subjects of each sex within the treatment group. I wonder if an option could be added so that we can select the denominator.

Steps to Reproduce (Bug Report Only)

library(tidyCDISC)
library(Tplyr)
library(dplyr)

adsl <- tidyCDISC::adsl %>%
  filter(SAFFL == "Y") %>%
  select(USUBJID, SAFFL, TRT01AN, TRT01A, SEX)

adae <- tidyCDISC::adae %>%
  filter(SAFFL == "Y" & TRTEMFL == "Y") %>%
  select(USUBJID, AEDECOD, AEBODSYS, TRTAN, SEX)

table <- tplyr_table(adae, TRTAN, cols = SEX) %>%
  set_pop_data(adsl) %>%
  set_pop_treat_var(TRT01AN) %>%
  add_layer(
    group_count("Subjects with at least one event") %>%
      set_distinct_by(USUBJID) %>%
      set_format_strings(f_str("xxx (xx.x)", distinct_n, distinct_pct))
  ) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>%
      set_nest_count(TRUE) %>%
      set_indentation(" ") %>%
      set_distinct_by(USUBJID) %>%
      set_outer_sort_position("asc") %>%
      set_format_strings(f_str("xxx (xx.x)", distinct_n, distinct_pct))
  ) %>%
  build()

Expected behavior: I expect the denominator to be 53 for the Female subgroup under the Placebo group.

Actual behavior: Running the code under Tplyr 1.1.0 gives a denominator of 86 in the Placebo group for both the Female and Male subgroups.
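To make the two denominator choices concrete, here is a minimal base R sketch. It does not use Tplyr or the tidyCDISC data: the data frame `pop`, the subject IDs, and all counts are hypothetical toy values chosen only to show how the same numerator produces different percentages depending on whether the denominator is the treatment group or the treatment-by-sex cell.

```r
# Toy population data (hypothetical values, NOT the tidyCDISC data).
pop <- data.frame(
  USUBJID = sprintf("S%02d", 1:10),
  TRT     = factor(c(rep("Placebo", 5), rep("Active", 5))),
  SEX     = factor(c("F", "F", "F", "M", "M", "F", "M", "M", "M", "M"))
)

# Hypothetical subjects with at least one event.
ae_subj <- c("S01", "S02", "S04", "S06", "S07")
has_ae  <- pop[pop$USUBJID %in% ae_subj, ]

# Numerator: distinct subjects with an event per treatment-by-sex cell.
n_cell <- table(has_ae$TRT, has_ae$SEX)

# Choice 1: one denominator per treatment group (both sexes pooled).
denom_trt  <- table(pop$TRT)
pct_by_trt <- 100 * sweep(n_cell, 1, as.vector(denom_trt), "/")

# Choice 2: one denominator per treatment-by-sex cell.
denom_cell  <- table(pop$TRT, pop$SEX)
pct_by_cell <- 100 * n_cell / denom_cell

round(pct_by_trt, 1)   # Placebo/F: 2 of 5 subjects = 40.0
round(pct_by_cell, 1)  # Placebo/F: 2 of 3 females  = 66.7
```

In the toy data, the Placebo/F cell has 2 subjects with an event. Dividing by the 5 Placebo subjects gives 40.0%, while dividing by the 3 Placebo females gives 66.7%, which is the kind of difference described in this report.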

Example of Table (Feature Request Only)

Same as the code above.

Versions

Tplyr 1.1.0

mstackhouse commented 11 months ago

Hi @haodafa2019,

Do you know what previous version you were using prior to v1.1.0? You can achieve what you're looking for by using the function set_denoms_by(). You can also read more about denominator customization in this vignette.

I'm going to close this issue, but please comment if these references don't resolve your issues!

haodafa2019 commented 11 months ago

Hi @mstackhouse ,

The previous version was 0.4.4, and its results are what I want.

I also tried using set_denoms_by(SEX) in Tplyr 1.1.0, and the denominator is even stranger: it is 86 (the Placebo group) for all 6 subgroups. Below is the code I used:

table <- tplyr_table(adae, TRTAN, cols = SEX) %>%
  set_pop_data(adsl) %>%
  set_pop_treat_var(TRT01AN) %>%
  add_layer(
    group_count("Subjects with at least one event") %>%
      set_distinct_by(USUBJID) %>%
      set_denoms_by(SEX) %>%
      set_format_strings(f_str("xxx (xx.x)", distinct_n, distinct_pct))
  ) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>%
      set_nest_count(TRUE) %>%
      set_indentation(" ") %>%
      set_denoms_by(SEX) %>%
      set_distinct_by(USUBJID) %>%
      set_outer_sort_position("asc") %>%
      set_format_strings(f_str("xxx (xx.x)", distinct_n, distinct_pct))
  ) %>%
  build()

mstackhouse commented 11 months ago

@haodafa2019 So digging in further I see where the bug is. I'll tag the issue when we have a fix ready.

mstackhouse commented 11 months ago

@haodafa2019 I've pushed an update here - if you'd like to test it out you can install it using

devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="gh_issues_136_138")

Note that this isn't a fully committed change yet, and the update shouldn't be used in any production code until it has made its way into the main branch and been accepted back to CRAN.

haodafa2019 commented 11 months ago

@mstackhouse, thanks. I installed the package and reran the code above, but the result is still not correct. It seems that the denominator for all F subgroups is the number of Females in the Placebo group (53), and the denominator for all M subgroups is the number of Males in the Placebo group (33). I pasted the first-row result below. The results I marked in bold are not correct.

row_label1                        var1_0_F   var1_54_F  var1_81_F  var1_0_M   var1_54_M   var1_81_M
Subjects with at least one event  40 (75.5)  44 (83.0)  36 (67.9)  25 (75.8)  33 (100.0)  40 (121.2)

And here is the result I expect from using set_denoms_by(SEX) %>%:

row_label1                        var1_0_F   var1_54_F  var1_81_F  var1_0_M   var1_54_M   var1_81_M
Subjects with at least one event  40 (75.5)  44 (88.0)  36 (90.0)  25 (75.8)  33 (97.1)   40 (90.1)
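The percentages in the two rows above can be checked with a few lines of base R. This is only an arithmetic sketch, not Tplyr code: the vector names are illustrative, the counts come from the row above, and the per-cell denominators (53, 50, 40 for F and 33, 34, 44 for M) are the `[xx]` totals reported later in this thread.

```r
# Distinct subject counts from the first row, in the order
# F: 0, 54, 81 and M: 0, 54, 81 (names are illustrative only).
n <- c(F_0 = 40, F_54 = 44, F_81 = 36, M_0 = 25, M_54 = 33, M_81 = 40)

# Denominators that reproduce the incorrect output: the Placebo
# Female (53) and Male (33) counts reused for every treatment group.
denom_placebo_only <- c(53, 53, 53, 33, 33, 33)

# Denominators per treatment-by-sex cell, taken from the [xx] totals
# reported later in this thread.
denom_cell <- c(53, 50, 40, 33, 34, 44)

pct_observed <- round(100 * n / denom_placebo_only, 1)
pct_expected <- round(100 * n / denom_cell, 1)

pct_observed  # 75.5 83.0 67.9 75.8 100.0 121.2
pct_expected  # 75.5 88.0 90.0 75.8  97.1  90.9
```

The first vector matches the incorrect output exactly, including the impossible 121.2% (40 of 33). The second matches the expected row, except that 40 of 44 works out to 90.9, which suggests the 90.1 typed above may be a typo.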

mstackhouse commented 11 months ago

Ok @haodafa2019 - I just made a new push. Here's what I'm getting now:

adsl <- tidyCDISC::adsl %>% filter(SAFFL=="Y") %>%
  select(USUBJID,SAFFL,TRT01AN,TRT01A,SEX)

adae <- tidyCDISC::adae %>% filter(SAFFL=="Y" & TRTEMFL=="Y") %>%
   select(USUBJID,AEDECOD,AEBODSYS,TRTAN,SEX)

 tplyr_table(adae,TRTAN, cols=SEX) %>%
   set_pop_data(adsl) %>%
   set_pop_treat_var(TRT01AN) %>%
   add_layer(
     group_count("Subjects with at least one event") %>%
        set_distinct_by(USUBJID) %>%
        set_format_strings(f_str("xxx (xx.x) [xx]", distinct_n, distinct_pct, distinct_total)) 
   ) %>% 
  build() %>% 
  select(-starts_with('ord'))
# A tibble: 1 × 7
  row_label1                       var1_0_F          var1_0_M          var1_54_F         var1_54_M         var1_81_F         var1_81_M        
  <chr>                            <chr>             <chr>             <chr>             <chr>             <chr>             <chr>            
1 Subjects with at least one event " 40 (75.5) [53]" " 25 (75.8) [33]" " 44 (88.0) [50]" " 33 (97.1) [34]" " 36 (90.0) [40]" " 40 (90.9) [44]"

Let me know if this is now working as expected. Note - specifying set_denoms_by(SEX) and not specifying at all should produce the same result, because like 0.4.4 the cols variables in tplyr_table() should pass through to the denoms_by value.

haodafa2019 commented 11 months ago

@mstackhouse - thanks, I downloaded the new push and can now produce both results I want, with and without the set_denoms_by(TRTAN) option.

I tried the code below:

table <- tplyr_table(adae, TRTAN, cols = SEX) %>%
  set_pop_data(adsl) %>%
  set_pop_treat_var(TRT01AN) %>%
  add_layer(
    group_count("Subjects with at least one event") %>%
      set_distinct_by(USUBJID) %>%
      set_denoms_by(TRTAN) %>%
      set_format_strings(f_str("xxx (xx.x) [xx]", distinct_n, distinct_pct, distinct_total))
  ) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>%
      set_nest_count(TRUE) %>%
      set_indentation(" ") %>%
      set_distinct_by(USUBJID) %>%
      set_denoms_by(TRTAN) %>%
      set_outer_sort_position("asc") %>%
      set_format_strings(f_str("xxx (xx.x) [xx]", distinct_n, distinct_pct, distinct_total))
  ) %>%
  build() %>%
  select(-starts_with('ord'))

And below is the result:

# A tibble: 6 × 7
  row_label1                         var1_0_F          var1_0_M          var1_54_F         var1_54_M         var1_81_F         var1_81_M
1 "Subjects with at least one event" " 40 (46.5) [86]" " 25 (29.1) [86]" " 44 (52.4) [84]" " 33 (39.3) [84]" " 36 (42.9) [84]" " 40 (47.6) [84]"

I also tried removing set_denoms_by(TRTAN) %>%, and the result is as below:

# A tibble: 6 × 7
  row_label1                         var1_0_F          var1_0_M          var1_54_F         var1_54_M         var1_81_F         var1_81_M
1 "Subjects with at least one event" " 40 (75.5) [53]" " 25 (75.8) [33]" " 44 (88.0) [50]" " 33 (97.1) [34]" " 36 (90.0) [40]" " 40 (90.9) [44]"

I think you can close the issue, thanks!
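The two sets of results are consistent with each other: each treatment-level denominator is the sum of that treatment's Female and Male cell denominators. A small base R check, using only numbers reported in this thread (the variable names are illustrative, and this is arithmetic only, not a Tplyr call):

```r
# Cells in output column order: treatment code, then sex.
cells <- c("0_F", "0_M", "54_F", "54_M", "81_F", "81_M")

# Distinct subject counts and per-cell denominators from the output
# produced without set_denoms_by(TRTAN).
n          <- setNames(c(40, 25, 44, 33, 36, 40), cells)
denom_cell <- setNames(c(53, 33, 50, 34, 40, 44), cells)

# Treatment code for each cell: "0", "0", "54", "54", "81", "81".
trt <- sub("_.*", "", cells)

# Summing the cell denominators within each treatment gives the
# treatment-level denominators seen with set_denoms_by(TRTAN).
denom_trt <- ave(denom_cell, trt, FUN = sum)
denom_trt                        # 86 86 84 84 84 84

pct_trt <- round(100 * n / denom_trt, 1)
pct_trt                          # 46.5 29.1 52.4 39.3 42.9 47.6
```

These values match the `[86]` and `[84]` totals and the percentages in the output that used set_denoms_by(TRTAN).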

mstackhouse commented 11 months ago

@haodafa2019 awesome! Thanks for reporting the issue and working with me to get this resolved!