karissawhiting closed 2 years ago
@michaelcurry1123 I would like to get your input on this.
@michaelcurry1123 I think there should be options to sort by p-value or by alteration frequency, and by default it should use gtsummary's compact theme. Here is some code that I've used before, although it could probably be cleaned up:
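library(dplyr)
library(tidyr)
library(gtsummary)

# Assumes `df` holds sample_id, subset, and one column per gene, and that
# `cutoff` is already defined as a proportion between 0 and 1.
cutoff_genes <- df %>%
  select(-subset) %>%
  ungroup() %>%
  pivot_longer(-sample_id) %>%
  # mutate(name = str_remove_all(name, ".Amp|.fus|.Del")) %>%
  distinct() %>%
  group_by(name) %>%
  summarise(
    sum = sum(value, na.rm = TRUE),        # number of altered samples
    count = nrow(df) - sum(is.na(value)),  # samples with a non-missing call
    num_na = sum(is.na(value))
  ) %>%
  mutate(perc = sum / count) %>%
  filter(perc > cutoff) %>%
  pull(name)

# keep only genes altered in more than `cutoff` of samples
df <- df %>%
  select(sample_id, subset, any_of(cutoff_genes))

# replace <XX> with the grouping column before running
df %>%
  tbl_summary(by = <XX>) %>%
  add_p() %>%
  add_overall() %>%
  bold_labels() %>%
  sort_p()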
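For the alteration-frequency sort suggested in this comment, here is a minimal base R sketch. It is only an illustration: the `toy` data frame and its gene columns are made up, and it relies on the assumption that the summary table lists variables in the column order of the input data, so reordering the columns is enough to change the row order.

```r
# Hypothetical toy data: one row per sample, binary alteration calls per gene.
toy <- data.frame(
  sample_id = paste0("s", 1:6),
  TP53 = c(1, 1, 1, 0, 1, NA),
  KRAS = c(0, 1, 0, 0, 0, 0),
  EGFR = c(1, 0, 1, 0, NA, 0)
)

# Alteration frequency per gene, ignoring missing calls.
gene_cols <- setdiff(names(toy), "sample_id")
freq <- colMeans(toy[gene_cols], na.rm = TRUE)

# Gene names ordered from most to least frequently altered.
genes_by_freq <- names(sort(freq, decreasing = TRUE))

# Reorder the gene columns so a downstream table follows this order.
toy_sorted <- toy[c("sample_id", genes_by_freq)]
```

Sorting by p-value is already covered by `sort_p()` in the pipeline above, so a frequency sort like this would mainly be an alternative default.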
This is now split off into tbl_genomic(), but old versions may still need to be cleared out.
gen_summary() and gen_uni_cox() functionality overlaps with the {gtsummary} tbl_summary() and tbl_uvregression() functions. Since {gtsummary} is very fully featured, we may want to consider making these functions wrap existing {gtsummary} functions, or consider removing these functions altogether.

One thing to consider is speed, since genomic data sets are often very large. We should test {gtsummary} functions with large genomic data sets first to see if it's worth keeping these less fully featured, but faster, versions.
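One rough way to run the speed check described above is to time a summary on simulated data. The sketch below is an assumption-laden starting point, not a proper benchmark: the data are simulated, the sizes (`n_samples`, `n_genes`) are hypothetical and should be scaled up to realistic values, and the base R `aggregate()` call is only a stand-in baseline, not the actual gen_summary() implementation. The {gtsummary} timing runs only if the package is installed.

```r
# Simulate a binary alteration data set (hypothetical sizes; scale up as needed).
set.seed(123)
n_samples <- 500
n_genes <- 50
sim <- as.data.frame(
  matrix(rbinom(n_samples * n_genes, 1, 0.1), nrow = n_samples)
)
names(sim) <- paste0("gene", seq_len(n_genes))
sim$group <- sample(c("A", "B"), n_samples, replace = TRUE)

# Stand-in baseline: alteration frequency per gene within each group.
base_time <- system.time(
  base_res <- aggregate(. ~ group, data = sim, FUN = mean)
)

# Time the {gtsummary} table on the same data, if the package is available.
if (requireNamespace("gtsummary", quietly = TRUE)) {
  gts_time <- system.time(
    gts_res <- gtsummary::tbl_summary(sim, by = group)
  )
  print(c(base = base_time[["elapsed"]], gtsummary = gts_time[["elapsed"]]))
}
```

Repeating this across a few values of `n_genes` would show how the timings scale, which is probably more informative than a single run.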