MSKCC-Epi-Bio / gnomeR

Package to wrangle and visualize genomic data in R
https://mskcc-epi-bio.github.io/gnomeR/

Review `gen_summary()` and `gen_uni_cox()` and consider switching to use {gtsummary} functions #151

Closed karissawhiting closed 2 years ago

karissawhiting commented 2 years ago

gen_summary() and gen_uni_cox() functionality overlaps with {gtsummary} tbl_summary() and tbl_uvregression() functions. Since {gtsummary} is very fully-featured, we may want to consider making these functions wrap existing {gtsummary} functions, or consider removing these functions altogether.

One thing to consider is speed since genomic data sets are often very large. We should test {gtsummary} functions with large genomic data sets first to see if it's worth keeping these less fully featured, but faster versions.
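One rough way to run that speed check is sketched below. It times `tbl_summary()` on a simulated binary alteration matrix. The data, the dimensions (`n_samples`, `n_genes`), and the `group` column are all assumptions made for illustration, not real gnomeR data, and the sizes would need to be scaled up to match an actual genomic data set.

```r
library(gtsummary)

set.seed(1)
n_samples <- 500  # assumed size; increase to mimic a real cohort
n_genes <- 50     # assumed size; increase to mimic a real gene panel

# Simulated 0/1 alteration matrix, one column per (hypothetical) gene
mat <- matrix(
  rbinom(n_samples * n_genes, size = 1, prob = 0.1),
  nrow = n_samples,
  dimnames = list(NULL, paste0("gene_", seq_len(n_genes)))
)
sim <- as.data.frame(mat)

# Hypothetical two-level grouping variable
sim$group <- sample(c("A", "B"), n_samples, replace = TRUE)

# Time a single call to tbl_summary() on the simulated data
timing <- system.time(
  tbl <- tbl_summary(sim, by = group)
)

timing[["elapsed"]]  # elapsed seconds; varies by machine
```

Running the same timing on `gen_summary()` with identical input would give the comparison needed to decide whether the faster, less featured version is worth keeping.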

karissawhiting commented 2 years ago

@michaelcurry1123 would like to get your input on this

karissawhiting commented 2 years ago

@michaelcurry1123 I think there should be options to sort by p-value or by alteration frequency, and by default it should use {gtsummary}'s compact theme.

Here is some code that I've used before, although it could probably be cleaned up:

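library(dplyr)
library(tidyr)
library(gtsummary)

# Assumes `df` has one row per sample: sample_id, subset, and one 0/1 column per gene.
# `cutoff` (minimum alteration frequency, as a proportion) must be defined beforehand.
cutoff_genes <- df %>%
  ungroup() %>%
  select(-subset) %>%
  pivot_longer(-sample_id) %>%
  #   mutate(name = str_remove_all(name, ".Amp|.fus|.Del")) %>%
  distinct() %>%
  group_by(name) %>%
  summarise(
    n_altered = sum(value, na.rm = TRUE),  # samples with the alteration
    n_tested = sum(!is.na(value)),         # samples with a non-missing value
    num_na = sum(is.na(value))
  ) %>%
  mutate(perc = n_altered / n_tested) %>%
  filter(perc > cutoff) %>%
  pull(name)

# keep only genes altered in more than `cutoff` of samples
df <- df %>%
  select(sample_id, subset,
         any_of(cutoff_genes))

df %>%
  select(-sample_id) %>%          # the ID column should not be summarised
  tbl_summary(by = <XX> ) %>%     # <XX>: grouping column, left as a placeholder
  add_p() %>%
  add_overall() %>%
  bold_labels() %>%
  sort_p()

karissawhiting commented 2 years ago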

This is now split off into tbl_genomic() but old versions may need to be cleared out
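
For reference, the sorting and theme options requested earlier in the thread could be sketched with plain {gtsummary} calls as shown below. This is not the `tbl_genomic()` implementation. The `toy` data frame, its gene columns, and the `group` variable are hypothetical. Sorting by alteration frequency is done here by reordering the columns before calling `tbl_summary()`, which is one possible approach among several.

```r
library(dplyr)
library(gtsummary)

# Apply the compact theme to all tables created afterwards
theme_gtsummary_compact()

set.seed(42)
# Hypothetical toy data: 0/1 alteration indicators plus a grouping column
toy <- tibble(
  group  = rep(c("A", "B"), each = 20),
  gene_a = rbinom(40, 1, 0.2),
  gene_b = rbinom(40, 1, 0.6),
  gene_c = rbinom(40, 1, 0.4)
)

# Gene names ordered by decreasing alteration frequency
freq_order <- toy %>%
  select(-group) %>%
  colMeans(na.rm = TRUE) %>%
  sort(decreasing = TRUE) %>%
  names()

# Option 1: rows follow column order, so reorder columns by frequency
tbl_by_freq <- toy %>%
  select(group, all_of(freq_order)) %>%
  tbl_summary(by = group)

# Option 2: sort rows by p-value
tbl_by_p <- toy %>%
  tbl_summary(by = group) %>%
  add_p() %>%
  sort_p()

# Restore the default theme so later tables are unaffected
reset_gtsummary_theme()
```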