MSKCC-Epi-Bio / gnomeR

Package to wrangle and visualize genomic data in R
https://mskcc-epi-bio.github.io/gnomeR/

Create internal function to check/recode alteration column in CNA long data #178

Closed karissawhiting closed 2 years ago

karissawhiting commented 2 years ago

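The code below, which recodes CNA alteration data, is repeated in a couple of places in gnomeR and also appears in the {oncokbR} package. It could be made into an internal function. Steps include:

- Coerce hugo_symbol and alteration to character, then trim and lowercase alteration.
- Compare the alteration values found in the data against the allowed numeric codes and their descriptive labels.
- Abort with an informative error if any unknown values are found.
- Recode the numeric codes to the descriptive labels.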

We should consider coding to the more descriptive values (neutral, deletion, loh, gain, amplification) first, then collapsing to just alteration/deletion for the binary matrix. We may need to check with cBioPortal and Esther on the final coding, and also on what -1.5 means, since it occasionally occurs in the data (@karissawhiting will find an example of this).


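  # Requires dplyr (mutate, %>%, .data) and stringr (str_trim)
  # Make sure hugo_symbol and alteration are character,
  # and normalize alteration to trimmed lowercase
  cna <- cna %>%
    mutate(hugo_symbol = as.character(.data$hugo_symbol)) %>%
    mutate(alteration = tolower(str_trim(as.character(.data$alteration))))

  # Distinct non-missing alteration values present in the data
  levels_in_data <- names(table(cna$alteration))

  # Descriptive label = numeric code (as a string);
  # "-1.5" is provisionally coded as "loh" until its meaning is confirmed
  allowed_chr_levels <- c(
    "neutral" = "0",
    "deletion" = "-2",
    "loh" = "-1.5",
    "loh" = "-1",
    "gain" = "1",
    "amplification" = "2"
  )

  # Accept either the numeric codes or the descriptive labels
  all_allowed <- unique(c(allowed_chr_levels, names(allowed_chr_levels)))
  not_allowed <- levels_in_data[!levels_in_data %in% all_allowed]

  if (length(not_allowed) > 0) {
    cli::cli_abort(c(
      "Unknown values in {.field alteration} field: {.val {not_allowed}}",
      "i" = "Must be one of the following: {.val {all_allowed}}"
    ))
  }

  # fct_recode() warns about codes that are absent from the data,
  # so those warnings are suppressed
  cna <- suppressWarnings(
    cna %>%
      mutate(alteration = forcats::fct_recode(.data$alteration, !!!allowed_chr_levels))
  )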
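
A minimal sketch of how the snippet above might be wrapped into a single internal helper, as this issue proposes. The function name `.recode_cna_alteration()` and its one-argument signature are hypothetical, chosen only for illustration; they are not necessarily what was added to gnomeR. The mapping of "-1.5" to "loh" is copied from the snippet and is still an open question.

```r
library(dplyr)

# Hypothetical internal helper (name and signature are assumptions).
# Takes long-format CNA data with `hugo_symbol` and `alteration` columns
# and returns it with `alteration` recoded to descriptive labels.
.recode_cna_alteration <- function(cna) {
  # Descriptive label = numeric code (as a string).
  # "-1.5" -> "loh" mirrors the snippet above and is unconfirmed.
  allowed_chr_levels <- c(
    "neutral" = "0",
    "deletion" = "-2",
    "loh" = "-1.5",
    "loh" = "-1",
    "gain" = "1",
    "amplification" = "2"
  )

  # Coerce to character and normalize alteration to trimmed lowercase
  cna <- cna %>%
    mutate(
      hugo_symbol = as.character(.data$hugo_symbol),
      alteration = tolower(stringr::str_trim(as.character(.data$alteration)))
    )

  # Values may be either numeric codes or descriptive labels
  levels_in_data <- unique(stats::na.omit(cna$alteration))
  all_allowed <- unique(c(allowed_chr_levels, names(allowed_chr_levels)))
  not_allowed <- setdiff(levels_in_data, all_allowed)

  if (length(not_allowed) > 0) {
    cli::cli_abort(c(
      "Unknown values in {.field alteration} field: {.val {not_allowed}}",
      "i" = "Must be one of the following: {.val {all_allowed}}"
    ))
  }

  # fct_recode() warns about codes that are absent from the data,
  # so those warnings are suppressed
  suppressWarnings(
    cna %>%
      mutate(alteration = forcats::fct_recode(.data$alteration, !!!allowed_chr_levels))
  )
}
```

Each place that currently repeats the recoding could then call something like `cna <- .recode_cna_alteration(cna)`, so any change to the allowed codes (for example, once the meaning of -1.5 is settled) would only need to be made once.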