The code below, which recodes CNA alteration data, is repeated in a couple of places in {gnomeR} and also appears in the {oncokbR} package. It could be made into an internal function. Steps include:
- [x] Clean up/review the code and make it an internal function (maybe call it `.check_alteration_column()` or `.recode_alteration_column()`).
- [x] Double-check the numeric-to-character recoding (@edrill mentioned that -1 and -1.5 should not be LOH). This source says:

  We should consider coding to those more descriptive values first, then collapsing to just alteration/deletion for the binary matrix. We may need to check with cBioPortal and Esther for the final coding, and also what -1.5 means, since it occasionally occurs in the data (@karissawhiting will find an example of this).
- [x] Add tests and documentation for this new internal function in {gnomeR}.
- [x] Update the code in {oncokbR}.
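
The second task proposes recoding to the descriptive labels first and only collapsing afterwards for the binary matrix. A minimal sketch of that collapsing step is below. It assumes "alteration/deletion" means amplification vs. deletion, that those are the only events kept, and that they are labelled `"Amp"` and `"Del"` (modelled on gnomeR's `.Amp`/`.Del` column suffixes; treat this as an assumption). `.collapse_cna_to_binary()` is a hypothetical name, and the final coding is still the open question raised in that task.

```r
# Hypothetical helper: collapse descriptive CNA labels to binary-matrix events.
# Assumption: only "amplification" and "deletion" count as events; "gain",
# "loh", "neutral" (and NA) become NA. Final coding still to be confirmed.
.collapse_cna_to_binary <- function(alteration) {
  # Named lookup: descriptive label -> binary-matrix event label
  event_lookup <- c(amplification = "Amp", deletion = "Del")
  # as.character() so factors are matched by label, not by integer code;
  # labels missing from the lookup index to NA
  unname(event_lookup[as.character(alteration)])
}

.collapse_cna_to_binary(c("amplification", "gain", "deletion", "loh", "neutral"))
# returns c("Amp", NA, "Del", NA, NA)
```

Keeping the descriptive factor as the intermediate step means the LOH/gain question only affects this lookup, not the validation code.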
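```r
library(dplyr)

# Make sure hugo_symbol and alteration are character
# (alteration is also trimmed and lower-cased so "Deletion " matches "deletion")
cna <- cna %>%
  mutate(hugo_symbol = as.character(.data$hugo_symbol)) %>%
  mutate(alteration = tolower(stringr::str_trim(as.character(.data$alteration))))

# Values observed in the data (table() drops NAs)
levels_in_data <- names(table(cna$alteration))

# Accepted numeric codes, named by the descriptive level they map to
allowed_chr_levels <- c(
  "neutral" = "0",
  "deletion" = "-2",
  "loh" = "-1.5",
  "loh" = "-1",
  "gain" = "1",
  "amplification" = "2"
)

# Either the numeric code or its descriptive name is accepted
all_allowed <- c(allowed_chr_levels, names(allowed_chr_levels))
not_allowed <- levels_in_data[!levels_in_data %in% all_allowed]
if (length(not_allowed) > 0) {
  cli::cli_abort(c("Unknown values in {.field alteration} field: {.val {not_allowed}}",
                   "Must be one of the following: {.val {all_allowed}}"))
}

# fct_recode() warns about codes absent from the data (e.g. "-1.5"); these are expected
suppressWarnings(
  cna <- cna %>%
    mutate(alteration = forcats::fct_recode(.data$alteration, !!!allowed_chr_levels))
)
```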
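
A sketch of the internal helper from the first task, wrapping the snippet above so it returns the recoded data frame instead of reassigning `cna` in place. The name `.recode_alteration_column()` is one of the two options floated in the issue, not an existing gnomeR function. The numeric-to-label mapping is copied unchanged, so -1 and -1.5 still map to `"loh"` pending the check in the second task; NAs are skipped during validation, mirroring `table()` in the original.

```r
library(dplyr)

# Hypothetical internal helper (name taken from the options in the issue; not
# an existing gnomeR API). Validates and recodes the `alteration` column.
.recode_alteration_column <- function(cna) {
  # Numeric codes named by the descriptive level they map to (copied from the
  # snippet; -1 and -1.5 -> "loh" is still pending review)
  allowed_chr_levels <- c(
    "neutral" = "0",
    "deletion" = "-2",
    "loh" = "-1.5",
    "loh" = "-1",
    "gain" = "1",
    "amplification" = "2"
  )

  # Coerce to character; trim and lower-case so "Deletion " matches "deletion"
  cna <- cna %>%
    mutate(
      hugo_symbol = as.character(.data$hugo_symbol),
      alteration = tolower(stringr::str_trim(as.character(.data$alteration)))
    )

  # Either the numeric code or its descriptive name is accepted; NAs are skipped
  all_allowed <- unique(c(allowed_chr_levels, names(allowed_chr_levels)))
  levels_in_data <- unique(cna$alteration[!is.na(cna$alteration)])
  not_allowed <- levels_in_data[!levels_in_data %in% all_allowed]
  if (length(not_allowed) > 0) {
    cli::cli_abort(c(
      "Unknown values in {.field alteration} field: {.val {not_allowed}}",
      "i" = "Must be one of the following: {.val {all_allowed}}"
    ))
  }

  # fct_recode() warns about codes absent from the data (e.g. "-1.5");
  # those warnings are expected, so they are muffled here
  cna %>%
    mutate(alteration = suppressWarnings(
      forcats::fct_recode(.data$alteration, !!!allowed_chr_levels)
    ))
}

# Usage: numeric codes and labels (any case/whitespace) end up on one scale
cna <- data.frame(
  hugo_symbol = c("TP53", "MYC", "CDKN2A"),
  alteration = c("-2", " 2", "Deletion")
)
as.character(.recode_alteration_column(cna)$alteration)
# returns c("deletion", "amplification", "deletion")
```

Returning the data frame (rather than assigning inside `suppressWarnings()`) keeps the helper free of side effects, which makes it straightforward to cover with tests in both {gnomeR} and {oncokbR}.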