GEMINI-Medicine / Rgemini

A custom R package that provides a variety of functions to perform data analyses with GEMINI data
https://gemini-medicine.github.io/Rgemini/
Other
3 stars 0 forks source link

Fix bug in factor level #23 #50

Closed gemini-wenb closed 9 months ago

gemini-wenb commented 9 months ago

closes #23

This bug is due to singularity arises from stddiff::stddiff.category() (a function called by Rgemini::max_pairwise_smd_fix()). When looping through each pair, stddiff::stddiff.category() calculates SMD. However, if a particular pair doesn't have values for all levels of the factor, the empty level will result in a singular matrix.

To fix this, droplevels() is applied to the paired data.

e.g. For pair A-C, the factor variable "category" contains an empty level "o", resulting in singularity. For other pairs, such as A-B, "category" has value at all levels and thus will run as expected. image

Below code demonstrates the error and the fix:

max_pairwise_smd_fix <- function(x, name, round_to = 3, ...) {
  x[["overall"]] <- NULL # remove overall category if exists
  x <- reshape2::melt(x)
  x$L1 <- as.numeric(as.factor(x$L1)) - 1 # needs to start at 0 for stddiff
  pairs <- unique(x$L1) %>% combn(2, simplify = FALSE)
  vartype <- class(x$value)
  fn <- if ((vartype == "numeric") || is.integer(x$value)) {
    stddiff::stddiff.numeric
  } else if (vartype == "logical") {
    stddiff::stddiff.binary
  } else if (vartype %in% c("factor", "character")) {
    stddiff::stddiff.category
  }
  max_smd <- 0
  for (pair in pairs) {
    current_smd <- max(
      fn(x %>% dplyr::filter(L1 %in% pair) %>% droplevels() , # drop factor levels, otherwise singularity may arise for group(s) containing an empty level
         2, 1) %>%  .[[1, "stddiff"]] # alternate reference group through every group 
    )
    if (is.na(current_smd)) {
      warning("Some pairwise SMDs could not be calculated. Please investigate.", .call = FALSE)
      max_smd <- current_smd

    } else if ((current_smd > max_smd) || is.na(max_smd)) {
      max_smd <- current_smd
    }
  }
  return(round(max_smd, round_to))
}

######################### Fake data (same data as #23 )
g <- rep(LETTERS[1:3], each = 30)
x <- c(rep('f', 20), rep('m', 10), 
       rep('f', 16), rep('m', 12), rep('o', 1), rep('x', 1),
       rep('f', 10), rep('m', 19), rep('x', 1) )
df <- data.table(cat = x, hos = g)

strata <- c(list(Total=df), split(df, df$hos))

labels <- list(
  variables=list(cat='category')
)

############################# Returns error due to singularity
table1(strata, labels, groupspan=c(1, 1, 2), extra.col = list("SMD" = Rgemini::max_pairwise_smd))

image

#############################  Returns table1 as expected with the fix
table1(strata, labels, groupspan=c(1, 1, 2), extra.col = list("SMD" = max_pairwise_smd_fix))

image