GEMINI-Medicine / Rgemini

A custom R package that provides a variety of functions to perform data analyses with GEMINI data
https://gemini-medicine.github.io/Rgemini/

[BUG]: max_pairwise_smd does not work with labelled column #85

Closed: guoyi-smh closed this issue 3 months ago

guoyi-smh commented 6 months ago

New Feature Request

When using table1() to generate tables, it can be useful to give columns labels so they display more readably in the output HTML table. For example:

# convert gender to binary sex
cohort$sexf <- cohort$gender=="F"
# improve readability in output table
label(cohort$sexf) <- "Sex Female"

Currently, max_pairwise_smd() throws an error when the input column is labelled:

Error in (vartype == "numeric") || is.integer(x$value) : 
  'length = 2' in coercion to 'logical(1)'

This is because the function assumes the input column has only one class, whereas a labelled column can have two:

> class(cohort$sexf)
[1] "labelled" "logical" 
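To see why this breaks the check, the sketch below recreates a labelled logical vector in base R by setting the two classes directly (an assumption standing in for the labelling call). Comparing class() with == yields one result per class, and since R 4.3.0 `||` raises an error when an operand has length greater than one. `%in%` with a length-1 left-hand side always returns a single TRUE/FALSE.

```r
# Recreate a labelled logical column: class() now returns two strings
x <- c(TRUE, FALSE, TRUE)
class(x) <- c("labelled", "logical")
vartype <- class(x)

# Element-wise comparison gives one value per class, not a single logical
vartype == "numeric"      # FALSE FALSE (length 2), unusable with ||

# %in% with a length-1 left-hand side always gives exactly one logical
"numeric" %in% vartype    # FALSE
"logical" %in% vartype    # TRUE
```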

This can be improved by testing class membership with %in% instead of comparing with ==, something like:

  fn <- if (("numeric" %in% vartype) || is.integer(x$value)) {
    stddiff.numeric
  } else if ("logical" %in% vartype) {
    stddiff.binary
  } else if (any(c("factor", "character") %in% vartype)) {
    stddiff.category
  }
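The snippet above depends on its surrounding function (it assumes `vartype` holds `class(x$value)` and dispatches to the `stddiff.*` functions). As a standalone sketch of the same dispatch idea, the hypothetical helper `smd_kind()` below returns the name of the applicable comparison type instead of a function, and adds an explicit error branch for unsupported classes, which the proposal does not include.

```r
# Hypothetical helper: choose the type of standardized difference from the
# column's classes, tolerating extra classes such as "labelled".
smd_kind <- function(x) {
  vartype <- class(x)
  # is.integer() is FALSE for factors, so factors fall through to "category"
  if ("numeric" %in% vartype || is.integer(x)) {
    "numeric"
  } else if ("logical" %in% vartype) {
    "binary"
  } else if (any(c("factor", "character") %in% vartype)) {
    "category"
  } else {
    stop("Unsupported column class: ", paste(vartype, collapse = ", "))
  }
}

# A labelled logical column is now classified instead of erroring
sexf <- c(TRUE, FALSE, TRUE)
class(sexf) <- c("labelled", "logical")
smd_kind(sexf)   # "binary"
```

Without the final `else`, an unsupported class would silently leave the result as NULL, so failing loudly seemed the safer choice for a sketch.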
guoyi-smh commented 3 months ago

After further investigation, I found that the error occurred because, when I ran the example above, the label() call resolved to Hmisc::label() rather than table1::label() (table1::label() was masked by Hmisc::label()). Hmisc::label() gives the variable an additional "labelled" class, and that extra class caused the error. table1::label(), by contrast, only adds a label attribute to the variable without changing its class.
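The difference can be approximated in base R. The two hypothetical functions below are not copied from either package; they only mimic the behaviour described above (attribute only, versus attribute plus a prepended "labelled" class), and the real functions may do more, such as handling data frames.

```r
# Mimics the described table1::label() behaviour: attribute only
label_attr_only <- function(x, value) {
  attr(x, "label") <- value
  x
}

# Mimics the described Hmisc::label() behaviour: attribute plus a
# "labelled" class prepended once
label_with_class <- function(x, value) {
  if (!inherits(x, "labelled")) class(x) <- c("labelled", class(x))
  attr(x, "label") <- value
  x
}

sexf <- c(TRUE, FALSE, TRUE)
class(label_attr_only(sexf, "Sex Female"))    # "logical"
class(label_with_class(sexf, "Sex Female"))   # "labelled" "logical"
```

With both packages attached, a bare label() call resolves to whichever package was attached last; writing table1::label() explicitly avoids the ambiguity.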

Apart from the class change, Hmisc::label() also works well for adding the label attribute and is compatible with table1. Since Hmisc is a widely used package for EDA and other people might run into this issue as well, the fix simply adds one line that removes the labelled class from the input. With this change, table1 still works even when people use the "wrong" label function to add labels.
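The actual one-line change is not shown here. As a minimal sketch of the idea, the hypothetical helper `drop_labelled()` below removes only the "labelled" class while keeping the label attribute and any other classes (such as "factor"). It uses unclass(), which strips the class attribute but leaves other attributes in place, then re-applies the remaining classes only if they differ from the vector's implicit class.

```r
# Hypothetical helper: strip the "labelled" class, keep everything else
drop_labelled <- function(x) {
  if (!inherits(x, "labelled")) return(x)
  remaining <- setdiff(class(x), "labelled")
  x <- unclass(x)  # removes the class attribute; "label", "levels" survive
  # Re-apply explicit classes such as "factor" or "Date" when needed
  if (!identical(class(x), remaining)) class(x) <- remaining
  x
}

sexf <- c(TRUE, FALSE, TRUE)
attr(sexf, "label") <- "Sex Female"
class(sexf) <- c("labelled", "logical")

cleaned <- drop_labelled(sexf)
class(cleaned)           # "logical"
attr(cleaned, "label")   # "Sex Female"
```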