GEMINI-Medicine / Rgemini

A custom R package that provides a variety of functions to perform data analyses with GEMINI data
https://gemini-medicine.github.io/Rgemini/
Other
3 stars 0 forks source link

functions to automate univariable and multivariable regression #98

Closed shijiaSMH closed 2 months ago

shijiaSMH commented 4 months ago

Motivation

Univariable and multivariable regression with hospital-level clustered robust standard errors (SE) are commonly used methods for inference studies. Rms is commonly used for those purposes, so we can build functions based on this package.

Sample codes

Univariable regression

lrf <- function(data, exposure, outcome){
  dd <<- datadist(data) 
  options(datadist='dd') 

  f <- lrm(as.formula( paste0(outcome, ' ~ exposure') ) , data = d, x=T, y=T)

  ### model results w robust SE
  f2 <- robcov(f, cluster=d$hospital) 

  p <- plot(Predict(f2, fun=plogis), ylab = paste0('Probability of ', outcome ) )
  print(p)

  tbl <- data.table(exposure=c('intermediate vs low', 'high vs low'),
                  OR=c(round(exp(f2$coefficients[2]), 2), round(exp(f2$coefficients[3]), 2)),
                  CI_low=c(round(exp(f2$coefficients[2] - 1.96*sqrt(diag(f2$var))[2]), 2), round(exp(f2$coefficients[3] - 1.96*sqrt(diag(f2$var))[3]), 2)),
                  CI_high=c(round(exp(f2$coefficients[2] + 1.96*sqrt(diag(f2$var))[2]), 2), round(exp(f2$coefficients[3] + 1.96*sqrt(diag(f2$var))[3]), 2))

                  )

  kable(tbl)

}

Multivariable regression

lrf <- function(data, exposure, outcome, confounders){
  dd <<- datadist(data) 
  options(datadist='dd') 

  f <- lrm(as.formula( paste0(outcome, ' ~ exposure + confounders') ) , data = d, x=T, y=T)

  ### model results w robust SE
  f2 <- robcov(f, cluster=d$hospital) 

  p <- plot(Predict(f2, fun=plogis), ylab = paste0('Probability of ', outcome ) )
  print(p)

  tbl <- data.table(exposure=c('intermediate vs low', 'high vs low'),
                  OR=c(round(exp(f2$coefficients[2]), 2), round(exp(f2$coefficients[3]), 2)),
                  CI_low=c(round(exp(f2$coefficients[2] - 1.96*sqrt(diag(f2$var))[2]), 2), round(exp(f2$coefficients[3] - 1.96*sqrt(diag(f2$var))[3]), 2)),
                  CI_high=c(round(exp(f2$coefficients[2] + 1.96*sqrt(diag(f2$var))[2]), 2), round(exp(f2$coefficients[3] + 1.96*sqrt(diag(f2$var))[3]), 2))

                  )

  kable(tbl)

}

Other thoughts

loffleraSMH commented 2 months ago

Closing this issue here. As discussed, this is better taken care of with vignettes that provide a step-by-step guide on best practices (see summer housekeeping list).