Motivation

Univariable and multivariable regression with hospital-level clustered robust standard errors (SE) are commonly used in inference studies. The rms package is widely used for this, so we can build the functions on top of it.

Sample code

Univariable regression

library(rms)         # lrm, robcov, datadist, Predict
library(data.table)  # data.table
library(knitr)       # kable

# exposure and outcome are column names passed as strings
lrf <- function(data, exposure, outcome){
  # rms looks up the datadist object by name, hence the global assignment
  dd <<- datadist(data)
  options(datadist='dd')

  f <- lrm(as.formula( paste0(outcome, ' ~ ', exposure) ), data = data, x=TRUE, y=TRUE)

  ### model results w robust SE, clustered by hospital
  f2 <- robcov(f, cluster=data$hospital)

  p <- plot(Predict(f2, fun=plogis), ylab = paste0('Probability of ', outcome ) )
  print(p)

  # coefficients 2 and 3 assume a 3-level exposure factor with 'low' as reference
  est <- f2$coefficients[2:3]
  se  <- sqrt(diag(f2$var))[2:3]
  tbl <- data.table(exposure=c('intermediate vs low', 'high vs low'),
                    OR=round(exp(est), 2),
                    CI_low=round(exp(est - 1.96*se), 2),
                    CI_high=round(exp(est + 1.96*se), 2))

  kable(tbl)
}
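The odds ratios and confidence limits in the table are Wald estimates, exp(beta) and exp(beta ± 1.96 × SE). Below is a minimal base-R sketch of that calculation, pulled into a helper so the indexing is written once. `wald_or` and its arguments are hypothetical names, not part of rms or Rgemini, and the inputs are toy numbers rather than real results.

```r
# Hypothetical helper (not part of rms or Rgemini): Wald odds ratios with
# confidence limits from a coefficient vector and its covariance matrix.
wald_or <- function(beta, vcov, idx, level = 0.95) {
  z   <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  est <- beta[idx]
  se  <- sqrt(diag(vcov))[idx]
  data.frame(
    OR      = exp(est),
    CI_low  = exp(est - z * se),
    CI_high = exp(est + z * se)
  )
}

# Toy inputs standing in for f2$coefficients and f2$var
beta <- c(Intercept = -1, intermediate = log(1.5), high = log(2))
vc   <- diag(c(0.04, 0.01, 0.01))
res  <- wald_or(beta, vc, idx = 2:3)
print(round(res, 2))
```

With a fitted model, the call would look like `wald_or(f2$coefficients, f2$var, idx = 2:3)`, under the same assumption that coefficients 2 and 3 are the two non-reference exposure levels.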

Multivariable regression

# requires library(rms), library(data.table), library(knitr)
# note: same name as the univariable helper; rename one if both are sourced
# exposure and outcome are column names (strings); confounders is a character vector
lrf <- function(data, exposure, outcome, confounders){
  # rms looks up the datadist object by name, hence the global assignment
  dd <<- datadist(data)
  options(datadist='dd')

  # exposure goes first so its coefficients directly follow the intercept
  rhs <- paste(c(exposure, confounders), collapse = ' + ')
  f <- lrm(as.formula( paste0(outcome, ' ~ ', rhs) ), data = data, x=TRUE, y=TRUE)

  ### model results w robust SE, clustered by hospital
  f2 <- robcov(f, cluster=data$hospital)

  p <- plot(Predict(f2, fun=plogis), ylab = paste0('Probability of ', outcome ) )
  print(p)

  # coefficients 2 and 3 assume a 3-level exposure factor with 'low' as reference
  est <- f2$coefficients[2:3]
  se  <- sqrt(diag(f2$var))[2:3]
  tbl <- data.table(exposure=c('intermediate vs low', 'high vs low'),
                    OR=round(exp(est), 2),
                    CI_low=round(exp(est - 1.96*se), 2),
                    CI_high=round(exp(est + 1.96*se), 2))

  kable(tbl)
}
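Both functions take column names as strings and turn them into a model formula. One way to do that without pasting strings by hand is `reformulate()` from base R's stats package. The sketch below is only an illustration: `build_formula` is a hypothetical helper and the column names are made up.

```r
# Build a model formula from column names supplied as strings.
# reformulate() is in the base 'stats' package.
build_formula <- function(outcome, exposure, confounders = NULL) {
  reformulate(termlabels = c(exposure, confounders), response = outcome)
}

# Column names below are made up for illustration
f_uni   <- build_formula("mortality", "exposure_group")
f_multi <- build_formula("mortality", "exposure_group", c("age", "sex"))
print(f_uni)
print(f_multi)
```

Listing the exposure first keeps its coefficients right after the intercept, which the positional indexing in the OR table relies on.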

Other thoughts

Documentation needs care: we don't want univariable regression to be run without thought.
For multivariable regression in an inference study, users need a clear conceptual model.
For multivariable regression, we can generally assume that all covariates are confounders, since that is the most common case. Users can write their own code if they need more complexity, e.g. interactions.
The functions could automatically choose a common model based on the outcome, e.g. negative binomial regression for length of stay and logistic regression for mortality. It would be good to include 1-2 lines describing why each model is chosen for a given outcome.
It would also be good to have 1-2 lines explaining why clustered robust SE are used.
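The idea of choosing a model from the outcome could start as a simple lookup table. This is a rough sketch only: `default_models` and `choose_model` are hypothetical names, the outcome keys are assumptions about how outcomes might be labelled, and the rationale strings are placeholder wording that would need review.

```r
# Hypothetical lookup: outcome name -> default model and a short rationale.
# Only the two pairings mentioned in this issue are filled in.
default_models <- list(
  length_of_stay = list(
    model = "negative binomial",
    why   = "count-like, right-skewed outcome that is often overdispersed"
  ),
  mortality = list(
    model = "logistic",
    why   = "binary outcome (died / survived)"
  )
)

choose_model <- function(outcome) {
  spec <- default_models[[outcome]]
  if (is.null(spec)) {
    # Fail loudly rather than silently guessing a model
    stop("No default model for outcome '", outcome, "'; please specify one.")
  }
  spec
}

m <- choose_model("mortality")
cat(m$model, "-", m$why, "\n")
```

Stopping on an unknown outcome, instead of falling back to a default, fits the concern that regressions should not be run without thought.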

GEMINI-Medicine / Rgemini

functions to automate univariable and multivariable regression #98
