Motivation
Univariable and multivariable regression with hospital-level clustered robust standard errors (SE) are commonly used for inference studies. The rms package is commonly used for this purpose, so we can build functions on top of it.
Sample code
Univariable regression
library(rms)         # lrm(), robcov(), datadist(), Predict()
library(data.table)  # data.table()
library(knitr)       # kable()

# exposure and outcome are column names passed as strings. exposure is assumed
# to be a 3-level factor (low / intermediate / high) and data is assumed to
# have a hospital column.
lrf <- function(data, exposure, outcome){
  # rms looks up the datadist object by name, hence the global assignment
  dd <<- datadist(data)
  options(datadist = 'dd')
  f <- lrm(reformulate(exposure, response = outcome), data = data, x = TRUE, y = TRUE)
  ### model results w robust SE, clustered by hospital
  f2 <- robcov(f, cluster = data$hospital)
  p <- plot(Predict(f2, fun = plogis), ylab = paste0('Probability of ', outcome))
  print(p)
  # coefficients 2 and 3 are the two non-reference exposure levels
  est <- f2$coefficients[2:3]
  se <- sqrt(diag(f2$var))[2:3]
  tbl <- data.table(exposure = c('intermediate vs low', 'high vs low'),
                    OR = round(exp(est), 2),
                    CI_low = round(exp(est - 1.96 * se), 2),
                    CI_high = round(exp(est + 1.96 * se), 2))
  kable(tbl)
}
Multivariable regression
# Requires rms, data.table and knitr. confounders is a character vector of
# column names; exposure is assumed to be a 3-level factor as above.
# Note: this reuses the name lrf, so it replaces the univariable version if
# both are defined in the same session.
lrf <- function(data, exposure, outcome, confounders){
  # rms looks up the datadist object by name, hence the global assignment
  dd <<- datadist(data)
  options(datadist = 'dd')
  # exposure goes first so that its two coefficients sit at positions 2 and 3
  f <- lrm(reformulate(c(exposure, confounders), response = outcome),
           data = data, x = TRUE, y = TRUE)
  ### model results w robust SE, clustered by hospital
  f2 <- robcov(f, cluster = data$hospital)
  p <- plot(Predict(f2, fun = plogis), ylab = paste0('Probability of ', outcome))
  print(p)
  est <- f2$coefficients[2:3]
  se <- sqrt(diag(f2$var))[2:3]
  tbl <- data.table(exposure = c('intermediate vs low', 'high vs low'),
                    OR = round(exp(est), 2),
                    CI_low = round(exp(est - 1.96 * se), 2),
                    CI_high = round(exp(est + 1.96 * se), 2))
  kable(tbl)
}
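Both functions above hand the fit to robcov() to get hospital-clustered SE. To make it clearer what that step is doing, here is a rough base-R sketch of a cluster sandwich estimate on simulated data. It uses glm() instead of rms, every name and number in it is made up for illustration, and it is not guaranteed to match robcov() output exactly.

```r
# Simulated data: 20 hospitals, 50 patients each (illustrative assumptions).
set.seed(1)
n_hosp <- 20
n_per <- 50
hospital <- rep(seq_len(n_hosp), each = n_per)
u <- rnorm(n_hosp)[hospital]   # unobserved hospital effect shared by its patients
x <- rnorm(n_hosp)[hospital]   # exposure that varies between hospitals
y <- rbinom(length(hospital), 1, plogis(-1 + 0.5 * x + u))

fit <- glm(y ~ x, family = binomial)

# Naive variance: treats all patients as independent.
bread <- vcov(fit)

# Cluster sandwich: sum each patient's score contribution within hospital,
# then place the cross-product of those sums between two copies of the bread.
scores <- model.matrix(fit) * (y - fitted(fit))
S <- rowsum(scores, hospital)
V_cluster <- bread %*% crossprod(S) %*% bread

se <- cbind(naive = sqrt(diag(bread)), clustered = sqrt(diag(V_cluster)))
print(round(se, 3))
```

Because patients in the same hospital share the unobserved effect u, the naive SE for x understates the uncertainty, and the clustered column should come out noticeably larger.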
Other thoughts
Documentation needs care, since we don't want univariable regression to be run without thinking.
For multivariable regression in an inference study, users need a clear conceptual model.
For multivariable regression, we can generally assume all covariates are confounders, since that is the most common case. Users can write their own code if they need more complexity, e.g. interactions.
We could automatically choose a common model based on the outcome, e.g. negative binomial regression for length of stay and logistic regression for mortality. It would be good to include 1-2 lines describing why each model is chosen for a given outcome.
It would also be good to have 1-2 lines explaining why we use clustered robust SE.
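The idea of picking a default model from the outcome, with a short reason attached, could look something like the sketch below. suggest_model() is a hypothetical helper, not an existing function, and the rules are deliberately crude (for example, a count that only takes two distinct values would be treated as binary).

```r
# Hypothetical helper: suggest a default model from the outcome's values.
suggest_model <- function(y) {
  y <- y[!is.na(y)]
  if (is.logical(y) || length(unique(y)) == 2) {
    list(model = "logistic",
         fit_with = "rms::lrm",
         why = "binary outcome (e.g. mortality); coefficients are log odds ratios")
  } else if (is.numeric(y) && all(y >= 0) && all(y == round(y))) {
    list(model = "negative binomial",
         fit_with = "MASS::glm.nb",
         why = "non-negative count (e.g. length of stay in days), usually overdispersed")
  } else {
    stop("no default model for this outcome; choose one explicitly")
  }
}

mortality <- c(0, 1, 1, 0)        # made-up values
los_days <- c(1, 3, 12, 2, 40)    # made-up values
print(suggest_model(mortality)$model)
print(suggest_model(los_days)$why)
```

Failing loudly on anything else keeps users from silently getting a model they did not think about, which fits the concern about documentation above.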
Closing this issue here. As discussed, this is better taken care of with vignettes that provide a step-by-step guide on best practices (see summer housekeeping list).