BS1125 / CMAverse

A suite of functions for reproducible causal mediation analyses
https://bs1125.github.io/CMAverse/

How to pool the effect estimates when inference = "bootstrap" and multimp = TRUE are specified in the cmest function #46

Open Apprentice2 opened 6 months ago

Apprentice2 commented 6 months ago

I am conducting a mediation analysis on data where the outcome y is binary, the mediator is continuous, and the exposure is binary.

Because of missing data, I specify estimation = "imputation", inference = "bootstrap", nboot = 1000, multimp = TRUE, and m = 100 in the cmest function. In this case, are the 100 effect estimates obtained from the multiple imputations pooled using Rubin's rules, or by some other algorithm? I would be grateful if you could clarify this.

BS1125 commented 6 months ago

Hi, they are integrated by Rubin's rule.
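For reference, Rubin's rules pool m per-imputation estimates as follows: the pooled point estimate is the mean of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below is in Python rather than R, purely for illustration. It is not CMAverse code, and the function name `pool_rubin` is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    Hypothetical helper for illustration; not part of CMAverse.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)            # pooled point estimate
    w_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b    # Rubin's total variance
    return q_bar, math.sqrt(total)     # pooled estimate and pooled standard error


q, se = pool_rubin([0.10, 0.14, 0.12], [0.0004, 0.0005, 0.0003])
```

Here the between-imputation term is what widens the interval relative to any single completed dataset.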

Apprentice2 commented 6 months ago

Thank you for your prompt response. You answered that the results are pooled using Rubin's rules, but my question remains.

I have checked the estinf function in the following script: https://github.com/BS1125/CMAverse/blob/master/R/estinf.R

According to this script, Rubin's rules are applied when inference = "delta" (lines 332-344 of the script), whereas a different procedure appears to be used when inference = "bootstrap" (lines 311-330 of the script).

Sorry to repeat the question, but are Rubin's rules still applied when inference = "bootstrap"? If so, why is the procedure completely different from the one used when inference = "delta"? Thank you in advance for any clarification you can provide.

bernard-liew commented 2 weeks ago

Hi Apprentice2, I have the same question. My reading of the code suggests the following.

######################## Code #######################################################

    if (inference == "bootstrap") {
      # basically create a custom bootstrap function.
      boot.step <- function(data = NULL, indices = NULL) {
        data_boot <- data[indices, ]
        args_mice$data <- data_boot
        data_imp <- complete(do.call(mice, args_mice), action = "all")
        # for every bootstrap iteration, impute m times, which in our case, m = 20
        curVal <- get("counter", envir = env)
        assign("counter", curVal + 1, envir = env)
        setTxtProgressBar(get("progbar", envir = env), curVal + 1)
        return(colMeans(do.call(rbind, lapply(1:m, function(x)
          # estimate all the statistical parameters for each of the m imputed datasets.
          # Take the average over the 20 for each parameter. Return the average value.
          est.rb(data = data_imp[[x]], outReg = FALSE, full = full)))))
      }
      environment(boot.step) <- environment()

      # bootstrap results
      boots <- boot(data = data, statistic = boot.step, R = nboot)
      # bootstrap CIs
      environment(boot.ci) <- environment()
      effect.ci <- boot.ci(boots = boots)
      effect.ci.low <- effect.ci[, 1]
      effect.ci.high <- effect.ci[, 2]
      # bootstrap p-values
      effect.pval <- sapply(1:n_effect, function(x) boot.pval(boots = boots$t[, x], pe = effect.pe[x]))

My interpretation is that, when bootstrapping is combined with multiple imputation of missing data, the code does the following.

For each bootstrap iteration, a bootstrap resample of the data is drawn. MICE is then used to impute 20 complete datasets from this resample (20 being the number of imputations we specified). The parameters are estimated with the regression-based approach on each of the 20 completed datasets, and the average of each parameter across the 20 datasets is returned. These steps are repeated 1000 times (our number of bootstrap samples). The average and 95% confidence interval of each parameter are then estimated from these 1000 values.
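The procedure described above (resample, impute m times, average the m estimates, repeat nboot times) can be sketched in Python as follows. This is an illustration under stated assumptions, not CMAverse's R code. `impute_once` is a hypothetical stand-in for one mice() imputation (a simple random hot-deck draw), `estimate` is a hypothetical stand-in for est.rb() (an exposed-minus-unexposed mean difference), and the interval is a plain percentile interval.

```python
import random
from statistics import mean


def impute_once(rows, rng):
    """Hypothetical stand-in for one mice() imputation: fill each missing
    outcome with a randomly drawn observed outcome (simple hot-deck)."""
    observed = [y for _, y in rows if y is not None]
    return [(a, rng.choice(observed) if y is None else y) for a, y in rows]


def estimate(rows):
    """Hypothetical stand-in for est.rb(): exposed-minus-unexposed mean outcome."""
    exposed = [y for a, y in rows if a == 1]
    unexposed = [y for a, y in rows if a == 0]
    return mean(exposed) - mean(unexposed)


def boot_step(rows, m, rng):
    """One bootstrap replicate: resample rows, impute m times, average the m estimates."""
    resample = [rng.choice(rows) for _ in rows]
    return mean(estimate(impute_once(resample, rng)) for _ in range(m))


def bootstrap_mi(rows, nboot, m, seed=1):
    """Repeat boot_step nboot times; summarise with the mean and a percentile 95% CI."""
    rng = random.Random(seed)
    reps = sorted(boot_step(rows, m, rng) for _ in range(nboot))
    ci = (reps[int(0.025 * (nboot - 1))], reps[int(0.975 * (nboot - 1))])
    return reps, mean(reps), ci


# Toy data: (exposure, outcome) pairs; every fifth outcome is missing.
gen = random.Random(0)
data = [(i % 2, None if i % 5 == 0 else gen.gauss(i % 2, 1.0)) for i in range(60)]
reps, boot_mean, ci = bootstrap_mi(data, nboot=200, m=5)
```

Each replicate re-runs the imputation on its own resample, so imputation uncertainty is carried into the spread of the replicates.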

I would appreciate confirmation of whether I have interpreted this correctly.

Regards, Bernard

Apprentice2 commented 1 week ago

Thank you for your input. I have also checked the source code, and I interpret it as follows: the point estimate is computed from the m imputed datasets, while the interval estimates and p-values are computed from nboot bootstrap replicates. The bootstrap therefore does not affect the point estimate. Is my interpretation the same as yours?
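Under this reading, the point estimate comes from imputing the original (not resampled) data m times and averaging, and the bootstrap replicates are used only for the interval. A minimal Python sketch of that separation follows. It is not CMAverse's code: `impute_once` and `estimate` are hypothetical stand-ins for mice() and est.rb(), and the interval is a plain percentile interval. Because the point estimate never touches the resamples, changing nboot changes only the interval.

```python
import random
from statistics import mean


def impute_once(rows, rng):
    """Hypothetical stand-in for one mice() imputation (random hot-deck draw)."""
    observed = [y for _, y in rows if y is not None]
    return [(a, rng.choice(observed) if y is None else y) for a, y in rows]


def estimate(rows):
    """Hypothetical stand-in for est.rb(): exposed-minus-unexposed mean outcome."""
    exposed = [y for a, y in rows if a == 1]
    unexposed = [y for a, y in rows if a == 0]
    return mean(exposed) - mean(unexposed)


def mi_estimate(rows, m, rng):
    """Impute the given rows m times and average the m estimates."""
    return mean(estimate(impute_once(rows, rng)) for _ in range(m))


def point_and_interval(rows, m, nboot, seed=1):
    """Point estimate from the ORIGINAL data; bootstrap used only for the 95% interval."""
    pe = mi_estimate(rows, m, random.Random(seed))  # no resampling involved
    rng = random.Random(seed + 1)
    reps = sorted(mi_estimate([rng.choice(rows) for _ in rows], m, rng)
                  for _ in range(nboot))
    ci = (reps[int(0.025 * (nboot - 1))], reps[int(0.975 * (nboot - 1))])
    return pe, ci


gen = random.Random(0)
data = [(i % 2, None if i % 5 == 0 else gen.gauss(i % 2, 1.0)) for i in range(60)]
pe_small, ci_small = point_and_interval(data, m=5, nboot=50)
pe_large, ci_large = point_and_interval(data, m=5, nboot=400)
# pe_small == pe_large: only the interval depends on nboot.
```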