falkcarl / multilevelmediation

5 stars 1 forks source link

L2 Bootstrap issue with data as tibble #30

Open rduesing opened 1 year ago

rduesing commented 1 year ago

Hello!

Many thanks for the package and the simulation article. I tried to estimate simulated data with your package and encountered a strange beahvior, when I stored the data as tibble, as compared to a data frame. The main analyses went fine, but when entered into the L2 bootstrap function, upper and lower CI values are identical (for all estimates), and are quite unreasonable (and the bootstraping ends finishes after a very short time, as if no real bootstrapping has been conducted). When I store the same data as data frame, the bootstrap behaves normally and gives reasonable values.

falkcarl commented 9 months ago

Added this to the FAQ, didn't try to solve it yet.

voidyman commented 6 months ago

Hi, I faced a similar issue with L2 level sampling. Sometimes the CIs are returned as NAs or impossible CIs (both bounds greater than the estimate) when bootstrapping. I wrote a custom wrapper for L2 bootstrapping which resamples at L2 level and calls the inner modmed.mlm function. I am not sure if it is because the estimation returns NA on one of the repetitions / clusters (because this is what caused my wrapper to return NAs during testing).

I am on a Windows machine using RStudio if this information is useful. Also below is the code I used for manually bootstrapping if anyone needs it in the meantime.

parallel_bootstrap<-function(seed,mod.med.data,bootstraps, X,Y,M, moderator,
                             mod.a =TRUE, mod.b =FALSE, mod.cprime = TRUE,
                             random.a = FALSE,random.b = FALSE,random.cprime = FALSE,case = "L2"){

  ngroup <- length(unique(mod.med.data$id))
  unique_ids <- unique(mod.med.data$id)

  # pick ngroup Level 2 units from the sample with replacement
  set.seed(seed)
  sampled_ids <- sample(unique_ids,ngroup, replace = TRUE)
  bootstrapped_df <- data.frame()

  for (i in 1:length(sampled_ids)){
    temp <-mod.med.data[mod.med.data$id==sampled_ids[i],]
    # SAMPLE WITHIN EACH ID WITH REPLACEMENT
    if(case == "both"){
      print(case)
      set.seed(seed)
      temp <- temp[sample(nrow(temp), nrow(temp), replace = TRUE),]
    }
    #
    bootstrapped_df <-rbind(bootstrapped_df, temp)
  }

  # SmX * SyM --> indirect value when modval =0
  # SmX * SyM +SmX:W*SyM --> indirect when modval =1

  res1<-modmed.mlm(data = bootstrapped_df,
                   L2ID = "id",
                   X = X,
                   Y = Y,
                   M = M,
                   moderator = moderator,
                   mod.a = mod.a, mod.b = mod.b, mod.cprime = mod.cprime,
                   random.a = random.a, random.b = random.b, random.cprime = random.cprime)
  a1 = extract.modmed.mlm(res1,type = "a",modval1 = 1)
  a0 = extract.modmed.mlm(res1,type = "a",modval1 = 0)
  b= extract.modmed.mlm(res1,type = "b")
  indirect_1 = a1*b
  indirect_0 = a0*b
  indirect_diff = indirect_1 - indirect_0

  data.frame(indirect_0,indirect_1,indirect_diff)

}

custom_boostrap_parallel <-function(mod.med.data,bootstraps, X,Y,M, moderator,
                                    mod.a =TRUE, mod.b =FALSE, mod.cprime = TRUE,
                                    random.a = FALSE,random.b = FALSE,random.cprime = FALSE){

  finaldf <- foreach(j=1:bootstraps, .combine=rbind,
                     .export = c("parallel_bootstrap"), 
                     .packages = c("multilevelmediation")) %dopar% {
                       tempdf = parallel_bootstrap(seed = j,
                                                   mod.med.data,bootstraps, X,Y,M, moderator,
                                                   mod.a =TRUE, mod.b =FALSE, mod.cprime = TRUE,
                                                   random.a = FALSE,random.b = FALSE,random.cprime = FALSE,case = "L2") 

                       tempdf #Equivalent to finaldf = rbind(finaldf, tempdf)
                     }
}
SaKuhn commented 1 month ago

Hi,

I came across a similar issue asthe one described by voidyman - though with residual bootstrapping. The corresponding functions

_model_h1h2 <- boot.modmed.mlm.custom(data_tibble, ..., L2ID="unit", ...., boot.type="resid" extract.boot.modmed.mlm(modelh1h2, type = "a")

yield CI values which do not include the indicated estimate nor are anywhere close to the estimate. Also happening when trying to convert tibble to a data.frame and when playing around with other function parameters or extracting other path estimates.

falkcarl commented 1 month ago

Sorry I have not yet had time to look into this issue. But, I have 2 (edit: now 3) comments:

  1. I would not recommend using custom code unless you have also run simulations to ensure that it is working correctly.
  2. For all those who are reporting a problem, it is generally customary to provide a minimal working example (MWE). That is, a self-contained example (data included) that results in the problem. Small code snippets aren't enough.
  3. Code contributions/patches are also welcome. Fork the repo, sync your changes, and then do a pull request.
SaKuhn commented 1 month ago

Hello,

sorry for not having provided a MWE, I am rather new to using Github actively. Happy to provide some code and data - not simulated in the code since I could not properly simulate the hierarchical data structure - and the corresponding code yielding unexpected estimates given the 95%CI.

df_anon.csv

model_resid_50_jt <- boot.modmed.mlm.custom(df, nrep=50, L2ID="unit_id", X = "X_c", Y = "Y", M = "M_c",
                                           random.a = TRUE, random.b = TRUE, random.cprime = TRUE, 
                                           boot.type="resid", 
                                           parallel.type="parallel",ncores=8,seed=2299,
                                           control=list(opt="nlm")) # runs 192.24 seconds, with 50 reps just for testing

extract.boot.modmed.mlm(model_resid_50_jt, type = "a") 
extract.boot.modmed.mlm(model_resid_50_jt, type = "cprime")

I am wondering if some data propensities have to do with it, but since I am new to MLM mediation, this really is a guess and I would be glad for a hint why the 95%CI may be so off.

falkcarl commented 1 month ago

Of course, no problem. I think my comment applies to the tibble issue as well.

Thanks for posting one now - it looks like that's a bug that should now be fixed if you are able to use the latest version from GibHub. (for the record: 485dd9e )

SaKuhn commented 1 month ago

Perfect, it works! Thanks for solving this so quickly.