ecmerkle / blavaan

An R package for Bayesian structural equation modeling
https://ecmerkle.github.io/blavaan
85 stars 23 forks source link

ordered categorical variable with missing data #86

Closed jpritikin closed 1 week ago

jpritikin commented 1 month ago

With missing data, I'm getting:

Error in if (nth < 1L) next : missing value where TRUE/FALSE needed
Calls: blavaan ... lav_lavaan_step04_partable -> lavParTable -> lav_partable_flat
In addition: Warning message:
lavaan->lav_data_full():  
   some ordered categorical variable(s) have more than 12 levels: NA, NA, NA, 
   NA, NA 
Execution halted

Here is my script again:

library(blavaan)

NoYesLevels <- c('No','Not sure','Yes')
EuphoricLevels <- c('Euphoric', 'Mildly euphoric', 'No change', 'Mild discomfort', 'Severe discomfort')
ImpactLevels <- c('Helped','Slightly helped','No impact','Slightly hindered','Hindered')
AbsItemNames <- c('boredom','frustration','anxiety', 'abort','bodyAcute','bodyAfter','sleep')

ImportStudyItems <- function(df) {
  if (any(is.na(match(AbsItemNames, colnames(df))))) stop("Items are mising")

  df[['boredom']] <- ordered(df[['boredom']], levels=NoYesLevels)
  df[['frustration']] <- ordered(df[['frustration']], levels=NoYesLevels)
  df[['anxiety']] <- ordered(df[['anxiety']], levels=NoYesLevels)
  df[['abort']] <- ordered(df[['abort']], levels=NoYesLevels)
  df[['bodyAcute']] <- ordered(df[['bodyAcute']], levels=EuphoricLevels)
  df[['bodyAfter']] <- ordered(df[['bodyAfter']], levels=EuphoricLevels)
  df[['sleep']] <- ordered(df[['sleep']], levels=ImpactLevels)
  df
}

psiData <- ImportStudyItems(read.csv("psi.csv"))
pmData <- ImportStudyItems(read.csv("pm.csv"))
combinedData <- rbind(cbind(psiData, group="psi"),
                      cbind(pmData, group="pm"))
combinedData$group <- factor(combinedData$group)

spec <- '
phenom =~ boredom + frustration + anxiety + abort + bodyAcute + bodyAfter + sleep

boredom ~ 1
frustration ~ 1
anxiety ~ 1
abort ~ 1
bodyAcute ~ 1
bodyAfter ~ 1
sleep ~ 1
phenom ~ c(0, NA)*1

phenom ~~ 1*phenom
boredom ~~ v1*boredom
frustration ~~ v2*frustration
anxiety ~~ v3*anxiety
abort ~~ v4*abort
bodyAcute ~~ v5*bodyAcute
bodyAfter ~~ v6*bodyAfter
sleep ~~ v7*sleep

boredom | -.43*t1 + .43*t2
frustration | -.43*t1 + .43*t2
anxiety | -.43*t1 + .43*t2
abort | -.43*t1 + .43*t2
bodyAcute | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
bodyAfter | -.84*t1 + -.25*t2 + .25*t3 + .84*t4 
sleep | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
'
fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "regressions"),
               std.lv=TRUE, allow.empty.cell = TRUE, cat.wls.w = FALSE)
summary(fit)

New simulated data with some missing entries are here: pm.csv psi.csv

ecmerkle commented 1 month ago

Thanks for the reports. This required some further modifications to lavaan, but I believe all your examples should now be working. Please try it out using my fork of lavaan, https://github.com/ecmerkle/lavaan, along with updated blavaan. Then I will send the lavaan PR once I test it out a bit more.

jpritikin commented 1 month ago

Okay! I'll try it. BTW, where can I learn more about ppp? Is that what you are estimating in the post-sampling phase? I suspect it is a posterior predictive p-value, but I couldn't find it documented in the blavaan man pages. Did I miss something?

ecmerkle commented 1 month ago

Yes, there is a posterior predictive p-value that is computed, and not much documentation about it. Some general detail about ppp appears in the appendix of the JSS paper. For models with ordinal variables, things get more complicated because there are multiple likelihoods that could contribute to a ppp. Some details about the likelihoods are here, and a student and I are assessing the best way to get a ppp out of all these.

jpritikin commented 1 month ago

Ah, okay. Good it know that ppp is not ready yet. I guess the only criteria that I need to report is zero divergent transitions, R-hat < 1.015, and tail/bulk ess > 100*numChains. I think my data is too small to expect much from the loo package. Any other model fitness measures I should be aware of?

jpritikin commented 1 month ago

After succeeding on about 10 simulated datasets, I got:

Computing post-estimation metrics (including lvs if requested)...
Error: lavaan->lav_data_full():  
   some variables have no variance in group 2 : frustration

Data here: pm.csv psi.csv

ecmerkle commented 1 month ago

Slowly working on this one. It is easy enough to turn off the check for this in lavaan, but then I encounter other errors elsewhere...

jpritikin commented 1 month ago

Thanks for letting me know that you're working on it. I can imagine that there is quite a lot of code to untangle. I really appreciate your efforts. I really need this model to work. I don't know how to analyze the anticipated data without this model.

ecmerkle commented 4 weeks ago

Ok, try again with my fork of lavaan and github version of blavaan. I needed to turn off the variance check and then also turn off some starting value computations in lavaan. It is quite possible that I missed something else, especially for models beyond cfa.

jpritikin commented 4 weeks ago

I've run about 100 datasets. So far so good. Is there a way to turn off Computing post-estimation metrics (including lvs if requested? This step seems to take a long time.

ecmerkle commented 3 weeks ago

If you add test = "none", it will avoid some of the computations but also not provide WAIC/LOOIC. I believe the most time-consuming part is computing casewise marginal likelihoods for each posterior sample, as described here.