ecmerkle / blavaan

An R package for Bayesian structural equation modeling
https://ecmerkle.github.io/blavaan
86 stars 23 forks source link

Specification of factor model with auto.var = TRUE, std.lv = TRUE with ordinal indicators #83

Closed jpritikin closed 1 month ago

jpritikin commented 1 month ago
library(blavaan)  # 0.5-5

future::plan("multicore")

NoYesLevels <- c('No','Not sure','Yes')
EuphoricLevels <- c('Euphoric', 'Mildly euphoric', 'No change', 'Mild discomfort', 'Severe discomfort')
ImpactLevels <- c('Helped','Slightly helped','No impact','Slightly hindered','Hindered')
AbsItemNames <- c('boredom','frustration','anxiety', 'abort','bodyAcute','bodyAfter','sleep')

ImportStudyItems <- function(df) {
  if (any(is.na(match(AbsItemNames, colnames(df))))) stop("Items are mising")

  df[['boredom']] <- ordered(df[['boredom']], levels=NoYesLevels)
  df[['frustration']] <- ordered(df[['frustration']], levels=NoYesLevels)
  df[['anxiety']] <- ordered(df[['anxiety']], levels=NoYesLevels)
  df[['abort']] <- ordered(df[['abort']], levels=NoYesLevels)
  df[['bodyAcute']] <- ordered(df[['bodyAcute']], levels=EuphoricLevels)
  df[['bodyAfter']] <- ordered(df[['bodyAfter']], levels=EuphoricLevels)
  df[['sleep']] <- ordered(df[['sleep']], levels=ImpactLevels)
  df
}

psiData <- ImportStudyItems(read.csv("psi.csv"))
pmData <- ImportStudyItems(read.csv("pm.csv"))
combinedData <- rbind(cbind(psiData, group="psi"),
                      cbind(pmData, group="pm"))
combinedData$group <- factor(combinedData$group)

spec <- '
phenom =~ boredom + frustration + anxiety + abort + bodyAcute + bodyAfter + sleep

boredom ~ 1
frustration ~ 1
anxiety ~ 1
abort ~ 1
bodyAcute ~ 1
bodyAfter ~ 1
sleep ~ 1

boredom | -.43*t1 + .43*t2
frustration | -.43*t1 + .43*t2
anxiety | -.43*t1 + .43*t2
abort | -.43*t1 + .43*t2
bodyAcute | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
bodyAfter | -.84*t1 + -.25*t2 + .25*t3 + .84*t4 
sleep | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
'
fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "means", "residuals", "regressions"),
               group.partial = c("phenom~1"),
               auto.var = TRUE, std.lv = TRUE)
summary(fit)

There are three things that I don't understand:

These simulated data have at least one observation in every ordinal outcome category, pm.csv psi.csv

ecmerkle commented 1 month ago

Thanks for the example, and see below. In general, I think the model syntax in spec will provide more flexibility than the arguments in blavaan(). But I wonder whether the following gets closer to what you are looking for:

fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "regressions"),
               auto.var = TRUE, std.lv = TRUE)

And in response to your bullets:

  1. In the summary() there is a "variance" section and a "scales" section for residual variance of each variable and for full latent variance (which includes variance due to factors), respectively. In your original model, I think that group.equal = "residuals" was fixing all the variances to 1 for both groups. Removing that, you should now see estimated variances for group 2 but not group 1. I am not sure that it is possible to stop the group 1 variances from equaling 1 via a blavaan() argument, it might be necessary to add lines like below to the model syntax:
boredom ~~ c("v1", "v2") * boredom
  1. Yes, ~1 is for the mean. I am not quite sure why the group.partial thing is not working (maybe a bug), and I think the variance is being estimated independently of this statement (due to the multiple groups). For now, I think you need to get rid of group.equal = "means" and then add something like this to the model syntax:
phenom ~ c(NA, 0)*1
  1. The first loading is automagically constrained to be positive. Some more detail is here.
jpritikin commented 1 month ago

Okay, I changed it to

spec <- '
phenom =~ boredom + frustration + anxiety + abort + bodyAcute + bodyAfter + sleep

boredom ~ 1
frustration ~ 1
anxiety ~ 1
abort ~ 1
bodyAcute ~ 1
bodyAfter ~ 1
sleep ~ 1
phenom ~ c(0, NA)*1

phenom ~~ 1*phenom
boredom ~~ v1*boredom
frustration ~~ v2*frustration
anxiety ~~ v3*anxiety
abort ~~ v4*abort
bodyAcute ~~ v5*bodyAcute
bodyAfter ~~ v6*bodyAfter
sleep ~~ v7*sleep

boredom | -.43*t1 + .43*t2
frustration | -.43*t1 + .43*t2
anxiety | -.43*t1 + .43*t2
abort | -.43*t1 + .43*t2
bodyAcute | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
bodyAfter | -.84*t1 + -.25*t2 + .25*t3 + .84*t4 
sleep | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
'
fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "regressions"),
               std.lv=TRUE)

This is looking really good. The only weird thing that I notice is that the variance of the ordinal item abort got larger than 1.0. Shouldn't the variance be bounded above at 1.0? Can I put an upper bound on the variances?

   .abort     (v4)    0.643    0.249    0.304    1.212    0.998 gamma(1,.5)[sd]
ecmerkle commented 1 month ago

Good to hear things are working. I don't think the variance is bounded at 1 because this is using the theta parameterization, i.e., there is no restriction that the total variance is 1 (and the "scales" part of the summary output shows that the total variances are above 1). There is no way to add an upper bound, but you can change the prior so there is little density above 1, using an argument like

dp = dpriors(theta = "gamma(5,7)[sd]")

where you could also use [var] or [prec] to put a prior on the variance or precision, instead of the SD.

jpritikin commented 1 month ago

You seem to imply that it's fine, in theory, to have an ordinal indicator with a total variance greater than 1. I'm not sure where I got the idea that ordinal indicators should max out at a variance of 1. Maybe I'm misremembering something. I guess I'll leave the model as is. I doubt it would make much practical difference anyway.

ecmerkle commented 1 month ago

I think that a common identification constraint for ordinal CFA involves fixing the total variance to 1, i.e.,

$$ diag(\Lambda \Psi \Lambda^\prime + \Theta) = 1 $$

Then you are right that the variances (theta estimates) should not go above 1. The default for blavaan (the "theta" parameterization) is instead to fix $diag(\Theta) = 1$. I am pretty sure that you are identifying your model from the across-group constraints, which would be a third option.