Specification of factor model with auto.var = TRUE, std.lv = TRUE with ordinal indicators

jpritikin commented 1 month ago

library(blavaan)  # 0.5-5

future::plan("multicore")

NoYesLevels <- c('No','Not sure','Yes')
EuphoricLevels <- c('Euphoric', 'Mildly euphoric', 'No change', 'Mild discomfort', 'Severe discomfort')
ImpactLevels <- c('Helped','Slightly helped','No impact','Slightly hindered','Hindered')
AbsItemNames <- c('boredom','frustration','anxiety', 'abort','bodyAcute','bodyAfter','sleep')

ImportStudyItems <- function(df) {
  if (any(is.na(match(AbsItemNames, colnames(df))))) stop("Items are mising")

  df[['boredom']] <- ordered(df[['boredom']], levels=NoYesLevels)
  df[['frustration']] <- ordered(df[['frustration']], levels=NoYesLevels)
  df[['anxiety']] <- ordered(df[['anxiety']], levels=NoYesLevels)
  df[['abort']] <- ordered(df[['abort']], levels=NoYesLevels)
  df[['bodyAcute']] <- ordered(df[['bodyAcute']], levels=EuphoricLevels)
  df[['bodyAfter']] <- ordered(df[['bodyAfter']], levels=EuphoricLevels)
  df[['sleep']] <- ordered(df[['sleep']], levels=ImpactLevels)
  df
}

psiData <- ImportStudyItems(read.csv("psi.csv"))
pmData <- ImportStudyItems(read.csv("pm.csv"))
combinedData <- rbind(cbind(psiData, group="psi"),
                      cbind(pmData, group="pm"))
combinedData$group <- factor(combinedData$group)

spec <- '
phenom =~ boredom + frustration + anxiety + abort + bodyAcute + bodyAfter + sleep

boredom ~ 1
frustration ~ 1
anxiety ~ 1
abort ~ 1
bodyAcute ~ 1
bodyAfter ~ 1
sleep ~ 1

boredom | -.43*t1 + .43*t2
frustration | -.43*t1 + .43*t2
anxiety | -.43*t1 + .43*t2
abort | -.43*t1 + .43*t2
bodyAcute | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
bodyAfter | -.84*t1 + -.25*t2 + .25*t3 + .84*t4 
sleep | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
'
fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "means", "residuals", "regressions"),
               group.partial = c("phenom~1"),
               auto.var = TRUE, std.lv = TRUE)
summary(fit)

There are three things that I don't understand:

I have auto.var=TRUE but I don't see any estimated parameters related to the observed indicator variances. It seems like there has got to be some estimated observed variances otherwise I don't understand how the latent variance can be fixed to 1.
I have group.partial = c("phenom~1") but the phenom mean is fixed to zero in both groups. Weirdly, the variance for phenom is free in one group. Isn't ~1 the notation for the intercept, not the variance? How do I free it for one group and keep fixed in other group?
I have std.lv = TRUE and all of my indicators are unbounded. Don't I need to constrain at least one of them to be positive or negative? Or do you take care of this automagically?

These simulated data have at least one observation in every ordinal outcome category, pm.csv psi.csv

ecmerkle commented 1 month ago

Thanks for the example, and see below. In general, I think the model syntax in spec will provide more flexibility than the arguments in blavaan(). But I wonder whether the following gets closer to what you are looking for:

fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "regressions"),
               auto.var = TRUE, std.lv = TRUE)

And in response to your bullets:

In the summary() there is a "variance" section and a "scales" section for residual variance of each variable and for full latent variance (which includes variance due to factors), respectively. In your original model, I think that group.equal = "residuals" was fixing all the variances to 1 for both groups. Removing that, you should now see estimated variances for group 2 but not group 1. I am not sure that it is possible to stop the group 1 variances from equaling 1 via a blavaan() argument, it might be necessary to add lines like below to the model syntax:

boredom ~~ c("v1", "v2") * boredom

Yes, ~1 is for the mean. I am not quite sure why the group.partial thing is not working (maybe a bug), and I think the variance is being estimated independently of this statement (due to the multiple groups). For now, I think you need to get rid of group.equal = "means" and then add something like this to the model syntax:

phenom ~ c(NA, 0)*1

The first loading is automagically constrained to be positive. Some more detail is here.

jpritikin commented 1 month ago

Okay, I changed it to

spec <- '
phenom =~ boredom + frustration + anxiety + abort + bodyAcute + bodyAfter + sleep

boredom ~ 1
frustration ~ 1
anxiety ~ 1
abort ~ 1
bodyAcute ~ 1
bodyAfter ~ 1
sleep ~ 1
phenom ~ c(0, NA)*1

phenom ~~ 1*phenom
boredom ~~ v1*boredom
frustration ~~ v2*frustration
anxiety ~~ v3*anxiety
abort ~~ v4*abort
bodyAcute ~~ v5*bodyAcute
bodyAfter ~~ v6*bodyAfter
sleep ~~ v7*sleep

boredom | -.43*t1 + .43*t2
frustration | -.43*t1 + .43*t2
anxiety | -.43*t1 + .43*t2
abort | -.43*t1 + .43*t2
bodyAcute | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
bodyAfter | -.84*t1 + -.25*t2 + .25*t3 + .84*t4 
sleep | -.84*t1 + -.25*t2 + .25*t3 + .84*t4
'
fit <- blavaan(spec, data=combinedData,
               n.chains = 2, sample=500,
               ordered = AbsItemNames,
               group = "group",
               group.equal = c("loadings", "intercepts", "regressions"),
               std.lv=TRUE)

This is looking really good. The only weird thing that I notice is that the variance of the ordinal item abort got larger than 1.0. Shouldn't the variance be bounded above at 1.0? Can I put an upper bound on the variances?

   .abort     (v4)    0.643    0.249    0.304    1.212    0.998 gamma(1,.5)[sd]

ecmerkle commented 1 month ago

Good to hear things are working. I don't think the variance is bounded at 1 because this is using the theta parameterization, i.e., there is no restriction that the total variance is 1 (and the "scales" part of the summary output shows that the total variances are above 1). There is no way to add an upper bound, but you can change the prior so there is little density above 1, using an argument like

dp = dpriors(theta = "gamma(5,7)[sd]")

where you could also use [var] or [prec] to put a prior on the variance or precision, instead of the SD.

jpritikin commented 1 month ago

You seem to imply that it's fine, in theory, to have an ordinal indicator with a total variance greater than 1. I'm not sure where I got the idea that ordinal indicators should max out at a variance of 1. Maybe I'm misremembering something. I guess I'll leave the model as is. I doubt it would make much practical difference anyway.

ecmerkle commented 1 month ago

I think that a common identification constraint for ordinal CFA involves fixing the total variance to 1, i.e.,

$$ diag(\Lambda \Psi \Lambda^\prime + \Theta) = 1 $$

Then you are right that the variances (theta estimates) should not go above 1. The default for blavaan (the "theta" parameterization) is instead to fix $diag(\Theta) = 1$. I am pretty sure that you are identifying your model from the across-group constraints, which would be a third option.

ecmerkle / blavaan

Specification of factor model with auto.var = TRUE, std.lv = TRUE with ordinal indicators #83