TDJorgensen / lavaan.mi

Use R package lavaan to fit SEM to multiple imputations of missing data
3 stars 1 forks source link

Invert NACOV first or last for WLS.V? #1

Closed Suppanut closed 8 months ago

Suppanut commented 8 months ago

Shouldn't we invert NACOV last to avoid the inversion of a full matrix in DWLS? solve(diag(diag(NACOV[[g]])))

https://github.com/TDJorgensen/lavaan.mi/blob/b4a106823a83d9f6fd635319073a77ecd5d18755/R/poolSatMod.R#L366

TDJorgensen commented 8 months ago

You might be confusing two different steps:

  1. pooling saturated model results
  2. fitting a restricted SEM to pooled results

Shouldn't we invert NACOV last

We invert first because the inverse of a diagonal matrix is just taking the reciprocal of diagonal elements. We need to invert the full matrix, then save those diagonal values. This only happens once (per "block": group and/or level) when pooling within- and between-imputation components of (N)ACOV, which is then passed to the lavaan(..., WLS.V=) argument.

to avoid the inversion of a full matrix in DWLS?

That is happening during estimation, which is after the pooling of Sigma and its NACOV. There is no no avoiding the need to evaluate Equation 9 (Wirth & Edwards, 2009) every time the point estimates are updated, but at that point, the diagonal matrix W is easy to invert because it is just the reciprocals of the diagonal.

Suppanut commented 8 months ago

I posed the question because I expected that analyzing copies of complete data (which initially have no missing values) using lavaan.mi should yield results identical to those obtained from analyzing a single complete dataset with lavaan. However, the results differ.

Please check this example

library(lavaan.mi)

HS <- HolzingerSwineford1939[ , paste0("x", 1:9)]
imps <- list(HS,HS,HS,HS,HS) # 5 copies of the original data
HS3cat <- lapply(imps, function(x) {
  as.data.frame( lapply(x, cut, breaks = 3, labels = FALSE) )
})

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## pool polychoric correlations and thresholds
prePooledData3 <- poolSat(HS3cat, ordered = paste0("x", 1:9))
fitc <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
lavTest(fitc, test = "browne.residual.adf", output = "text")
# Model Test User Model:
#   
#   Browne's residual-based (ADF) test                  
#   Test statistic                                47.835
#   Degrees of freedom                                24
#   P-value (Chi-square)                           0.003

fit <- sem(HS.model, HS3cat[[1]], 
           ordered = TRUE, 
           std.lv = TRUE,  
           sample.cov.rescale = FALSE,
           fixed.x = FALSE,
           conditional.x = FALSE,
           estimator = "DWLS",
           se = "robust.sem",
           test = "Browne.residual.adf")
lavTest(fit, test = "browne.residual.adf", output = "text")
# Model Test User Model:
#   
#   Browne's residual-based (ADF) test                  
#   Test statistic                                48.097
#   Degrees of freedom                                24
#   P-value (Chi-square)                           0.002

all.equal(fitMeasures(fitc), fitMeasures(fit)) # Mean relative difference: 0.4973422
all.equal(inspect(fitc, "WLS.V"), inspect(fit, "WLS.V")) # "Mean relative difference: 0.534187"

# Modify WLS.V
prePooledData3$WLS.V <- solve(diag(diag(prePooledData3$NACOV)))
fitc2 <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
lavTest(fitc2, test = "browne.residual.adf", output = "text")
# Model Test User Model:
#   
#   Browne's residual-based (ADF) test                  
#   Test statistic                                48.097
#   Degrees of freedom                                24
#   P-value (Chi-square)                           0.002

all.equal(fitMeasures(fit), fitMeasures(fitc2)) # TRUE
all.equal(inspect(fit, "WLS.V"), inspect(fitc2, "WLS.V")) # TRUE

Do you have any thoughts on this?

TDJorgensen commented 8 months ago

Ah, this is a nice empirical demonstration without between-imputation variance. I should have thought of that. Thank you!

Looking back at the source material, Muthén (1997, p. 16) does state that W should be the diagonal of Gamma (NACOV), which is then inverted. So yes, you were right, and that explains why your suggestion works. I will update that to close this issue.