Thanks for the report, I will take a look soon. I am also looping in the two people who know this part of the code best:
@maugavilla @TDJorgensen : in recent months, Zezhen has helpfully found a variety of bugs and unclear issues in blavaan. I am wondering whether you have seen the issues that he describes here. The NAs could possibly be related to recent changes that I made to logl computations, but I am not certain about that.
@littlehifive could you share more information so we can reproduce the error? Something like sample size, variable means, covariance matrix, and full code? With this we could simulate similar data and reproduce it.
I think I know what it might be, but I would like to see more information first.
@maugavilla
I created a simulated dataset based on my data, and I was able to reproduce the problems with BRMSEA and adjusted GammaHat using the following model. Let me know if you need anything else from me, thanks!
library(blavaan)
# load test_dat.csv here
# lavaan
model <- '
# measurement model
EGRA_2m1 =~ VOC_2m1 + LTRC_2m1 + GRPH_2m1 + INV_2m1 + OPR_2m1 + RDCP_2m1
GRPH_2m1 ~~ LTRC_2m1
RDCP_2m1 ~~ VOC_2m1
RDCP_2m1 ~~ OPR_2m1
# measurement model
EGRA_3m1 =~ VOC_3m1 + LTRC_3m1 + GRPH_3m1 + INV_3m1 + OPR_3m1 + RDCP_3m1
GRPH_3m1 ~~ LTRC_3m1
RDCP_3m1 ~~ VOC_3m1
RDCP_3m1 ~~ OPR_3m1
# correlation
EGRA_3m1 ~~ EGRA_2m1
'
# priors
mydp <- dpriors(tau = "normal(0, 0.5)",
                lambda = "normal(0, 2)",
                beta = "normal(0.5, 0.2)",
                rho = "beta(1, 1)",
                nu = "normal(0, 5)")
# fit model
fit <- bcfa(model, data = test_dat,
            dp = mydp,
            std.lv = TRUE,
            seed = 1234)
# fit indices
blavFitIndices(fit)
# Posterior mean (EAP) of devm-based fit indices:
#
# BRMSEA BGammaHat adjBGammaHat BMc
# NaN 0.801 1.680 0.475
# Warning messages:
# 1:
# 41 (20.5%) p_waic estimates greater than 0.4. We recommend trying loo instead.
# 2: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#
# 3: In sqrt(nonc/(dif.ppD * N)) : NaNs produced
@littlehifive
Thanks, I see what the problem is. I think this is related to the logl change that @ecmerkle mentioned earlier, which may have affected the pD calculation for loo and waic. As you can see in the fitMeasures() output, p_dic, p_waic, and p_loo are different calculations and vary, but they are usually close to each other. Now p_dic = 36, p_waic = 126, and p_loo = 121; these differences should not be this large.
fitMeasures(fit)
#       npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic
#     43.000  -3081.635      0.000   6390.883   6235.755     36.242   6343.645    126.500    271.855
#      looic      p_loo     se_loo margloglik
#   6333.824    121.589    264.850  -3217.457
The thing is that pD is used in the fit index calculations. For now, the quick fix is to tell the function to use the DIC pD, like this:
bfits <- blavFitIndices(fit, pD = "dic")
summary(bfits)
With this change, it works fine.
@ecmerkle could you check why p_waic and p_loo might be so much larger now? Curiously, this doesn't happen with the example model, so my guess would be that it is related to the residual correlations?
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit1 <- bcfa(HS.model, data = HolzingerSwineford1939, n.chains = 2, burnin = 1000, sample = 1000)
fitMeasures(fit1)
#       npar       logl        ppp        bic        dic      p_dic       waic     p_waic    se_waic
#     30.000  -3738.344      0.000   7647.802   7535.950     29.631   7539.915     33.019     86.116
#      looic      p_loo     se_loo margloglik
#   7540.114     33.119     86.137  -3862.744
Thanks for looking at it. This is an interesting issue that I haven't yet figured out. I have drastically simplified the model, and I still see inflated p_waic and p_loo:
model <- ' EGRA_2m1 =~ VOC_2m1 + LTRC_2m1 + GRPH_2m1 '
fit <- bcfa(model, data = test_dat, dp = mydp, std.lv = TRUE)
I have also tried it with older versions of blavaan (0.4-1 and 0.3-17) and with an older version of loo (2.4-1). For all versions, I see similar behavior. And, like Mauricio mentions, the usual test models always seem to work as expected.
One thing I noticed is that, for the simplified model above, the estimated factor variance can be close to 0 if you use std.lv = FALSE. I will look at it some more tomorrow.
For BTLI, I understand it can be over 1 from the formula
In frequentist SEM, the adjusted GFI (and its small-sample improvement, adjusted GammaHat) can be negative in very poorly fitting models. But values > 1 only happen in this case because we are deriving a Bayesian analogue (not an equivalent) of that quantity.
adjBGammaHat can exceed 1 whenever the quantity analogous to "degrees of freedom" (DF below) is negative:
adjBGammaHat <- 1 - (pStar / DF) * (1 - BGammaHat)
In our paper, we use the canonical number of sufficient statistics minus the estimated (effective) number of parameters (pD):
# number of sufficient stats
pStar <- ((nvar * (nvar + 3))/2) # multiplied by number of groups when relevant
# estimated (effective) number of parameters
pD # borrowed from DIC, WAIC, or LOO-IC
# analogous to degrees of freedom
DF <- pStar - pD
The negative DF is occurring with your data, and it is difficult to understand why.
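To make the arithmetic concrete, here is an illustrative sketch using the values reported earlier in this thread for the 12-indicator model (the pD that enters the reported index may differ slightly from the p_waic shown by fitMeasures(), but the sign flip is the point):
nvar <- 12                              # 6 indicators at each of 2 waves
pStar <- (nvar * (nvar + 3)) / 2        # 90 sufficient statistics
pD <- 126.5                             # p_waic from fitMeasures(fit) above
DF <- pStar - pD                        # negative here
BGammaHat <- 0.801                      # from blavFitIndices(fit) above
1 - (pStar / DF) * (1 - BGammaHat)      # exceeds 1 because DF < 0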
Some things I have explored so far:
The sampled latent variables could inflate pD in the older approach described in blavaan's JSS paper, but I think that is no longer relevant with the faster Stan model that marginalizes out the latent variables by using summary stats. Also, as @ecmerkle found with a smaller model without residual correlations, the issue persists.
I tried removing extreme outliers by setting them to NA, but there is still a larger pD from WAIC and LOO than pStar (= 9). Looking at their histograms, there is heavy inflation among the last 3 indicators (> 50% on OPR_2m1, > 70% on INV_2m1 and RDCP_2m1). The first 3 indicators look more normal, but the issue persists in all these cases.
I also tried the default priors instead of the mydp priors, and the issue persisted.
Some syntax I wrote to try the above:
## try getting rid of extreme outliers
dat <- test_dat
vv <- c("VOC_2m1","LTRC_2m1","GRPH_2m1","INV_2m1","OPR_2m1","RDCP_2m1")
for (i in vv) {
  sc <- abs(scale(dat[, i]))
  dat[, i] <- ifelse(sc > 3, NA, dat[, i])
}
## model with first 3 indicators (more normally distributed, but low correlations)
model <- ' EGRA_2m1 =~ VOC_2m1 + LTRC_2m1 + GRPH_2m1 '
## or try fitting model to most highly correlated indicators (but highly inflated)
model <- ' EGRA_2m1 =~ INV_2m1 + OPR_2m1 + RDCP_2m1 '
#fit <- cfa(model, data = test_dat, std.lv = TRUE) # quick check with ML
bfit <- bcfa(model, data = dat, dp = mydp, std.lv = TRUE)
blavFitIndices(bfit)
So I can't rule out that there is something pathological in these simulated data, but I can't be sure that is what is causing the pD estimates to exceed pStar in all these cases. I hope some of these explorations might help @ecmerkle or @maugavilla by triggering ideas about where to look, but I am not convinced it is caused by the newer logl calculations. If I interpreted Ed's last message correctly, the issue existed before the logl change.
Instead of a problem with blavaan (or loo) source code, I expect it is an issue that can arise with particular data characteristics (or their interaction with model characteristics) that we haven't discovered yet. It is possible that the normal likelihood function is inappropriate for your data, which causes more variation across the parameter space, leading to greater pD estimates (especially in the WAIC and LOO cases, which more completely utilize the available posterior information).
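One way to see why WAIC and LOO pick this up: p_waic is the sum of posterior variances of the pointwise log-likelihood, so observations that sit badly under the normal likelihood contribute large variances. A rough sketch of how to inspect the per-observation contributions (assuming a Stan fit whose object is stored in fit@external$mcmcout, as used elsewhere in this thread):
ll <- loo::extract_log_lik(fit@external$mcmcout)  # iterations x observations
pwaic_i <- apply(ll, 2, var)                      # per-observation contributions to p_waic
sum(pwaic_i)                                      # should roughly match p_waic from fitMeasures()
hist(pwaic_i)                                     # a handful of cases may dominate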
Thank you all for your comments! I am looping in Ben Goodrich (@bgoodri) here too.
I am not sure whether some background information about my data would be helpful for solving this issue:
Terrence's interpretation is right: because this also happens in older versions of blavaan, I don't think that recent changes broke something. I will double-check the log-likelihood computations to make sure, but I am leaning towards "particular data characteristics" as the explanation here.
Instead of defining outliers as Terrence did, I used the Pareto k values coming out of loo (removing cases where k is > .5). This leads to about 5 cases being removed (sorry, I didn't set the seed). But after removing them, the DIC p_d drops to around -9. The old WinBUGS user manual says that negative p_d can happen with prior-data conflict, or when the posterior mean falls in an area of low density (say, when there is a bimodal posterior). I think this is some evidence for the "data characteristics" explanation.
If you compute DIC in the JAGS way (using jags.ic = TRUE), the p_d for DIC seems more reasonable regardless of whether the cases are included.
Code is below:
## pointwise log-likelihood from the Stan fit, then Pareto k diagnostics
model_ll <- loo::extract_log_lik(fit@external$mcmcout)
loo_out <- loo(model_ll)
## drop cases with high Pareto k and refit, also computing DIC the JAGS way
outl <- which(loo_out$diagnostics$pareto_k > .5)
reduced_dat <- test_dat[-outl, ]
fitr1 <- bcfa(model, data = reduced_dat, jags.ic = TRUE)
fitMeasures(fitr1)
# npar logl ppp bic dic p_dic dic_jags
# 9.000 -747.657 0.482 1542.678 1477.012 -9.151 1511.448
#p_dic_jags waic p_waic se_waic looic p_loo se_loo
# 8.067 1498.936 12.245 55.656 1499.041 12.298 55.685
#margloglik
# -781.064
Given the explanation about the original data, I find it interesting that this seems to be a good diagnostic of the data mismatching the model's likelihood, and that loo and waic are sensitive to this but dic is not.
When I try excluding outliers with the larger model, it takes several rounds of exclusion based on the Pareto k values, which doesn't seem like a sustainable solution to me.
If the data are actually zero-inflated, I think you should try running the indicators as categorical, with one category for 0 and the rest of the answers as ordered categories. This would make it similar to a graded response model, which handles these cases better than treating the indicators as continuous.
In another project we modeled zero-inflation as a mixture IRT model, treating the zero-inflation as a correlated mixture with the GRM; you can find the Stan code here: https://osf.io/hvkfn/
Magnus, B. E., & Garnier-Villarreal, M. (2022). A multidimensional zero-inflated graded response model for ordinal symptom data. Psychological Methods, 27(2), 261–279. https://doi.org/10.1037/met0000395
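For what it is worth, a minimal sketch of the ordered-indicator route in blavaan (the cut points below are made up for illustration, the ordered= argument assumes a recent blavaan version with the Stan target, and the real categorization should follow the response scale of the items):
dat_ord <- test_dat
vv <- c("INV_2m1", "OPR_2m1", "RDCP_2m1")
for (i in vv) {
  # hypothetical cut points: 0 kept as its own (lowest) category
  dat_ord[, i] <- cut(dat_ord[, i], breaks = c(-Inf, 0, 5, 10, Inf), ordered_result = TRUE)
}
model_ord <- ' EGRA_2m1 =~ INV_2m1 + OPR_2m1 + RDCP_2m1 '
bfit_ord <- bcfa(model_ord, data = dat_ord, ordered = vv, std.lv = TRUE)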
I agree that excluding people on the basis of pareto_k is not a solution... I was just looking at how sensitive the information criteria were to those observations.
It seems like we are agreeing that there is not a bug here, but that the behavior of these fit metrics can be suboptimal for models that do not fit so well.
So it seems a solution could be a warning from fitMeasures if the effective number of parameters is negative, or if the effective number of parameters varies a lot across metrics. Also, related to the above post from @TDJorgensen, does it make sense to restrict DF to never go below 0 there?
does it make sense to restrict DF to never go below 0
I would not advise that. It could end up hiding problems that should be brought to attention.
a warning from fitMeasures if the effective number of parameters is negative, or if the effective number of parameters varies a lot across metrics
That sounds like a good solution. By "different metrics" do you mean DIC vs. WAIC vs. LOO_IC? We don't have a lot to go on here, so I don't think the warning can be very informative about what to do (maybe there is nothing to do) or what to investigate (there might be other issues). Perhaps something like:
warning("The estimated number of model parameters exceeds the number of sample statistics (covariances, etc.), so fit-index calculations may lead to uninterpretable results.")
Yes, I mean that we look at the effective number of parameters under DIC, WAIC, and LOO. If the DIC-based pD is negative, or if they differ by "a lot" (maybe by some percentage of a simple parameter count?), then a warning appears.
I am not sure this is exactly the same as the negative DF issue (definitely related, maybe not exactly the same). Maybe we warn about negative DF (in blavFitIndices) separately from the effective-number-of-parameters warning (from fitMeasures).
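Something like the following rough sketch is the kind of check I have in mind (a hypothetical helper, not actual blavaan code); it just compares the effective numbers of parameters against the simple parameter count:
check_pD <- function(npar, p_dic, p_waic, p_loo, tol = 0.5) {
  pDs <- c(dic = p_dic, waic = p_waic, loo = p_loo)
  if (any(pDs < 0) || any(abs(pDs - npar) / npar > tol)) {
    warning("Effective numbers of parameters are negative or far from the number ",
            "of free parameters; information criteria and fit indices may be unreliable.")
  }
  invisible(pDs)
}
## with the values reported earlier in this thread:
check_pD(npar = 43, p_dic = 36.242, p_waic = 126.500, p_loo = 121.589)  # warns
check_pD(npar = 30, p_dic = 29.631, p_waic = 33.019, p_loo = 33.119)    # silent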
Hi Ed,
For the following model, I got NAs for BRMSEA and a value over 1 for adjBGammaHat (which should be bounded between 0 and 1, right?). Do you know why that could be the case?
For BTLI, I understand it can be over 1 from the formula, but is it "the larger the better" or "the closer to 1 the better"?
Thanks!