EducationalTestingService / factor_analyzer

A Python module to perform exploratory & confirmatory factor analyses.
GNU General Public License v2.0

Add CFI and RMSEA goodness-of-fit metrics #99

Open jimmybru opened 2 years ago

jimmybru commented 2 years ago

My (quite inexpert) understanding is that CFI and RMSEA are the goodness-of-fit measures par excellence when it comes to CFA. The chi-squared test also seems useful.

Would it be possible to add these metrics, for example around here? https://github.com/EducationalTestingService/factor_analyzer/blob/main/factor_analyzer/confirmatory_factor_analyzer.py#L379

jbiggsets commented 2 years ago

@desilinguist, I'll have to double-check some of the formulas below, but I think RMSEA is pretty easy and doable. (You can find a couple of references here.)

Here are some of the steps, as far as I understand:

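  1. Implement the chi-squared test statistic. I believe this is just `chi2 = self.n_obs * res.fun`, where `fun` comes directly from the `minimize` results object (see here). This assumes `res.fun` is the ML discrepancy value; some software scales by `self.n_obs - 1` instead of `self.n_obs`.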

  2. Calculate the degrees of freedom. I believe this is the number of unique variances and covariances, k (k + 1) / 2, minus the number of free model parameters, where k is the number of observed variables (not the number of observations): `dof = k * (k + 1) / 2 - n_params`.

  3. Calculate the p-value. Once we have the test statistic and the degrees of freedom, this should be as simple as `1 - scipy.stats.chi2.cdf(chi2, dof)`, or, more accurately for large statistics, `scipy.stats.chi2.sf(chi2, dof)`.

  4. Calculate the Root Mean Square Error of Approximation (RMSEA). There are a few different formulas I've seen, but I think they all reduce to `rmsea = np.sqrt(max(chi2 - dof, 0) / (dof * (self.n_obs - 1)))`, which is 0 whenever `chi2 < dof`. I'll have to double-check this, too, since some versions divide by `self.n_obs` rather than `self.n_obs - 1`. (We can look at psych to see how they implement it.)
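The four steps above can be sketched as one standalone function. This is a minimal sketch, not factor_analyzer's implementation: it assumes maximum-likelihood estimation, computes the ML discrepancy directly from a sample covariance matrix and a model-implied covariance matrix (rather than reusing `res.fun`), and uses the `n_obs - 1` scaling. The function name `ml_fit_stats` and the `n_params` argument are hypothetical.

```python
import numpy as np
from scipy import stats


def ml_fit_stats(sample_cov, implied_cov, n_obs, n_params):
    """Hypothetical helper: chi-squared, dof, p-value and RMSEA.

    Assumes ML estimation and the (n_obs - 1) scaling of the statistic;
    some packages scale by n_obs instead.
    """
    S = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    k = S.shape[0]  # number of observed variables, not observations

    # ML discrepancy: F = log|Sigma| + tr(S Sigma^-1) - log|S| - k
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(S)
    f_ml = logdet_sigma + np.trace(S @ np.linalg.inv(sigma)) - logdet_s - k

    # Step 1: test statistic (assumed n_obs - 1 scaling)
    chi2 = (n_obs - 1) * f_ml

    # Step 2: unique variances/covariances minus free parameters
    dof = k * (k + 1) / 2 - n_params
    if dof <= 0:
        raise ValueError("model must have positive degrees of freedom")

    # Step 3: upper-tail probability; sf avoids the 1 - cdf round-off
    p_value = stats.chi2.sf(chi2, dof)

    # Step 4: RMSEA, truncated at zero when chi2 < dof
    rmsea = np.sqrt(max(chi2 - dof, 0.0) / (dof * (n_obs - 1)))
    return chi2, dof, p_value, rmsea
```

For a perfect fit, e.g. `ml_fit_stats(np.eye(4), np.eye(4), n_obs=200, n_params=8)`, chi-squared is 0, `dof` is 2, the p-value is 1 and RMSEA is 0.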

Getting the Comparative Fit Index (CFI) is a little more involved, since you have to calculate the chi-squared test statistic for the baseline/null model, where all the variables are independent/uncorrelated. I'll read up on that a bit, but it does seem like a useful thing to add.
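For the CFI, the baseline model described above keeps only the k variances, so its implied covariance is `diag(S)` and its degrees of freedom are k (k - 1) / 2. Below is a hedged sketch under the same assumptions as the model statistic (ML estimation, `n_obs - 1` scaling), using the usual definition CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_M - df_M, chi2_B - df_B, 0). The function names `baseline_chi2` and `cfi` are hypothetical.

```python
import numpy as np


def baseline_chi2(sample_cov, n_obs):
    """Hypothetical helper: chi-squared and dof of the independence model.

    With implied covariance diag(S), the ML discrepancy reduces to
    sum(log s_ii) - log|S|. Assumes the same (n_obs - 1) scaling
    as the target model's statistic.
    """
    S = np.asarray(sample_cov, dtype=float)
    k = S.shape[0]
    _, logdet_s = np.linalg.slogdet(S)
    f_null = np.sum(np.log(np.diag(S))) - logdet_s
    chi2_null = (n_obs - 1) * f_null
    dof_null = k * (k - 1) / 2  # k(k + 1)/2 moments minus k free variances
    return chi2_null, dof_null


def cfi(chi2_model, dof_model, chi2_null, dof_null):
    """Comparative Fit Index, bounded to [0, 1]."""
    d_model = max(chi2_model - dof_model, 0.0)
    # max(chi2_B - df_B, chi2_M - df_M, 0), since d_model is already >= 0
    d_null = max(chi2_null - dof_null, d_model)
    if d_null == 0.0:
        return 1.0  # no misfit beyond the dof in either model
    return 1.0 - d_model / d_null
```

A model whose chi-squared does not exceed its degrees of freedom gets a CFI of 1, and a model that fits no better than the baseline gets 0.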

As an FYI, I believe that semopy has all of these implemented, but it's been a while since I've looked at that package. Not sure whether/how much we can borrow from that package. Looks like it's under the MIT license. (See here.)

desilinguist commented 2 years ago

This is amazingly helpful, @jbiggsets! I'll try to take a stab at RMSEA soon. It's funny you mention semopy since that's exactly what @jimmybru is using right now. I told him he should look into replacing that with factor_analyzer, but he needs these metrics to do that.

jimmybru commented 2 years ago

You guys are beautiful!

jimmybru commented 2 years ago

Btw: this should probably be another issue, but just in case the answer is obvious, I’ll ask here: I don’t see a way in factor_analyzer to specify covariances in the model like you can in semopy. Is that true? If so, is the default that all factors are orthogonal, or is everything allowed to covary with everything?