Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

Calculation of the number of parameters in continuous LCA #40

Closed · justinsavoie closed this issue 1 year ago

justinsavoie commented 1 year ago

Hello, and thanks for this great package!

In the first example of the first tutorial (LCA with Continuous Features), the number of estimated parameters is reported as 28. Is there a reason for this? Normally, for a diagonal GMM like this one, it would be 12 (means) + 12 (variances) + 2 (class weights, i.e., n_components - 1) = 26. That seems to be what is calculated in, e.g., mclust or sklearn.mixture.GaussianMixture. It's also possible I misunderstand and there is no issue.
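
For reference, a minimal sketch of that count, assuming the tutorial fits 3 latent classes over 4 continuous features (one combination that yields 12 means); the helper name is just for illustration:

```python
# Minimal sketch: free-parameter count for a diagonal Gaussian mixture.
# Assumes 3 latent classes and 4 continuous features, matching the
# 12 means / 12 variances in the question; the function name is hypothetical.
def n_free_parameters_diag(n_components: int, n_features: int) -> int:
    means = n_components * n_features      # one mean per class per feature
    variances = n_components * n_features  # diagonal covariance per class
    weights = n_components - 1             # class weights sum to one
    return means + variances + weights

print(n_free_parameters_diag(3, 4))  # 26, not the 28 reported in the tutorial
```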

sachaMorin commented 1 year ago

You are correct. I can reproduce. Thanks for reporting.

sachaMorin commented 1 year ago

I think I found the problem. For counting parameters, StepMix simply adds the number of parameters in the measurement and structural models to n_components - 1 in the main class (i.e., the class weights).

For Gaussian measurement models, however, StepMix actually relies on sklearn's GaussianMixture class, which already includes the class weights in its n_parameters count. The class weights are therefore counted twice.
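
As a rough illustration of the double count (a sketch on placeholder data, not StepMix's actual code): sklearn's GaussianMixture already folds the n_components - 1 weights into its internal parameter count, so adding them again on top gives 28 instead of 26.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a diagonal GMM on placeholder data (4 features, 3 components).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0).fit(X)

# sklearn's (private) parameter count already includes the class weights:
measurement_params = gmm._n_parameters()             # 12 + 12 + 2 = 26
total = measurement_params + (gmm.n_components - 1)  # weights added a second time
print(measurement_params, total)                     # 26 28
```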

sachaMorin commented 1 year ago

This should be fixed with the changes in 1.2.3 and 1.2.4. Thanks again!