Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

Number of parameters in categorical model #8

Closed sachaMorin closed 1 year ago

sachaMorin commented 1 year ago

The current categorical model can handle multiple categorical features, but the code assumes that all features share the same number of outcomes. This is not a problem for estimation: features with fewer outcomes simply carry unused, zero-valued "padded" parameters.
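To illustrate the padding, here is a minimal sketch with two hypothetical features that have 2 and 4 outcomes. The array layout and the names `n_outcomes` and `probs` are assumptions made for illustration, not necessarily how StepMix stores its parameters internally.

```python
import numpy as np

# Hypothetical example: two categorical features with 2 and 4 outcomes.
n_outcomes = [2, 4]
max_outcomes = max(n_outcomes)

# One row per feature, padded to the largest number of outcomes.
probs = np.zeros((len(n_outcomes), max_outcomes))
for k, n in enumerate(n_outcomes):
    # Uniform probabilities over the outcomes that actually exist.
    probs[k, :n] = 1.0 / n

# Entries that exist only because of padding and are never estimated.
n_padded = int((probs == 0).sum())
print(probs)
print("padded entries:", n_padded)
```

Each row still sums to 1, so the padded zeros do not change the likelihood; they only inflate the size of the parameter matrix.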

However, stepmix.emission.categorical.Multinoulli.n_parameters() still counts those unused parameters as free parameters. This inflates the reported number of free parameters and therefore distorts the BIC and AIC scores. We need to either enforce that all categorical features share the same number of outcomes, or support a different number of outcomes per feature.

sachaMorin commented 1 year ago

We should also consider that not all parameters in the matrix are free. Because the outcome probabilities of a feature sum to 1, there are actually only n_outcomes - 1 free parameters per feature in each latent class.
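A rough sketch of the two counts, for comparison. Both helper functions are hypothetical and are not part of the StepMix API. The "padded" count assumes the naive approach counts every entry of the padded matrix, which may not match exactly what n_parameters() currently does.

```python
def padded_parameter_count(n_components, n_outcomes):
    """Count every entry of the padded matrix (assumed naive behaviour)."""
    return n_components * len(n_outcomes) * max(n_outcomes)


def free_parameter_count(n_components, n_outcomes):
    """Count n_outcomes - 1 free parameters per feature, per latent class."""
    return n_components * sum(n - 1 for n in n_outcomes)


# Hypothetical model: 3 latent classes, features with 2 and 4 outcomes.
n_components = 3
n_outcomes = [2, 4]

padded = padded_parameter_count(n_components, n_outcomes)
free = free_parameter_count(n_components, n_outcomes)
print("padded count:", padded)
print("free count:", free)
```

Since the BIC penalty is the number of free parameters times log(n_samples), and the AIC penalty is twice the number of free parameters, any overcount carries directly into both scores.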