Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

Wrong number of classes with predict (2 instead of 3) #11

Closed giguerch closed 1 year ago

giguerch commented 1 year ago

from sklearn import datasets
import pandas as pd
from stepmix.stepmix import StepMix

iris = datasets.load_iris().data
iris = pd.DataFrame(iris)

Now we use the stepmix package.

ir = iris.iloc[:, 2:4]
model = StepMix(n_components=3, n_steps=1, measurement="gaussian_unit", random_state=1235)
model.fit(ir)
pr = model.predict(ir)

pr Out[30]: array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int64)

sachaMorin commented 1 year ago

That can happen with a bad initialization or with difficult data: the model collapses to two classes. You can see with model.predict_proba(ir) that the probability of the first class is never the highest one for any sample.
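A minimal sketch of that check, assuming the predicted label is the class with the highest posterior probability. The helper name `class_usage` and the toy probability matrix are made up for illustration and are not part of StepMix.

```python
import numpy as np


def class_usage(proba):
    """Count how often each class has the highest probability in a row.

    `proba` is an (n_samples, n_classes) array such as the one returned
    by predict_proba. A count of 0 means that class is never predicted.
    """
    proba = np.asarray(proba)
    winners = proba.argmax(axis=1)  # most probable class per sample
    return np.bincount(winners, minlength=proba.shape[1])


# Toy posterior matrix (made-up numbers): class 0 never wins a row.
toy = np.array([
    [0.30, 0.60, 0.10],
    [0.20, 0.10, 0.70],
    [0.40, 0.45, 0.15],
])

counts = class_usage(toy)
empty = np.flatnonzero(counts == 0)  # indices of classes never predicted
print(counts)  # [0 2 1]
print(empty)   # [0]
```

With the fitted model from the report, the same helper could be called as class_usage(model.predict_proba(ir)). A zero in the result would correspond to the class that never shows up in model.predict(ir).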

I would suggest using all the Iris features, or using measurement="gaussian_diag" to fit a variance parameter.