Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

plot the confidence intervals with mixed descriptors #25

Closed FelixLaliberte closed 1 year ago

FelixLaliberte commented 1 year ago

When using the bootstrap() function on a model with mixed descriptors, how can we plot the confidence intervals? Also, what is the equivalent of 'pis' for continuous variables?

Here is an example with the Iris dataset:

import pandas as pd
import numpy as np
from stepmix.stepmix import StepMix
from stepmix.utils import get_mixed_descriptor
from stepmix.bootstrap import bootstrap
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score
import matplotlib.pyplot as plt

data, target = load_iris(return_X_y=True, as_frame=True)

for c in data:
    c_categorical = c.replace("cm", "cat")
    data[c_categorical] = pd.qcut(data[c], q=3).cat.codes
    c_binary = c.replace("cm", "binary")
    data[c_binary] = pd.qcut(data[c], q=2).cat.codes

for i, c in enumerate(data.columns):
    data[c] = data[c].sample(frac=.5, random_state=42*i)

mm_data, mm_descriptor = get_mixed_descriptor(
    dataframe=data,
    continuous_nan=['sepal length (cm)', 'sepal width (cm)'],
    binary_nan=['sepal length (binary)', 'sepal width (binary)'],
    categorical_nan=['sepal length (cat)', 'sepal width (cat)'],
)

sm_data, sm_descriptor = get_mixed_descriptor(
    dataframe=data,
    categorical_nan=['petal length (cat)', 'petal width (cat)'],
)

model4 = StepMix(n_components=3, measurement=mm_descriptor, structural=sm_descriptor, n_init=1, random_state=123)
model4.fit(mm_data, sm_data)

model4, bootstrapped_params4 = bootstrap(model4, mm_data, sm_data, n_repetitions=1000)

params4 = model4.get_parameters()
params4['weights']

from stepmix.bootstrap import plot_all_parameters_CI
figures = plot_all_parameters_CI(model4.get_parameters(), bootstrapped_params4, alpha=5)  # error

params4['measurement']
params4['measurement'].keys()
params4['measurement']['binary_nan']['pis']
params4['measurement']['categorical_nan']['pis']
params4['measurement']['continuous_nan']['pis']
params4['measurement']['covariances']['pis']

sachaMorin commented 1 year ago

Regarding your second question: you can call keys() on any Python dictionary. In your case, try params4['measurement']['continuous_nan'].keys() to see which keys are available in that object.
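If it helps, the same keys() idea can be applied recursively to list every leaf in a nested parameter dictionary at once. This is only a minimal sketch: list_parameter_keys is a hypothetical helper, not part of StepMix, and toy_params is a made-up stand-in whose key names ('means', 'covariances') are assumptions for illustration. Read the real names from your own fitted model by passing in model4.get_parameters().

```python
def list_parameter_keys(params, prefix=()):
    """Yield (path, type name) for every non-dict leaf of a nested dict.

    Hypothetical helper for illustration; not a StepMix function.
    """
    for key, value in params.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend into sub-dictionaries (e.g. one per descriptor).
            yield from list_parameter_keys(value, path)
        else:
            yield path, type(value).__name__


# Made-up stand-in for a parameter dictionary. The key names under
# "continuous_nan" are assumptions, not confirmed StepMix output.
toy_params = {
    "weights": [0.3, 0.3, 0.4],
    "measurement": {
        "binary_nan": {"pis": [[0.1, 0.9]]},
        "continuous_nan": {
            "means": [[5.0, 3.4]],
            "covariances": [[0.1, 0.1]],
        },
    },
}

leaf_paths = list(list_parameter_keys(toy_params))
for path, kind in leaf_paths:
    print(" -> ".join(path), f"({kind})")
```

With a real model, list(list_parameter_keys(params4)) should show at a glance which descriptors expose 'pis' and which expose something else.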

sachaMorin commented 1 year ago

I can reproduce this. The error comes from a mishandling of grid plots when there are exactly 2 or 3 parameters,