Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

bootstrap_stats method runs parametric bootstrap #57

Status: Closed (closed by U-R-dev 6 months ago)

U-R-dev commented 6 months ago

First of all, thank you for providing such a wonderful package. As stepmix is the only open-source program that can perform LCA with distal outcomes, I've been using it for months now.

After upgrading from version 2.1.3 to 2.2.0, I found that nonparametric bootstrapping with bootstrap_stats took about 10 times longer than before. Step-by-step debugging revealed that the variable "parametric" was set to "True", so parametric bootstrapping was actually being performed.

I could reproduce this with the following simple sample code on Google Colab.

!pip install stepmix ipdb
from stepmix.datasets import data_bakk_response
from stepmix.stepmix import StepMix

# Simulate data
X, Y, _ = data_bakk_response(n_samples=2000, sep_level=.9, random_state=42)

# Define base model
model = StepMix(n_components=3, n_steps=1, measurement='bernoulli',
                structural='gaussian_unit', random_state=42)
model.fit(X, Y)

# Bootstrap
breakpoint()  # for debugging
stats_dict = model.bootstrap_stats(X, Y, n_repetitions=5)

As the debugging screenshot shows, when execution reached the bootstrap function (line 1084), the variable "parametric" was set to "True" and remained "True" until the end of the process, so parametric bootstrapping was performed.

[Screenshot 2024-02-15 at 15 01 33 2]

I don't understand why the default value, parametric=False, is ignored. It would be helpful if you could look into this.

Thank you,

sachaMorin commented 6 months ago

Thanks for reporting. I'll look into this.

sachaMorin commented 6 months ago

The progress_bar argument of self.bootstrap_stats is wrongly passed to self.bootstrap as the parametric argument because of positional arguments and the changes introduced in 2.2.0. Good catch! I'll release a patch today.
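For readers wondering how a default can be silently overridden this way, here is a minimal sketch of this class of bug. `ToyModel` and both method signatures are hypothetical stand-ins invented for illustration. They are not StepMix's actual code, and the real parameter order may differ.

```python
class ToyModel:
    """Hypothetical stand-in for an estimator; not StepMix's real implementation."""

    def bootstrap(self, X, Y=None, n_repetitions=10, parametric=False, progress_bar=True):
        # Report which flags the inner method actually received.
        return {"parametric": parametric, "progress_bar": progress_bar}

    def bootstrap_stats_buggy(self, X, Y=None, n_repetitions=10, progress_bar=True):
        # Bug: the fourth positional slot of bootstrap() is `parametric`,
        # so progress_bar=True is received as parametric=True.
        return self.bootstrap(X, Y, n_repetitions, progress_bar)

    def bootstrap_stats_fixed(self, X, Y=None, n_repetitions=10, progress_bar=True):
        # Fix: forward optional arguments by keyword so that inserting or
        # reordering parameters in bootstrap() cannot shift their meaning.
        return self.bootstrap(X, Y, n_repetitions=n_repetitions, progress_bar=progress_bar)


model = ToyModel()
buggy = model.bootstrap_stats_buggy(X=None)
fixed = model.bootstrap_stats_fixed(X=None)
print("buggy:", buggy)
print("fixed:", fixed)
```

In the buggy call, `parametric` ends up `True` even though its default is `False`, which matches what the debugger showed. Declaring optional parameters as keyword-only (a bare `*` in the signature) is another way to make this kind of mix-up raise a `TypeError` instead of passing silently.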

sachaMorin commented 6 months ago

Should be fixed as of 2.2.1. Also if by any chance you are in a position to share your work, please do! We are always curious to see how our users are using StepMix.
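To confirm that an environment has the patched release, a quick check along these lines may help. This is only a sketch: the helper names `release_tuple` and `has_fix` are made up for this example, and the comparison looks only at the leading numeric parts of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v):
    """Return up to three leading numeric components of a version string."""
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed, fixed="2.2.1"):
    """True if `installed` is at least the release reported as fixed in this thread."""
    return release_tuple(installed) >= release_tuple(fixed)


try:
    installed = version("stepmix")
    status = "includes the fix" if has_fix(installed) else "predates the 2.2.1 fix"
    print(f"stepmix {installed} {status}")
except PackageNotFoundError:
    print("stepmix is not installed in this environment")
```

Note that, per the report above, 2.1.3 did not show the problem, so a version older than 2.2.1 is only affected if it is 2.2.0.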

U-R-dev commented 6 months ago

Ah, I see. It's so basic, but I didn't notice it. Thank you for the quick response. We will be sure to cite your paper when our work is published!