Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License
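The description names two core ideas: EM estimation of a latent class (mixture) model, and FIML handling of missing values. Below is a from-scratch NumPy sketch of both for binary indicators only. It is not StepMix's implementation, and the function `lca_em_binary` is hypothetical. It relies on one property of conditionally independent latent class models: under MAR, FIML amounts to leaving each row's missing indicators out of that row's likelihood.

```python
import numpy as np


def lca_em_binary(X, n_classes, n_iter=500, tol=1e-8, seed=0):
    """Hypothetical helper: EM for a latent class model with binary items.

    Entries equal to np.nan are treated as missing and left out of each
    row's likelihood (FIML under MAR for this conditionally independent model).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    observed = ~np.isnan(X)
    x = np.where(observed, X, 0.0)            # 1 where observed and endorsed
    x_neg = np.where(observed, 1.0 - x, 0.0)  # 1 where observed and not endorsed
    weights = np.full(n_classes, 1.0 / n_classes)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, d))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log p(x_i | class k), summed over observed items only.
        log_lik = x @ np.log(probs).T + x_neg @ np.log1p(-probs).T
        log_joint = log_lik + np.log(weights)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)  # posterior class probabilities
        ll = float(log_norm.sum())           # observed-data log-likelihood
        # M-step: class weights and item probabilities from weighted counts,
        # dividing by the weighted number of *observed* entries per item.
        weights = resp.mean(axis=0)
        probs = (resp.T @ x) / (resp.T @ observed.astype(float))
        probs = np.clip(probs, 1e-6, 1 - 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, probs, resp, ll


# Illustrative data: two well-separated classes, 6 items, ~10% MCAR missingness.
rng = np.random.default_rng(42)
true_probs = np.array([[0.9] * 6, [0.1] * 6])
z = rng.integers(0, 2, size=1000)
X = (rng.uniform(size=(1000, 6)) < true_probs[z]).astype(float)
X[rng.uniform(size=X.shape) < 0.1] = np.nan
weights, probs, resp, ll = lca_em_binary(X, n_classes=2)
print(weights.round(2))
print(probs.round(2))
```

Class labels can come out in either order (label switching), so compare recovered item probabilities up to a permutation of classes.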

Covariate simulation #7

Closed: sachaMorin closed this issue 1 year ago.

sachaMorin commented 2 years ago

Simulations from Bakk 2018 can be reproduced by running python scripts/run_bakk_simulation.py. The response simulation works as expected, but the covariate simulation does not: the 1-step estimator shows a bias above 15 at class separations of 0.7 and 0.8, far larger than any of the other estimators. Is this a hyperparameter issue, or does it come from how we defined the simulated data in stepmix/datasets.py?
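One way to tell a data-definition bug from an estimation problem is to write the intended data-generating process down explicitly and check it empirically. The sketch below is hypothetical throughout: the function `simulate_covariate_lca`, the standard-normal covariate, the multinomial-logit coefficients, and the 3-class, 6-item design with `sep_level` as the endorsement probability are illustrative assumptions, not the contents of stepmix/datasets.py or the exact Bakk 2018 design.

```python
import numpy as np


def simulate_covariate_lca(n_samples, sep_level, beta, intercepts, seed=None):
    """Hypothetical covariate simulation: Z -> latent class -> 6 binary items.

    Design choices (normal covariate, multinomial-logit class model,
    3 classes, 6 items) are assumptions for illustration only.
    """
    if len(intercepts) != 3 or len(beta) != 3:
        raise ValueError("this sketch assumes exactly 3 latent classes")
    rng = np.random.default_rng(seed)
    # Continuous covariate (assumption: standard normal).
    z = rng.normal(size=n_samples)
    # Multinomial logit: P(C = k | z) proportional to exp(intercept_k + beta_k * z).
    logits = np.asarray(intercepts)[None, :] + z[:, None] * np.asarray(beta)[None, :]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p_class = np.exp(logits)
    p_class /= p_class.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling of one class label per row.
    u = rng.uniform(size=(n_samples, 1))
    labels = (u > p_class.cumsum(axis=1)).sum(axis=1)
    labels = np.minimum(labels, 2)  # guard against floating-point rounding
    # Item-response probabilities; sep_level controls class separation.
    hi, lo = sep_level, 1.0 - sep_level
    item_probs = np.array([
        [hi] * 6,             # class 0 tends to endorse every item
        [hi] * 3 + [lo] * 3,  # class 1 endorses the first half
        [lo] * 6,             # class 2 tends to endorse none
    ])
    X = (rng.uniform(size=(n_samples, 6)) < item_probs[labels]).astype(int)
    return z, labels, X


z, labels, X = simulate_covariate_lca(
    20_000, sep_level=0.8, beta=[0.0, 1.0, -1.0], intercepts=[0.0, 0.0, 0.0], seed=0
)
for k in range(3):
    print(k, X[labels == k].mean(axis=0).round(2))
```

If the per-class item means match `sep_level` and the class shares move with `z` in the direction the coefficients imply, the simulated data are at least internally consistent, and the remaining discrepancy is more likely on the estimation side.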

The current output of python3 scripts/run_bakk_simulation.py -c -s 10 (covariate simulation, 10 replications) is:

Bias
Class Separation  Sample Size  1-step  2-step  3-step (Naive)  3-step (BCH)  3-step (ML)
0.7                       500   15.11    1.13           -0.51          0.17         2.68
                         1000   17.41    0.87           -0.37          0.58         0.70
                         2000   16.64    1.06           -0.47          0.99         1.49
0.8                       500   21.61    1.48            0.32          1.43         1.51
                         1000   21.11    1.17            0.20          1.33         1.24
                         2000   21.64    1.26            0.20          1.25         1.29
0.9                       500    3.08    1.58            1.12          1.60         1.62
                         1000    3.35    1.42            0.98          1.42         1.39
                         2000    1.49    1.30            0.90          1.32         1.30

RMSE
Class Separation  Sample Size  1-step  2-step  3-step (Naive)  3-step (BCH)  3-step (ML)
0.7                       500   15.23    2.75            0.75          0.53         6.85
                         1000   17.49    1.45            0.55          0.99         1.32
                         2000   16.74    1.29            0.52          1.24         2.16
0.8                       500   21.66    1.56            0.45          1.51         1.61
                         1000   21.26    1.22            0.31          1.47         1.34
                         2000   21.66    1.30            0.24          1.29         1.34
0.9                       500    3.36    1.69            1.18          1.67         1.71
                         1000    3.56    1.44            1.01          1.45         1.41
                         2000    1.75    1.31            0.91          1.33         1.31
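Assuming the script uses the standard Monte Carlo definitions (not confirmed from scripts/run_bakk_simulation.py), each Bias cell is the mean of (estimate minus true value) over the replications, and each RMSE cell is the square root of the mean squared error. A minimal sketch follows; `bias_rmse` is a hypothetical helper and the estimates are randomly generated for illustration, not real simulation output.

```python
import numpy as np


def bias_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE of one parameter across replications."""
    errors = np.asarray(estimates, dtype=float) - true_value
    bias = errors.mean()
    rmse = np.sqrt(np.mean(errors ** 2))
    return bias, rmse


# Illustrative estimates from 10 replications (like -s 10); randomly generated.
rng = np.random.default_rng(1)
true_value = 1.0
estimates = true_value + 0.5 + rng.normal(scale=0.3, size=10)
bias, rmse = bias_rmse(estimates, true_value)

# MSE = bias^2 + variance (ddof=0), so RMSE can never be smaller than |bias|.
errors = estimates - true_value
assert np.isclose(rmse ** 2, bias ** 2 + errors.var())
print(f"bias={bias:.2f}  rmse={rmse:.2f}")
```

Under these definitions, every Bias/RMSE pair in the table satisfies RMSE >= |Bias|, and when RMSE is close to |Bias| (as for the 1-step column at separations 0.7 and 0.8), the error is dominated by a systematic offset rather than by replication-to-replication noise.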