Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

Printed metrics do not account for sample_weights #29

Closed sachaMorin closed 1 year ago

sachaMorin commented 1 year ago

The printed StepMix report currently does not account for sample_weight, even when it is provided.

This is a relatively easy fix. sample_weight should simply be passed to the report method.

However, I'm not sure how the LL (not averaged) should account for sample weights.

For reference, here's how the average LL is defined in the StepMix code:

    np.average(ll, weights=sample_weight)
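To make the open question concrete, here is a minimal sketch of the weighted average next to two candidate definitions of the total LL. The arrays are made-up illustration values, and neither candidate is claimed to be what StepMix does or should do.

```python
import numpy as np

# Hypothetical per-sample log-likelihoods and weights (illustration only).
ll = np.array([-4.0, -5.0, -3.0])
sample_weight = np.array([1.0, 2.0, 1.0])

# Weighted average LL, same call as the line quoted above.
avg_ll = np.average(ll, weights=sample_weight)

# Candidate A (assumption): weighted sum, treating each weight
# like a replication count for its row.
total_ll_weighted_sum = np.sum(sample_weight * ll)

# Candidate B (assumption): weighted average rescaled by the number of rows,
# so the total stays on the scale of the unweighted sample size.
total_ll_rescaled = ll.shape[0] * avg_ll

# Without weights, both candidates reduce to the plain sum.
unweighted_total = np.sum(ll)
unweighted_rescaled = ll.shape[0] * np.average(ll, weights=None)

print(avg_ll, total_ll_weighted_sum, total_ll_rescaled, unweighted_total)
```

The two candidates agree whenever the weights sum to the number of rows, and differ otherwise, which is the ambiguity raised above.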

And here are the relevant lines in the StepMix output:

    ============================================================================
    Fit for 3 latent classes
    ============================================================================
    Estimation method             : 1-step
    Number of observations        : 150
    Number of latent classes      : 3
    Number of estimated parameters: 20
    Log-likelihood (LL)           : -626.6180
    -2LL                          : 1253.2361
    Average LL                    : -4.1775
    AIC                           : 1293.24
    BIC                           : 1353.45
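For reference, the printed values are consistent with the standard definitions of these metrics. The sketch below recomputes them from the numbers in the report above; the variable names are illustrative and this is not StepMix's own reporting code.

```python
import math

# Values copied from the printed report above.
n_samples = 150
n_parameters = 20
ll = -626.6180  # total (non-averaged) log-likelihood

# Standard definitions, assumed here to match the report.
minus_2ll = -2 * ll
avg_ll = ll / n_samples
aic = minus_2ll + 2 * n_parameters
bic = minus_2ll + n_parameters * math.log(n_samples)

print(f"-2LL: {minus_2ll:.4f}")
print(f"Average LL: {avg_ll:.4f}")
print(f"AIC: {aic:.2f}")
print(f"BIC: {bic:.2f}")
```

Every metric here is derived from the total LL (and, for BIC, from the number of observations), so whichever weighted definition of LL is chosen will carry through to -2LL, AIC, and BIC. Small last-digit differences from the report come from the rounded LL used as input.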