A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
The current categorical model one-hot encodes integer data on every call to the E or M step in the optimisation loop. Caching the one-hot encodings would probably yield a meaningful speedup.
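As a rough illustration of the caching idea, the sketch below encodes the data once and reuses the result on later calls. It is not StepMix code: `one_hot` and `OneHotCache` are hypothetical names, every feature is assumed to share the same number of categories, and missing values are ignored. The cache is keyed on array identity, so it also assumes the input array is not modified in place between iterations.

```python
import numpy as np


def one_hot(X, n_categories):
    """One-hot encode an integer array of shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=int)
    n_samples, n_features = X.shape
    encoded = np.zeros((n_samples, n_features * n_categories))
    # Each feature owns a block of n_categories columns.
    rows = np.repeat(np.arange(n_samples), n_features)
    cols = (np.arange(n_features) * n_categories + X).ravel()
    encoded[rows, cols] = 1.0
    return encoded


class OneHotCache:
    """Hypothetical helper: remembers the encoding of the last array it saw."""

    def __init__(self, n_categories):
        self.n_categories = n_categories
        self._source = None
        self._encoded = None
        self.n_encodings = 0  # counts real encoding passes, for illustration

    def get(self, X):
        # Re-encode only when a different array object is passed in.
        if self._source is not X:
            self._encoded = one_hot(X, self.n_categories)
            self._source = X
            self.n_encodings += 1
        return self._encoded


X = np.array([[0, 2], [1, 1], [2, 0]])
cache = OneHotCache(n_categories=3)
for _ in range(5):  # stands in for EM iterations
    X_e = cache.get(X)  # the E step would use the encoding here
    X_m = cache.get(X)  # the M step would reuse the same array
```

With this arrangement the encoding pass runs once instead of once per step, at the cost of keeping the encoded array in memory for the duration of the fit.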