Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

Support for count/poisson data? #59

Closed by JacobElder 4 months ago

JacobElder commented 5 months ago

Hi,

I see that the package supports binary, categorical, and continuous data. Does it also support count data? If StepMix has no direct support for counts, what is the best way to handle count data with it?

sachaMorin commented 5 months ago

There's unfortunately no native support for Poisson models. StepMix was designed to make adding new distributions relatively easy, so we could in principle support this if we find someone to help with the implementation.

Alternatively, you could bin your count data and treat it as a categorical measurement. The last bin would represent the tail of your measurements (e.g., all counts above 10). Let me tag @ericlacourse and @FelixLaliberte to see if they have additional comments.
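A minimal sketch of the binning idea, in plain Python. The helper name `bin_counts`, the `lower_bounds` parameter, and the example bin edges are hypothetical choices for illustration, not part of StepMix. Each count is mapped to the index of the bin it falls in, and the last bin is open-ended so it absorbs the tail.

```python
from bisect import bisect_right


def bin_counts(counts, lower_bounds):
    """Map non-negative counts to 0-based integer category codes.

    lower_bounds holds the ascending lower edge of each bin and must
    start at 0. The last bin is open-ended and collects the tail.
    (Hypothetical helper, not a StepMix function.)
    """
    if not lower_bounds or lower_bounds[0] != 0:
        raise ValueError("lower_bounds must start at 0")
    if any(a >= b for a, b in zip(lower_bounds, lower_bounds[1:])):
        raise ValueError("lower_bounds must be strictly ascending")
    codes = []
    for c in counts:
        if c < 0:
            raise ValueError(f"negative count: {c}")
        # Index of the last lower bound that is <= c
        codes.append(bisect_right(lower_bounds, c) - 1)
    return codes


# Example bins: 0, 1, 2, 3-4, 5-10, and a tail bin for counts above 10
lower_bounds = [0, 1, 2, 3, 5, 11]
raw_counts = [0, 4, 5, 10, 11, 250]
codes = bin_counts(raw_counts, lower_bounds)
print(codes)  # [0, 3, 4, 4, 5, 5]
```

The resulting codes could then be passed as a categorical measurement column, assuming the categorical model expects 0-based integer codes (worth confirming in the StepMix documentation). Merging adjacent counts into wider bins, as in the 3-4 and 5-10 bins here, keeps the number of levels small when the counts span a wide range.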

FelixLaliberte commented 5 months ago

Hi,

Thank you for your question. I would also suggest treating count data as categorical indicators. This method, while not perfect, provides a workable solution until a Poisson distribution is implemented in StepMix.

JacobElder commented 4 months ago

Thank you @sachaMorin and @FelixLaliberte for the suggestion. I was considering treating the counts as categorical but was concerned about the number of levels. I like the idea of the last level/bin denoting the tail of the measurements. I'll implement that!

Thanks for the work on the package. Great package.