Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License
54 stars, 4 forks

Availability of "beta" for models with distal outcome #42

Closed FelixLaliberte closed 1 year ago

FelixLaliberte commented 1 year ago

Hi,

Is it possible to obtain parameters in the form of logistic regression when predicting a binary (or categorical) distal outcome? It appears that the "beta" parameters are not available for models with a distal outcome. If not, is there an alternative approach to extract these parameters?

Here's an example using a binary distal outcome:

import pandas as pd
import numpy as np
from stepmix.stepmix import StepMix
from sklearn.datasets import load_iris
from stepmix.utils import identify_coef

data, target = load_iris(return_X_y=True, as_frame=True)

for c in data:
    c_binary = c.replace("cm", "binary")
    data[c_binary] = pd.qcut(data[c], q=2).cat.codes

X = pd.DataFrame(data, columns=['sepal width (binary)',
                                'petal length (binary)',
                                'petal width (binary)'])

sepal_length_Binary = data['sepal length (binary)']

model = StepMix(n_components=3, 
                measurement='binary', 
                structural='binary', 
                n_steps=3,
                random_state=123)
model.fit(X, sepal_length_Binary)

model.get_parameters()['structural']['beta']

The code runs without any errors, but when I try to access the "beta" parameters, it returns an empty result.

Thank you!

sachaMorin commented 1 year ago

Yes. The key to use is "pis" (model.get_parameters()['structural']['pis']). Remember you can always look at the available keys of a dictionary when in doubt (model.get_parameters()['structural'].keys()).

sachaMorin commented 1 year ago

You can also refer to this tutorial.

sachaMorin commented 1 year ago

In case you are looking for the betas as in the "linear coefficients", please note that they are exclusive to the covariate model or more generally logistic regressions. Binary and categorical models are not regression models.

Binary and categorical parameters ("pis") instead represent conditional probabilities. This is the standard way of parametrizing Bernoulli or Multinoulli random variables. I agree the documentation on this could be clearer. It is explained on page 22 of the StepMix preprint, if that helps.

sachaMorin commented 1 year ago

In the binary case, for example, pis is a K x D matrix of conditional probabilities, with one row per latent class (K classes) and one column per binary variable (D variables).

pis[k, d] gives you the probability that variable d = 1 given that you are in latent class k.

FelixLaliberte commented 1 year ago

Thank you for all the information.

Therefore, I will rephrase my question as follows:

Is it possible to transform the conditional probabilities (model.get_parameters()['structural']['pis']) obtained with StepMix into coefficients of a "classic" logistic regression?

To clarify, I aim to compare the results of an LCA model with binary distal outcome (presented in a published article) with the results obtained using StepMix. The authors used a "naive" three-step approach:

1) They presented an LCA model obtained with another software/package (step-1).

2) Then, they created a new variable by recording the class membership without considering the uncertainty of class assignment (step-2).

3) Using the newly created variable, they predicted a binary distal outcome through a logistic regression (step-3) and presented a table of the logistic regression results (coefficients, p-values, etc.).

Therefore, is it possible to transform the conditional probabilities into coefficients to compare the coefficients from this article with the coefficients obtained using the different approaches offered by StepMix ("naive" three-step, two-step, etc.)?

In other words, I wish to obtain the following equation (with the first class as the reference category):

log(p / (1 - p)) = β0 + (β1∗class2) + (β2∗class3) + ... + (β(k-1)∗classk)

Thank you for your time and assistance!
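For reference, when class membership coded as dummies is the only predictor, the equation above is saturated: each class gets its own log-odds. Under that assumption, class-conditional outcome probabilities map to the intercept and coefficients by taking log-odds differences against the reference class. The following is a minimal sketch of that reparametrization, not a StepMix feature. The function name pis_to_logit_coefs and the probabilities are made up for illustration; with a single binary distal outcome, the input would presumably be the one column of "pis".

```python
import numpy as np


def pis_to_logit_coefs(p):
    """Map class-conditional probabilities P(y = 1 | class) to the
    intercept and dummy coefficients of a saturated logistic model,
    using the first class as the reference category.

    Hypothetical helper for illustration; not part of StepMix.
    """
    p = np.asarray(p, dtype=float)
    logits = np.log(p) - np.log1p(-p)   # log(p / (1 - p)) per class
    intercept = logits[0]               # β0: log-odds in the reference class
    betas = logits[1:] - logits[0]      # β1, β2, ...: differences vs. reference
    return intercept, betas


# Made-up probabilities for three latent classes (illustration only)
p_outcome = np.array([0.2, 0.5, 0.8])
intercept, betas = pis_to_logit_coefs(p_outcome)

# Round trip: the coefficients reproduce the original probabilities
recovered = 1.0 / (1.0 + np.exp(-(intercept + np.concatenate(([0.0], betas)))))
print(intercept, betas, recovered)
```

This yields point estimates only (no standard errors or p-values). It coincides with an unpenalized logistic regression on hard class assignments only when the probabilities are the outcome proportions within those assigned classes.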

sachaMorin commented 1 year ago

The 3-step approach you're describing here is not currently implemented in StepMix.

I don't know of a straightforward way to convert binary mixture parameters to logistic regression parameters. @robinlegault any idea?

sachaMorin commented 1 year ago

However, you could always do the whole thing yourself, using StepMix and a logistic regression estimator. Let me give you an example.

sachaMorin commented 1 year ago

I think this does what you are describing.

import pandas as pd
from stepmix.stepmix import StepMix
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Generate binary iris dataset
data, target = load_iris(return_X_y=True, as_frame=True)

for c in data:
    c_binary = c.replace("cm", "binary")
    data[c_binary] = pd.qcut(data[c], q=2).cat.codes

X = pd.DataFrame(data, columns=['sepal width (binary)',
                                'petal length (binary)',
                                'petal width (binary)'])

distal_outcome = data['sepal length (binary)']

# Step 1: Measurement Model
mm = StepMix(n_components=3,
             measurement='binary',
             n_steps=3,
             random_state=123)
mm.fit(X)

# Step 2: Assignments (soft: posterior class probabilities, one column per class)
assignments = mm.predict_proba(X)

# Step 3: Logistic Structural Model
sm = LogisticRegression(fit_intercept=True, random_state=123).fit(assignments, distal_outcome)
print(sm.coef_)
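
Note that the example above feeds posterior probabilities (soft assignments) to the regression, whereas the "naive" approach described earlier uses hard class membership with the first class as reference. A minimal sketch of that variant of step 2 follows, using a made-up posterior matrix in place of mm.predict_proba(X); the variable names are illustrative only.

```python
import numpy as np

# Hypothetical posterior probabilities for 5 observations and 3 classes,
# standing in for the output of mm.predict_proba(X)
posteriors = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.20, 0.30, 0.50],
    [0.60, 0.30, 0.10],
    [0.05, 0.15, 0.80],
])

# Hard (modal) assignment: index of the most probable class per observation
modal_class = posteriors.argmax(axis=1)

# One-hot encode, then drop the first column so class 1 is the reference
n_classes = posteriors.shape[1]
one_hot = np.eye(n_classes, dtype=int)[modal_class]
design = one_hot[:, 1:]

print(modal_class)
print(design)
```

The design matrix can then be passed to the logistic regression in place of assignments. Keep in mind that scikit-learn's LogisticRegression applies an L2 penalty by default, so its coefficients are shrunk relative to the unpenalized estimates usually reported in published tables; a very large C weakens the penalty.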

FelixLaliberte commented 1 year ago

Thank you. It answers my question. I was trying to perform the analysis without using another package.