Labo-Lacourse / stepmix

A Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods.
https://stepmix.readthedocs.io/en/latest/index.html
MIT License

ValueError: Mismatch between shapes when fitting mixed features #17

MostafaAbdelrashied closed this issue 1 year ago

MostafaAbdelrashied commented 1 year ago

When a model with mixed features is fitted on a training set and then used to predict on a test set with a different number of rows, it throws a ValueError stating that operands could not be broadcast together because their shapes differ.

How to reproduce the error?

import numpy as np
import pandas as pd
from stepmix.utils import get_mixed_descriptor
from stepmix import StepMix

df = pd.DataFrame(
    {
        # continuous
        "A": np.random.normal(0, 1, 100),
        # categorical encoded as integers
        "B": np.random.choice([0, 1, 2, 3, 4, 5], 100),
        # binary
        "C": np.random.choice([0, 1], 100),
    }
)

mixed_data, mixed_descriptor = get_mixed_descriptor(
    dataframe=df,
    continuous=['A'],
    categorical=['B'],
    binary=['C'],
)

X_train, X_test = mixed_data[:80], mixed_data[80:]

model = StepMix(n_components=3, measurement=mixed_descriptor, verbose=1, random_state=123)

model.fit(X_train)

preds = model.predict(X_test)

What is expected? model.predict should assign a cluster to each row of the test set and return an array of shape (20,).

What actually happens?

ValueError: operands could not be broadcast together with shapes (20,3) (80,3) (20,3)
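
For reference, a three-shape message like this is what NumPy reports when an in-place operation combines arrays whose row counts differ: the shapes listed are the two inputs and the output. The NumPy-only sketch below reproduces that kind of failure with the same sizes (20 test rows, 80 training rows, 3 components). It is an illustration under assumptions, not a diagnosis of StepMix internals: the function and the names `log_resp` and `stale_train_term` are hypothetical and do not come from the library.

```python
import numpy as np


def accumulate_log_likelihood(n_test=20, n_train=80, n_components=3):
    """Return the broadcast error message, or None if the shapes are compatible."""
    rng = np.random.default_rng(0)
    # Hypothetical per-sample, per-component values computed for the test set.
    log_resp = rng.normal(size=(n_test, n_components))
    # Hypothetical term that still has one row per *training* sample.
    stale_train_term = rng.normal(size=(n_train, n_components))
    try:
        # In-place addition: the output must keep the shape of log_resp,
        # so a differing row count cannot be broadcast.
        log_resp += stale_train_term
    except ValueError as exc:
        return str(exc)
    return None


message = accumulate_log_likelihood()
print(message)
```

If both arrays have the same number of rows (for example `accumulate_log_likelihood(n_test=80)`), the addition succeeds. That matches the report: the error only appears when the array passed to predict has a different number of rows than the one passed to fit.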