MaxHalford / prince

:crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA
https://maxhalford.github.io/prince
MIT License

Transform with FAMD - Is it correct ? #72

Closed Melkaz closed 1 year ago

Melkaz commented 5 years ago

Hello,

I'd like to:

  1. Build a FAMD model from a dataframe
  2. Project a single record using this model

In the example below, I have a dataframe that I use to fit a model. I then pick a single row from the original dataframe and attempt to project it with the created FAMD model.

Why do I obtain different values between model.transform(row) and model.row_coordinates(df).iloc[0]?

Is there something I missed?

Thanks !

Running code here: https://repl.it/repls/WearyMajorNature

import prince
import pandas as pd

df = pd.DataFrame({
    'variable_1': [4, 5, 6, 7, 11, 2, 52],
    'variable_2': [10, 20, 30, 40, 10, 74, 10],
    'variable_3': [100, 50, -30, -50, -19, -29, -20],
    'color': ['red', 'blue', 'green', 'blue', 'red', 'red', 'blue'],
})

model = prince.FAMD(
    n_components=df.shape[1],
    copy=True,
    check_input=True,
    engine='auto',
    random_state=1,
).fit(df)

print(model.row_coordinates(df))

# Let's say we want to transform a single row
row = pd.DataFrame(df.iloc[0]).transpose()

print(model.transform(row))

# Why is this transform so different from the projection of the same record in the first dataframe?

print(model.row_coordinates(df).iloc[0])
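
A side note on the reproduction above: building the single-row frame with pd.DataFrame(df.iloc[0]).transpose() goes through a mixed-dtype Series, so every column of the result has object dtype. If the library decides which columns are numerical from their dtypes (an assumption here, not something this thread confirms), that alone could change how the row is processed. The sketch below uses only pandas and a small hypothetical frame to show the dtype difference; it does not explain the whole discrepancy, since the later reproductions in this thread use plain slicing.

```python
import pandas as pd

# Small hypothetical frame with one numeric and one categorical-like column.
df = pd.DataFrame({
    "variable_1": [4, 5, 6],
    "color": ["red", "blue", "green"],
})

# Going through a Series forces a single dtype for the whole row (object),
# and the transpose keeps that object dtype for every column.
row_via_series = pd.DataFrame(df.iloc[0]).transpose()

# Selecting with a list of positions returns a one-row DataFrame
# that keeps the original column dtypes.
row_via_list = df.iloc[[0]]

print(row_via_series.dtypes)
print(row_via_list.dtypes)
```

Using df.iloc[[0]] (or df[0:1]) is therefore the safer way to extract one row when the column dtypes matter.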

MaxHalford commented 5 years ago

Nice catch... I don't have time to look at this as I'm on holiday for the next 3/4 weeks. I'll take a look as soon as I get back, if no one else has.

wayfarer91-tog commented 4 years ago

Hello.

I tried to run the same code as you and I got an error:

ValueError: shapes (1,4) and (6,4) not aligned: 4 (dim 1) != 6 (dim 0)

Any idea why this happens?

snthibaud commented 4 years ago

I think I ran into the same problem. I used the following code:

import pandas as pd
import numpy as np
from prince import FAMD

famd = FAMD(n_components=3)
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df["A"] = df["A"].astype("category")
famd.fit(df)
print(famd.transform(df[0:5]))
print(famd.transform(df)[0:5])

When I ran it, the output was:

          0         1         2
0 -0.346965  0.016770 -0.020404
1  0.150889  0.043458  0.121991
2  0.162368 -0.073471 -0.066656
3 -1.506365  1.515455  0.937305
4  0.266976 -0.104443  0.016896
          0         1         2
0 -0.352857  0.029198 -0.059385
1  0.190614  0.034506  0.126858
2  0.272716 -0.154023 -0.174798
3 -6.194747  6.564436  4.242015
4  0.414774 -0.215533  0.112739

snthibaud commented 4 years ago

I also noticed that PCA and MCA both work well independently. Looking at the code, I think it might be related to this line: https://github.com/MaxHalford/prince/blob/ba8a66b6575320832b118186745ecfd85c896bdc/prince/mfa.py#L98

There, the data passed to transform is normalized on its own before being projected, but it should be normalized using the statistics of the data the model was fitted on.
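
To illustrate the point in general terms, here is a minimal NumPy sketch, not prince's actual code. The helper names transform_with_fitted_stats and transform_with_batch_stats are hypothetical, and plain standardization stands in for whatever normalization the library applies. It shows that reusing statistics stored at fit time makes a subset transform consistently, while recomputing them from the rows being transformed does not.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# Statistics learned once, from the data used for fitting.
mean_ = X.mean(axis=0)
std_ = X.std(axis=0)


def transform_with_fitted_stats(rows):
    """Normalize with the statistics stored at fit time."""
    return (rows - mean_) / std_


def transform_with_batch_stats(rows):
    """Normalize with statistics recomputed from the rows being transformed."""
    return (rows - rows.mean(axis=0)) / rows.std(axis=0)


full = transform_with_fitted_stats(X)[:5]     # transform everything, then slice
subset = transform_with_fitted_stats(X[:5])   # slice, then transform
batch = transform_with_batch_stats(X[:5])     # slice, then re-normalize per batch

print(np.allclose(full, subset))  # consistent when fitted statistics are reused
print(np.allclose(full, batch))   # inconsistent when statistics depend on the batch
```

With the second approach, the coordinates of a row depend on which other rows are passed along with it, which matches the symptom reported above.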

MaxHalford commented 4 years ago

If this is still a bug, can someone please produce a minimal working example?

snthibaud commented 4 years ago

@MaxHalford I think it does not get much more minimal than this:

import pandas as pd
from prince import FAMD

famd = FAMD(n_components=1)
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df["A"] = df["A"].astype("category")
famd.fit(df)
print(famd.transform(df[0:1]))
print(famd.transform(df)[0:1])

Output:

     0
0 -0.5
          0
0 -1.414214

MaxHalford commented 1 year ago

Hello there 👋

I apologise for not answering earlier. I was not maintaining Prince anymore. However, I have just refactored the entire codebase. This refactoring should have fixed many bugs.

I don't have the time and energy to check whether this fixes your issue, but there is a good chance it does. Feel free to reopen this issue if the problem persists after installing the new version, that is, version 0.8.0 or later.