Closed nchelaru closed 1 year ago
Hey! I'll look into this as soon as I can.
Difference in results on IRIS dataset (including the target variable as part of PCA):
With R:
Call:
FAMD(base = iris, ncp = 3)
Eigenvalues
                        Dim.1    Dim.2    Dim.3
Variance                3.870    1.342    0.592
% of var.              64.503   22.370    9.862
Cumulative % of var.   64.503   86.873   96.735
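A side note on reading this table: it reports both raw eigenvalues ("Variance") and percentages, so only the "% of var." row is directly comparable to a list of explained-inertia proportions. A minimal sketch, using only the numbers printed above, recovers the total inertia that the percentages are relative to. The variable names are mine and purely illustrative.

```python
# Values copied from the FactoMineR output above.
eigenvalues = [3.870, 1.342, 0.592]
pct_of_var = [64.503, 22.370, 9.862]

# Dividing each eigenvalue by its share of variance gives an estimate
# of the total inertia; the three estimates should agree up to rounding.
totals = [ev / (pct / 100) for ev, pct in zip(eigenvalues, pct_of_var)]
total_inertia = sum(totals) / len(totals)

# The same percentages expressed on a 0-1 scale.
ratios = [pct / 100 for pct in pct_of_var]

print(round(total_inertia, 2))
print(ratios)
```

The total comes out at roughly 6, which is what one would expect if the four continuous variables each contribute 1 and the three-level factor contributes 3 - 1 = 2.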
I tried the same with Prince without normalization:
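import numpy as np
import pandas as pd
import prince
from sklearn.datasets import load_iris
# Append the species label to the four measurements and treat it as categorical.
X, y = load_iris(return_X_y=True)
X = pd.DataFrame(np.hstack([X, y.reshape(-1, 1)]))
X.iloc[:, -1] = X.iloc[:, -1].astype("str")
famd = prince.FAMD(n_components=3, n_iter=10)
famd.fit(X)
print(famd.explained_inertia_)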
[0.33736997081392395, 0.3314854899776377, 0.33114453920843834]
This looks like a data normalization/scaling issue: all three principal components essentially reflect the three levels of the categorical target variable ("species"), so the continuous features appear to explain none of the variance!
To confirm this, if one sets the arguments rescale_with_mean=True and rescale_with_std=True in the super().__init__ method of mfa.py, i.e. in the global PCA, the results look better:
print(famd.explained_inertia_)
[0.6412568596845379, 0.2592472748780183, 0.09949586543744376]
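The scaling hypothesis can also be checked independently of Prince. Below is a rough sketch of the textbook FAMD preprocessing as I understand it: continuous columns standardized to unit variance, one-hot indicators divided by the square root of their category proportion and then centred, followed by an unscaled PCA. The helper name famd_eigenvalues and the synthetic data are my own, and this is not Prince's or FactoMineR's actual implementation.

```python
import numpy as np

def famd_eigenvalues(X_num, labels):
    """Eigenvalues of a textbook-style FAMD: numeric columns plus one categorical column."""
    X_num = np.asarray(X_num, dtype=float)
    labels = np.asarray(labels)
    n = X_num.shape[0]

    # Continuous block: centre and scale to unit (population) variance.
    Z_num = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)

    # Categorical block: one-hot indicators divided by sqrt(category
    # proportion), then centred. Each column ends up with variance 1 - p.
    levels = np.unique(labels)
    indicators = (labels[:, None] == levels[None, :]).astype(float)
    proportions = indicators.mean(axis=0)
    Z_cat = indicators / np.sqrt(proportions)
    Z_cat = Z_cat - Z_cat.mean(axis=0)

    # Unscaled PCA of the combined table: eigenvalues of Z'Z / n, largest first.
    Z = np.hstack([Z_num, Z_cat])
    eigvals = np.linalg.eigvalsh(Z.T @ Z / n)
    return eigvals[::-1]

# Synthetic stand-in data (assumption: 4 numeric columns, one 3-level factor).
rng = np.random.default_rng(0)
labels = np.repeat(["a", "b", "c"], 20)
X_num = rng.normal(size=(60, 4)) + (labels == "c")[:, None] * 2.0

eigvals = famd_eigenvalues(X_num, labels)
explained = eigvals / eigvals.sum()
print(eigvals[:3])
print(explained[:3])
```

With this preprocessing the eigenvalues always sum to (number of continuous columns) + (number of categories - 1), so 6 for data shaped like iris plus species. That is consistent with the R output above, where 3.870 / 0.64503 is roughly 6. If the continuous block is left unscaled, that balance between the two blocks is lost.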
What do you think is going on, @MaxHalford?
Hello there 👋
I apologise for not answering earlier. I was not maintaining Prince anymore. However, I have just refactored the entire codebase. This refactoring should have fixed many bugs.
I don’t have time and energy to check if this fixes your issue, but there is a good chance it does. Feel free to reopen this issue if the problem persists after installing the new version — that is, version 0.8.0 and onwards.
Hello!
First of all, great job on the package! :)
I'm just starting to learn about FAMD, and have been trying to do the analysis in both R and Python. Strangely, while I am getting identical results on a dataset using the two R packages available for FAMD, PCAmixdata and FactoMineR, I am getting quite different results in terms of the eigenvalues with prince. I think I must be accessing the wrong attribute to get the eigenvalues, as more downstream analyses done using prince do give the same results as the two R packages.
For example, this is the code that I am using with FactoMineR:
And these are the results I am getting:
With prince:
I am getting very different numbers:
I'm sure that I am just calling the wrong thing, but I can't seem to find what I should be using to get the same results as FactoMineR.
Any help will be greatly appreciated! :)