esafak / mca

Multiple correspondence analysis
BSD 3-Clause "New" or "Revised" License

Functionality of fs_r_sup() #21

Open mpikoula opened 2 years ago

mpikoula commented 2 years ago

Hello and many thanks for this module!

I'd like to get the MCA components on new, unseen data (a test set) and was going to use fs_r_sup() to do so. To check that the results would be reasonable, I ran fs_r_sup() on the training set, expecting to get the same result as fs_r().

However, the result is a scaled version of fs_r(): each column is multiplied by a factor, and I can't figure out where the factor comes from or whether I should expect this. I can reproduce this in your burgundies notebook example, where X is the original data matrix:

Input: mca_ben.fs_r(N=3)

Output:
array([[ 0.8617,  0.0786, -0.0213],
       [-0.7130, -0.1571, -0.0192],
       [-0.9221,  0.0786, -0.0051],
       [-0.8617,  0.0786,  0.0213],
       [ 0.9221,  0.0786,  0.0051],
       [ 0.7130, -0.1571,  0.0192]])

Input: mca_ben.fs_r_sup(X,N=3)

Output:
array([[ 0.9510,  0.3162, -0.4301],
       [-0.7870, -0.6325, -0.3871],
       [-1.0177,  0.3162, -0.1026],
       [-0.9510,  0.3162,  0.4301],
       [ 1.0177,  0.3162,  0.1026],
       [ 0.7870, -0.6325,  0.3871]])

This is identical to the output from mca_ind:

Input: mca_ind.fs_r_sup(X,N=3)

Output:
array([[ 0.9510,  0.3162, -0.4301],
       [-0.7870, -0.6325, -0.3871],
       [-1.0177,  0.3162, -0.1026],
       [-0.9510,  0.3162,  0.4301],
       [ 1.0177,  0.3162,  0.1026],
       [ 0.7870, -0.6325,  0.3871]])
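
The per-column factor can be measured directly from the two outputs. The following is a minimal sketch that uses only NumPy and the values pasted in this issue; it does not call the mca package, and the names `fs_r_out` and `fs_r_sup_out` are hypothetical labels for the pasted arrays.

```python
import numpy as np

# Values copied from the mca_ben.fs_r(N=3) output in this issue.
fs_r_out = np.array([
    [ 0.8617,  0.0786, -0.0213],
    [-0.7130, -0.1571, -0.0192],
    [-0.9221,  0.0786, -0.0051],
    [-0.8617,  0.0786,  0.0213],
    [ 0.9221,  0.0786,  0.0051],
    [ 0.7130, -0.1571,  0.0192],
])

# Values copied from the mca_ben.fs_r_sup(X, N=3) output in this issue.
fs_r_sup_out = np.array([
    [ 0.9510,  0.3162, -0.4301],
    [-0.7870, -0.6325, -0.3871],
    [-1.0177,  0.3162, -0.1026],
    [-0.9510,  0.3162,  0.4301],
    [ 1.0177,  0.3162,  0.1026],
    [ 0.7870, -0.6325,  0.3871],
])

# Element-wise ratio. If each column is scaled by a single factor,
# every entry within a column of `ratios` should be nearly the same.
ratios = fs_r_out / fs_r_sup_out

# Average factor per column, and the largest deviation from it.
column_factors = ratios.mean(axis=0)
max_spread = np.abs(ratios - column_factors).max(axis=0)

print("per-column factor:", np.round(column_factors, 4))
print("max deviation within column:", max_spread)
```

With these numbers the factors come out to roughly 0.906, 0.249 and 0.050, constant within each column up to the rounding of the printed values. Since fs_r_sup() returns the same values for mca_ben and mca_ind, one possible reading is that fs_r() on mca_ben applies a correction (presumably Benzécri, going by the name) that fs_r_sup() does not. That is a guess from the outputs alone, not something verified against the library source.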

Any help appreciated!

esafak commented 2 years ago

Hello, Maria. Were you able to resolve this? I don't have time to look into it, so if you know what went wrong and can suggest a fix, I would be glad to help.