Open ND-edward opened 2 years ago
Hi @ND-edward, n_components can only be a positive integer, because the aim of PCA is dimensionality and multicollinearity reduction. In practice you reduce x features to y components, where y < x. With this approach you should not get an explained variance ratio > 1.
@alfredsasko So if I have a dataset with 4 features, should I set n_components to be less than or equal to 3, instead of using the positive integer 4 or a decimal such as 0.9 or 0.8?
@alfredsasko I still get a cumsum of the explained variance ratio > 1. I tried setting n_components = 3 on a 4-feature dataset, so the code is varimax_pca = CustomPCA(n_components=3, rotation='varimax', random_state=9527).
The explained variance ratio comes out as below:
pca_var_ratio = varimax_pca.fit(Z).explained_variance_ratio_
print(pca_var_ratio)
>[0.56433356 0.51035311 0.03468567]
print(pca_var_ratio.cumsum())
>[0.56433356 1.07468667 1.10937234]
The cumsum even reaches 1.1, which does not seem to make any sense.
@ND-edward check this post: the variance ratio can be > 1 if varimax rotation is applied. https://www.researchgate.net/post/Why-doesnt-SPSS-show-the-of-variance-after-rotation
@alfredsasko Is it because the factors overlap after the rotation and become correlated, so that the sum of explained variance becomes > 1?
And what should I do to present the explained variance ratio of each component after the rotation? Would it be reasonable to use normal PCA (no rotation) to find the explained variance ratio, given that the explained variance ratio should be the same whether rotated or not?
For example, I would first use the rotated PCA to discover which features contribute most to each component, e.g. feature a has the largest coefficient on the 1st component. Then I would use the non-rotated PCA to find the explained variance ratio of the 1st component, say 60%. Hence, we could conclude that the feature explains 60% of the variance.
Would it work?
@ND-edward indeed there might be a bug, as varimax rotation is orthogonal and the sum of the explained variance ratios (the normalized eigenvalues) should not be > 1. So I advise you to do as you proposed: run normal PCA, log the % of explained variance, and run varimax rotation to interpret the factors. Watch out: the equality holds only for the sum of the eigenvalues, not for the individual values, as those change with varimax rotation. Feel free to look into the bug and submit a change request.
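To make the orthogonality argument above concrete, here is a minimal NumPy sketch. It does not use advanced_pca or reproduce its internals: the varimax helper is a commonly used textbook-style implementation, and the synthetic 4-feature data, the seed, and all variable names are assumptions for illustration. It only shows that an orthogonal rotation redistributes variance between components while leaving the total unchanged, so a cumulative ratio above 1 cannot come from the rotation itself.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Return (rotated_loadings, rotation_matrix) for a varimax rotation.

    Hypothetical helper for this sketch; not the advanced_pca code.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like term of the varimax criterion.
        target = rotated**3 - (gamma / p) * rotated @ np.diag(
            np.sum(rotated**2, axis=0)
        )
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # product of orthogonal matrices, so orthogonal
        new_criterion = s.sum()
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Synthetic, standardized data with 4 features (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]
X[:, 3] += 0.5 * X[:, 2]
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Plain PCA via eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 3
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
total_var = eigvals.sum()

# Variance explained per component = column sum of squared loadings.
unrotated_ratio = (loadings**2).sum(axis=0) / total_var
rotated_loadings, R = varimax(loadings)
rotated_ratio = (rotated_loadings**2).sum(axis=0) / total_var

print("unrotated:", unrotated_ratio, "sum:", unrotated_ratio.sum())
print("rotated:  ", rotated_ratio, "sum:", rotated_ratio.sum())
```

The individual ratios generally differ before and after rotation, but both sums agree and stay at or below 1. That matches the advice above: report the total from the unrotated PCA and use the rotated loadings only for interpretation.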
I would like to perform PCA with n_components = 0.9.
So I first used StandardScaler() from sklearn to standardize the values, getting the values below:
Then I used from advanced_pca import CustomPCA to perform PCA with varimax rotation. Below is the code: varimax_pca = CustomPCA(n_components=n_components, rotation='varimax', random_state=9527)
However, I found something strange: the cumsum of explained_variance_ratio is greater than 1.
Is there a bug? Is it normal that the cumsum of the explained variance ratio can be greater than 1? Thanks!
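As a baseline for the question above, here is a small sketch using plain scikit-learn PCA without any rotation. It is not CustomPCA, and whether CustomPCA handles a float n_components the same way is not confirmed here. The synthetic data and seed are assumptions for illustration. In scikit-learn, a float n_components between 0 and 1 with svd_solver="full" keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 4-feature dataset (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]

# Standardize, then keep enough components to explain more than 90% of the variance.
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.9, svd_solver="full").fit(Z)

cumulative = pca.explained_variance_ratio_.cumsum()
print("components kept:", pca.n_components_)
print("cumulative explained variance ratio:", cumulative)
```

Without rotation the cumulative ratio ends above 0.9 and never exceeds 1, which gives a reference value to compare against the rotated output.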