A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
I have a question: why are you applying an SVD to a zero-mean (centered along the first axis) matrix? I'm not sure that's correct; I think you need to apply the SVD to the covariance matrix.
In fact, when checking `pca.explained_variance_ratio_`, it matches the components of the vector S returned by the SVD (normalized by the sum of the vector's elements) only when the SVD is applied to the covariance matrix.
Subtract the mean per dimension. The resulting data has zero mean; this is done to avoid numerical difficulties.
Divide each dimension by its standard deviation. Now the data is unit-free and has a variance of 1 per dimension, while the correlations between dimensions remain intact (see the sketch below).
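For concreteness, here is a minimal NumPy sketch of these two preprocessing steps (the array `X` and its values are made up for illustration):

```python
import numpy as np

# Made-up data: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 150.0],
              [3.0, 250.0]])

X_centered = X - X.mean(axis=0)               # step 1: zero mean per dimension
X_standardized = X_centered / X.std(axis=0)   # step 2: unit variance per dimension

print(X_standardized.mean(axis=0))  # ~[0, 0]
print(X_standardized.std(axis=0))   # [1, 1]
```

This is the same transformation scikit-learn's `StandardScaler` applies.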
In PCA, the principal components are the n eigenvectors with the largest eigenvalues. These can be obtained in either of two ways:
Hands-on, via the covariance matrix:
Compute the data covariance matrix.
Compute the eigenvalues and eigenvectors of the covariance matrix.
Select the eigenvectors with the largest eigenvalues.
Via SVD, which computes and orders the desired eigenvectors for you (as you already stated in your comment); a short numerical check follows below.
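Here is a minimal sketch (assuming NumPy and scikit-learn, with made-up toy data) showing that both routes agree: the SVD of the centered data matrix yields the same eigenvalues, and hence the same explained-variance ratios as `pca.explained_variance_ratio_`, and the same eigenvectors (up to sign) as the eigendecomposition of the covariance matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # toy correlated data

# Route 1: eigendecomposition of the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)         # (n_features, n_features)
eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending order
order = np.argsort(eigvals)[::-1]              # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix (what the notebook does).
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_eigvals = s**2 / (len(X) - 1)              # singular values -> eigenvalues

# Same eigenvalues, hence the same explained-variance ratios:
assert np.allclose(eigvals, svd_eigvals)
ratio_svd = svd_eigvals / svd_eigvals.sum()
pca = PCA(n_components=3).fit(X)
assert np.allclose(pca.explained_variance_ratio_, ratio_svd)

# Same principal axes, up to sign (eigenvectors are defined only up to sign):
assert np.allclose(np.abs(eigvecs), np.abs(Vt.T))
```

Note that when the SVD is applied to the centered data rather than to the covariance matrix, the singular values must be squared (and divided by n - 1) to obtain the eigenvalues; that is why normalizing S directly only matches `pca.explained_variance_ratio_` in the covariance-matrix case.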