A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
I have a question: why are you applying an SVD to a zero-mean (centered along the first axis) matrix? I'm not sure that's correct; I think you need to apply the SVD to the covariance matrix.
In fact, when checking `pca.explained_variance_ratio_`, it matches the components of the vector S returned by the SVD (normalized by the sum of the vector's elements) only when the SVD is applied to the covariance matrix.
Subtract the mean per dimension. The resulting data has zero mean; this is done to avoid numerical difficulties.
Divide each dimension by its standard deviation. Now the data is unit-free and has a variance of 1 per dimension, while the correlations between dimensions remain intact (see the sketch below).
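For concreteness, here is a minimal NumPy sketch of these two preprocessing steps (the array `X` and its values are made up for illustration):

```python
import numpy as np

# Made-up data: 3 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 150.0],
              [3.0, 250.0]])

X_centered = X - X.mean(axis=0)               # step 1: zero mean per dimension
X_standardized = X_centered / X.std(axis=0)   # step 2: unit variance per dimension

print(X_standardized.mean(axis=0))  # ~[0, 0]
print(X_standardized.std(axis=0))   # [1, 1]
```

This is the same transformation scikit-learn's `StandardScaler` applies.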
In PCA, the principal components are the n eigenvectors with the largest eigenvalues. These can be obtained in either of two ways:
Hands-on, via the covariance matrix:
Compute the data covariance matrix.
Compute the eigenvalues and eigenvectors of the covariance matrix.
Select the eigenvectors with the largest eigenvalues.
Via SVD, which computes and orders the desired eigenvectors for you (as you already stated in your comment); a short numerical check follows below.
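Here is a minimal sketch (assuming NumPy and scikit-learn, with made-up toy data) showing that both routes agree: the SVD of the centered data matrix yields the same eigenvalues, and hence the same explained-variance ratios as `pca.explained_variance_ratio_`, and the same eigenvectors (up to sign) as the eigendecomposition of the covariance matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # toy correlated data

# Route 1: eigendecomposition of the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)         # (n_features, n_features)
eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending order
order = np.argsort(eigvals)[::-1]              # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix (what the notebook does).
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_eigvals = s**2 / (len(X) - 1)              # singular values -> eigenvalues

# Same eigenvalues, hence the same explained-variance ratios:
assert np.allclose(eigvals, svd_eigvals)
ratio_svd = svd_eigvals / svd_eigvals.sum()
pca = PCA(n_components=3).fit(X)
assert np.allclose(pca.explained_variance_ratio_, ratio_svd)

# Same principal axes, up to sign (eigenvectors are defined only up to sign):
assert np.allclose(np.abs(eigvecs), np.abs(Vt.T))
```

Note that when the SVD is applied to the centered data rather than to the covariance matrix, the singular values must be squared (and divided by n - 1) to obtain the eigenvalues; that is why normalizing S directly only matches `pca.explained_variance_ratio_` in the covariance-matrix case.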