QuantEcon / lecture-python.myst

Quantitative Economics with Python
https://python.quantecon.org

PCA description in SVD lecture #396

Open sidd3888 opened 2 months ago

sidd3888 commented 2 months ago

Hey QuantEcon folks!

I was going through the SVD lecture (lecture number 5) and came across the section on PCA (5.8). Having re-read the section a couple of times, I find the description of the whole process a bit messy. The data is presented as $X$, an $m \times n$ matrix with $m$ variables and $n$ individuals.

First, I believe the text intends to compute averages by variable (across individuals), whereas the notation describes averages by individual. The averages are computed and the mean matrix $\bar{X}$ is written as a column vector of ones multiplied by $[\bar{X}_1 \cdots \bar{X}_n]$.
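To make the distinction concrete, here is a minimal NumPy sketch of averaging by variable. The array `X`, its sizes, and the random seed are made up for illustration and are not taken from the lecture.

```python
import numpy as np

# Hypothetical data: m = 3 variables (rows), n = 5 individuals (columns)
m, n = 3, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))

# Average of each variable across individuals: one mean per row
x_bar = X.mean(axis=1, keepdims=True)   # shape (m, 1)

# Mean matrix with the same shape as X:
# a column of variable means times a row vector of ones
X_bar = x_bar @ np.ones((1, n))         # shape (m, n)

# Demeaned data: every row (variable) now averages to zero
B = X - X_bar
```

With this convention the vector of means has $m$ entries, one per variable, rather than $n$.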

Thereafter, the section on decomposing the covariance matrix uses $B^TB$ (instead of $BB^T$), which I believe would result in an $n \times n$ matrix rather than the desired $m \times m$. Furthermore, the description of the decomposition discusses the covariance matrix $C$ potentially not being diagonalizable, though as a covariance matrix it must be symmetric and positive semidefinite, and is therefore always diagonalizable.
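A quick sketch of the shape issue and the diagonalizability point, again on made-up data (the names `C_vars` and `C_inds` are mine, not the lecture's):

```python
import numpy as np

# Hypothetical data: m = 3 variables (rows), n = 5 individuals (columns)
m, n = 3, 5
rng = np.random.default_rng(1)
X = rng.normal(size=(m, n))
B = X - X.mean(axis=1, keepdims=True)   # demean each variable

C_vars = B @ B.T / n    # m x m: covariance between variables
C_inds = B.T @ B / n    # n x n: not the variable covariance matrix

# C_vars is symmetric, so eigh applies and returns real eigenvalues
# together with orthonormal eigenvectors (the columns of P)
eigvals, P = np.linalg.eigh(C_vars)

# Reconstruct C_vars from its eigendecomposition C = P diag(eigvals) P^T
recon = P @ np.diag(eigvals) @ P.T
```

Because `C_vars` is symmetric, the decomposition always exists, and the eigenvalues are non-negative up to floating-point error.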

There might also be a typo in the last score matrix $T$.

(The covariance computation also uses $\frac{1}{n}$ instead of the sample version, $\frac{1}{n-1}$, but I was not sure of the intent there, so I didn't know which one it was meant to be.)
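For reference, the two normalizations can be compared against `np.cov`, which treats rows as variables by default. This is only a sketch on made-up data, and it does not settle which convention the lecture intends.

```python
import numpy as np

# Hypothetical data: m = 3 variables (rows), n = 5 individuals (columns)
m, n = 3, 5
rng = np.random.default_rng(2)
X = rng.normal(size=(m, n))
B = X - X.mean(axis=1, keepdims=True)   # demean each variable

C_pop = B @ B.T / n           # population version, divides by n
C_sample = B @ B.T / (n - 1)  # sample version, divides by n - 1

# np.cov divides by n - 1 by default; bias=True switches to n
matches_pop = np.allclose(np.cov(X, bias=True), C_pop)
matches_sample = np.allclose(np.cov(X), C_sample)
```

The two matrices differ only by the scalar factor $n/(n-1)$, so they share the same eigenvectors and the principal directions are unaffected by the choice.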

jstac commented 2 months ago

Thanks @sidd3888 . This is much appreciated.

@thomassargent30 , perhaps @HumphreyYang could review these comments?

sidd3888 commented 2 months ago

I had already made these changes locally. If you wish, you can use some of them. Of course, if you want to completely restructure the section, that is up to you. @HumphreyYang @jstac

HumphreyYang commented 2 months ago

Many thanks to @sidd3888 for opening the PR.

I think there are some inconsistencies across sections, and your edits look good to me. I will pass them on to @thomassargent30.