fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/

PCA centering #239

Closed: bvenn closed this issue 1 year ago

bvenn commented 1 year ago

Data matrices often must be centered prior to PCA. For a gene expression matrix, the intensities of each individual gene should have zero mean and unit variance, meaning they should be standardized, not only centered. I think the population standard deviation should be used for this instead of the sample standard deviation:
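Below is a minimal sketch of the standardization described above. It is written in plain Python with only the standard library, so it is not the FSharp.Stats API, and the gene intensities are hypothetical. `statistics.pstdev` divides by n and `statistics.stdev` divides by n - 1. Only the population version produces z-scores whose population variance is exactly 1.

```python
# Sketch only: stdlib Python, not FSharp.Stats. Standardizes one gene's
# intensities (one row of a genes x samples matrix) to zero mean and unit variance.
from statistics import mean, pstdev, stdev

def standardize(values, population=True):
    """Return z-scores (x - mean) / sd.

    population=True  -> divide by the population SD (denominator n)
    population=False -> divide by the sample SD (denominator n - 1)
    """
    m = mean(values)
    sd = pstdev(values) if population else stdev(values)
    return [(x - m) / sd for x in values]

gene = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical intensities, n = 8
z_pop = standardize(gene, population=True)
z_smp = standardize(gene, population=False)

# Population-SD z-scores have a population variance of exactly 1.
print(round(pstdev(z_pop), 10))  # 1.0
# Sample-SD z-scores have a population SD of sqrt((n - 1) / n) = sqrt(7/8) ≈ 0.9354.
print(round(pstdev(z_smp), 10))
```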

https://github.com/fslaborg/FSharp.Stats/blob/3aa4c4ce5768e6e1e49d45efd6d2de5e1562e319/src/FSharp.Stats/ML/Unsupervised/PrincipalComponentAnalysis.fs#L41

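Note: The result does NOT change; only the scaling is slightly different, because every gene is standardized over the same number of samples n, so the two standard deviations differ by the constant factor sqrt(n/(n-1)). For the sake of completeness, however, I suggest updating it. What do you think, @ZimmerD?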
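The following is a small illustration of why the PCA result is unaffected. It is again a stdlib-only Python sketch with a hypothetical 3-gene x 5-sample matrix. `standardize_rows`, `covariance` and `leading_eigenvector` are illustrative helpers, not FSharp.Stats functions, and the last one uses a simple power iteration in place of a full eigendecomposition. Because both variants divide every gene by a standard deviation computed over the same n samples, they differ by one global factor. The principal component direction therefore coincides, and only the eigenvalue (and hence the scores) is rescaled.

```python
# Sketch only: stdlib Python, hypothetical data, helpers are not FSharp.Stats functions.
from math import sqrt
from statistics import mean, pstdev, stdev

def standardize_rows(matrix, population=True):
    """Z-score each row (gene) across its columns (samples)."""
    out = []
    for row in matrix:
        m = mean(row)
        sd = pstdev(row) if population else stdev(row)
        out.append([(x - m) / sd for x in row])
    return out

def covariance(rows):
    """Gene x gene covariance of already-centered rows (denominator n - 1)."""
    n = len(rows[0])
    return [[sum(a * b for a, b in zip(r1, r2)) / (n - 1) for r2 in rows] for r1 in rows]

def leading_eigenvector(cov, iterations=500):
    """Power iteration for the dominant eigenvector, normalized to unit length."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def rayleigh(cov, v):
    """Eigenvalue estimate v^T C v for a unit-length vector v."""
    return sum(vi * sum(c * vj for c, vj in zip(row, v)) for vi, row in zip(v, cov))

data = [  # hypothetical intensities: 3 genes x 5 samples
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
n = len(data[0])

z_pop = standardize_rows(data, population=True)
z_smp = standardize_rows(data, population=False)

# Every element is scaled by the same constant sqrt((n - 1) / n).
factor = sqrt((n - 1) / n)
max_dev = max(abs(s - factor * p) for rs, rp in zip(z_smp, z_pop) for s, p in zip(rs, rp))
print(f"z_sample = {factor:.6f} * z_population (max deviation {max_dev:.1e})")

# The covariance matrix is scaled by factor**2 = (n - 1) / n, which rescales the
# eigenvalues but leaves the eigenvectors (principal directions) unchanged.
cov_pop, cov_smp = covariance(z_pop), covariance(z_smp)
v_pop, v_smp = leading_eigenvector(cov_pop), leading_eigenvector(cov_smp)
print([round(x, 6) for x in v_pop])
print([round(x, 6) for x in v_smp])
print(rayleigh(cov_smp, v_smp) / rayleigh(cov_pop, v_pop), (n - 1) / n)
```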


ZimmerD commented 1 year ago

Completely agree!