Closed jolespin closed 4 years ago
Dear Jolespin,
Thank you for using the pca library! This version of pca can handle one-hot data (with SparsePCA) and sparse matrices through the truncated SVD integration. What kind of data distribution (and metric) did you have in mind? Methods such as MDS, LDA, SVD, and UMAP can also be helpful for dimensionality reduction.
On 20 Jun 2020 at 22:16, Josh L. Espinoza notifications@github.com wrote: Do you have plans to generalize these methods to principal coordinates analysis so we can use non-Euclidean distance? That would be absolutely incredible and I would use this for all of my projects.
I used custom distances as input. It would be cool to visualize the loadings of these. For example,
df_dism = ...  # Custom distance matrix (n, n)
Then I use df_dism as input into PCoA. If I used Euclidean distance as the distance function, this would be the same as PCA. Let me know if you want me to elaborate. The docs on that link help explain a little better. The problem with MDS in sklearn is that it's stochastic and uses a random seed.
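To make the PCoA/PCA equivalence concrete, here is a minimal sketch of classical (Torgerson) PCoA written with NumPy only. The function name `classical_pcoa` and the toy data are assumptions made for illustration; this is not the pca library's API or scikit-bio's implementation. It takes a precomputed distance matrix such as df_dism, needs no random initialization, and for Euclidean distances it should reproduce the PCA scores up to the sign of each axis.

```python
import numpy as np

def classical_pcoa(dist, n_components=2):
    """Classical (Torgerson) PCoA on a symmetric (n, n) distance matrix.

    Hypothetical helper for illustration only.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J @ D**2 @ J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean distances can give negative eigenvalues; clip them to zero
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return coords, eigvals

# Toy data (the seed only fixes the example data, not the method)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

# Euclidean distance matrix, standing in for a custom df_dism
diff = X[:, None, :] - X[None, :, :]
df_dism = np.sqrt((diff ** 2).sum(axis=-1))

coords, eigvals = classical_pcoa(df_dism, n_components=2)

# PCA scores of the same data via SVD of the centered matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U[:, :2] * S[:2]

# Axes may be flipped, so compare absolute values
same_up_to_sign = np.allclose(np.abs(coords), np.abs(pca_scores), atol=1e-6)
```

Swapping df_dism for any other symmetric distance matrix gives the non-Euclidean case. The result is deterministic, because the only computation is an eigendecomposition.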
There is a really useful package for bioinformatics called scikit-bio. It has PCoA methods and a way to calculate loadings from the resulting objects. I did a quick and dirty adaptation of your biplot plotting function to use custom loadings. It's all detailed here: https://github.com/biocore/scikit-bio/issues/1710
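As a rough illustration of what "loadings" can mean for PCoA, the sketch below projects the original features onto the PCoA axes using the covariance between the centered features and the standardized coordinates. The helper names `pcoa` and `project_features` are hypothetical, and this is one common formulation rather than a copy of scikit-bio's code or of the adaptation described in the linked issue. It assumes the feature matrix behind the distance matrix is available and that the retained eigenvalues are positive. With Euclidean distances the projected features should match the PCA loadings up to sign.

```python
import numpy as np

def pcoa(dist, n_components=2):
    """Classical PCoA on an (n, n) distance matrix (hypothetical helper)."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.clip(eigvals[order], 0, None)
    return eigvecs[:, order] * np.sqrt(eigvals), eigvals

def project_features(X, coords, eigvals):
    """Project features onto PCoA axes; returns a (p, k) array.

    Assumes all retained eigenvalues are strictly positive.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Standardize each PCoA axis to unit sample variance
    coords_std = (coords - coords.mean(axis=0)) / coords.std(axis=0, ddof=1)
    # Covariance between features and standardized axes
    S_pc = Xc.T @ coords_std / (n - 1)
    # Rescale each axis by sqrt(n - 1) / sqrt(eigenvalue)
    return np.sqrt(n - 1) * S_pc / np.sqrt(eigvals)

# Toy data and a Euclidean distance matrix for the sanity check
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

coords, eigvals = pcoa(dist, n_components=2)
loadings = project_features(X, coords, eigvals)

# PCA loadings (right singular vectors) for comparison
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
matches_pca = np.allclose(np.abs(loadings), np.abs(Vt[:2].T), atol=1e-6)
```

The columns of loadings could then be drawn as arrows over the scatter of coords, in the same way a PCA biplot overlays feature vectors on the scores.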
I guess PCoA might be out of scope for your pca project, but I was curious about how biplots work, and the combination of your source code and the issue above was very helpful for my understanding.
I'm going to close this unless you think otherwise. Cheers
Do you have plans to generalize these methods to principal coordinates analysis so we can use non-Euclidean distance? That would be absolutely incredible and I would use this for all of my projects.