blab / pathogen-embed

Create reduced dimension embeddings for pathogen sequences
https://pypi.org/project/pathogen-embed/
MIT License

Automatically pick SVD algorithm for PCA from data #31

Closed: huddlej closed this 1 month ago

huddlej commented 1 month ago

Let scikit-learn automatically pick the SVD algorithm to use for PCA based on the input data instead of hardcoding the "full" solver. This fixes a bug that can occur with some virus datasets, where the full SVD solver fails to converge but the randomized solver succeeds. The scikit-learn PCA class picks from the full, arpack, and randomized solvers based on the shape of the input data matrix [1], which makes PCA more robust to these kinds of convergence issues.

[1] https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#pca
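pathogen-embed's own PCA code is not shown in this PR description, so the following is only a minimal sketch of the change. It assumes the input is a samples-by-features NumPy matrix. The helper name `embed_with_pca` and the random input data are hypothetical. The sketch contrasts the old hardcoded `svd_solver="full"` with `svd_solver="auto"`. Because `"auto"` is also scikit-learn's default, omitting the argument has the same effect.

```python
import numpy as np
from sklearn.decomposition import PCA


def embed_with_pca(features, n_components=10, random_state=0):
    """Project a samples-by-features matrix onto its first principal components.

    Hypothetical helper for illustration; not pathogen-embed's actual function.
    """
    # Before: PCA(n_components=n_components, svd_solver="full") forced an exact
    # SVD on every input, which could fail to converge on some datasets.
    # After: "auto" (scikit-learn's default) lets PCA choose the solver from the
    # shape of the input matrix and the requested n_components.
    pca = PCA(
        n_components=n_components,
        svd_solver="auto",
        random_state=random_state,  # only used by the arpack/randomized solvers
    )
    embedding = pca.fit_transform(features)
    return embedding, pca


# Usage with synthetic data standing in for a real pathogen feature matrix.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))
embedding, pca = embed_with_pca(features, n_components=10)
print(embedding.shape)  # (200, 10)
```

Passing a fixed `random_state` keeps embeddings reproducible when `"auto"` selects the randomized solver, which is one trade-off of no longer forcing the deterministic full SVD.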