epigen / unsupervised_analysis

A general purpose Snakemake workflow to perform unsupervised analyses (dimensionality reduction & cluster analysis) and visualizations of high-dimensional data.

MIT License

Provide PCA configuration options, e.g., % of variance to keep, to speed up the workflow for large data #49

Closed. sreichl closed this 3 months ago

sreichl commented 4 months ago

This is a breaking change because it adds new entries to the config, but it could significantly accelerate the workflow for large data.

[x] check if any other parameters might be useful to the user
[x] adapt internal index calculation to take the complete precomputed PCA
[x] integrate parameters into file/folder naming scheme
[x] document both changes accordingly

# https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
pca_parameters:
    n_components: 0.9 # fraction of variance to keep as a float in (0, 1), number of components as an int (e.g., 50), or 'mle'
    svd_solver: 'auto' # options: 'auto', 'full', 'covariance_eigh', 'arpack', 'randomized'
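
To illustrate how a config like the one above could be checked and what a fractional `n_components` means, here is a minimal Python sketch using only the standard library. The helper names `validate_pca_parameters` and `components_for_variance` are hypothetical and not part of the workflow. The validation rules are assumptions: a float must lie strictly between 0 and 1, as scikit-learn's PCA requires, and an int must be at least 1. The selection rule shows the general idea of keeping the smallest number of components whose cumulative explained variance reaches the threshold; scikit-learn's handling of exact ties may differ.

```python
from bisect import bisect_left
from itertools import accumulate

# Solver names copied from the config comment above.
SVD_SOLVERS = {"auto", "full", "covariance_eigh", "arpack", "randomized"}


def validate_pca_parameters(params):
    """Hypothetical helper: reject values the config comments do not allow."""
    n = params["n_components"]
    if isinstance(n, bool):
        ok = False  # bool is a subclass of int, so exclude it explicitly
    elif isinstance(n, float):
        ok = 0.0 < n < 1.0  # assumption: fraction of variance, open interval
    elif isinstance(n, int):
        ok = n >= 1  # assumption: at least one component
    else:
        ok = n == "mle"
    if not ok:
        raise ValueError(f"invalid n_components: {n!r}")
    if params["svd_solver"] not in SVD_SOLVERS:
        raise ValueError(f"invalid svd_solver: {params['svd_solver']!r}")
    return params


def components_for_variance(explained_variance_ratio, threshold):
    """Smallest k whose first k ratios sum to at least `threshold`.

    `explained_variance_ratio` must be sorted in decreasing order,
    as PCA implementations report it.
    """
    cumulative = list(accumulate(explained_variance_ratio))
    k = bisect_left(cumulative, threshold) + 1
    # Guard against a threshold that rounding pushes just above the total.
    return min(k, len(cumulative))


params = validate_pca_parameters({"n_components": 0.9, "svd_solver": "auto"})
kept = components_for_variance([0.5, 0.3, 0.15, 0.05], params["n_components"])
```

With these made-up ratios, the first two components explain 80% of the variance and the first three explain 95%, so `kept` is 3: a 90% threshold drops only the last component. On large data the saving comes from every downstream step operating on fewer components.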