epigen / unsupervised_analysis

A general-purpose Snakemake workflow to perform unsupervised analyses (dimensionality reduction & cluster analysis) and visualizations of high-dimensional data.
MIT License

Provide PCA configuration options (e.g., % of variance to keep) to speed up the workflow for large data #49

Closed (sreichl closed 3 months ago)

sreichl commented 4 months ago

This would be a breaking change because it adds new fields to the config, but it could significantly accelerate the workflow for large data.

# https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
pca_parameters:
    n_components: 0.9 # float in (0, 1): fraction of variance to keep; int (e.g., 50): number of components; or 'mle'
    svd_solver: 'auto' # options: 'auto', 'full', 'covariance_eigh', 'arpack', 'randomized'
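
For illustration, here is a minimal sketch of how a `pca_parameters` block like the one above might be validated and handed to scikit-learn. The helper names `pca_kwargs` and `run_pca` are hypothetical and not part of the workflow, and the defaults simply mirror the values proposed in this issue. The only scikit-learn call used is `sklearn.decomposition.PCA` with its `n_components` and `svd_solver` arguments; solver and `n_components` compatibility is left for scikit-learn itself to check.

```python
from numbers import Integral, Real

# Solver names copied from the config comment in this issue.
SVD_SOLVERS = {"auto", "full", "covariance_eigh", "arpack", "randomized"}


def pca_kwargs(pca_parameters):
    """Validate a pca_parameters mapping and return keyword arguments for PCA.

    Hypothetical helper for illustration; defaults mirror the proposed config.
    """
    n_components = pca_parameters.get("n_components", 0.9)
    svd_solver = pca_parameters.get("svd_solver", "auto")

    if svd_solver not in SVD_SOLVERS:
        raise ValueError(f"unknown svd_solver: {svd_solver!r}")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_components, bool):
        raise ValueError("n_components must not be a boolean")

    if n_components == "mle":
        pass  # let scikit-learn estimate the dimensionality
    elif isinstance(n_components, Integral):
        # This sketch requires at least one component.
        if n_components < 1:
            raise ValueError("an integer n_components must be >= 1")
    elif isinstance(n_components, Real):
        # A float is read as the fraction of variance to keep.
        if not 0 < n_components < 1:
            raise ValueError("a float n_components must lie in (0, 1)")
    else:
        raise ValueError(f"unsupported n_components: {n_components!r}")

    return {"n_components": n_components, "svd_solver": svd_solver}


def run_pca(data, pca_parameters):
    """Fit PCA on data (samples x features) with the validated parameters."""
    # Imported lazily so pca_kwargs can be used without scikit-learn installed.
    from sklearn.decomposition import PCA

    pca = PCA(**pca_kwargs(pca_parameters))
    return pca.fit_transform(data), pca
```

Assuming the parsed config exposes the block under a `pca_parameters` key, a call could look like `embedding, pca = run_pca(X, config["pca_parameters"])`. With `n_components: 0.9`, the fitted `pca.n_components_` then reports how many components were needed to reach 90% of the variance.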