A general purpose Snakemake workflow to perform unsupervised analyses (dimensionality reduction & cluster analysis) and visualizations of high-dimensional data.
MIT License
Provide PCA configurations (e.g., % of variance to keep) to speed up the workflow for large data #49
Breaking change due to additions to the config, but could significantly accelerate the workflow for large data.
- [x] check if any other parameters might be useful to the user
- [x] adapt internal index calculation to take the complete precomputed PCA
- [x] integrate parameters into file/folder naming scheme
- [x] document both changes accordingly
```yaml
# https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
pca_parameters:
    n_components: 0.9 # variance to keep as float (0-1], number of components as int (e.g., 50), or 'mle'
    svd_solver: 'auto' # options: 'auto', 'full', 'covariance_eigh', 'arpack', 'randomized'
```
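As a minimal sketch of how these config values map onto scikit-learn (not code from the workflow itself): passing a float in (0, 1] as `n_components` keeps the smallest number of components whose cumulative explained variance reaches that fraction, while an int keeps exactly that many components.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a high-dimensional dataset (200 samples, 50 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# Same values as in the pca_parameters config above.
pca = PCA(n_components=0.9, svd_solver="auto")
X_reduced = pca.fit_transform(X)

# The retained components explain at least 90% of the variance,
# usually with fewer dimensions than the original data.
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```

With a float `n_components`, `svd_solver='auto'` resolves to a full SVD; for very large data an int `n_components` combined with `'randomized'` is typically what yields the speed-up this issue targets.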