AllenInstitute / transcriptomic_clustering

improved clustering pipeline for transcriptomic data

Translate scrattch.hicat functionality into Python modules #6

Open wbwakeman opened 3 years ago

wbwakeman commented 3 years ago

The transcriptomics clustering functionality is currently bundled into a single R package called scrattch.hicat. To improve performance, reliability, maintainability, and extensibility, we will translate the critical parts of this code into a series of Python modules that will run as a pipeline. The MVP will be an implementation of the functionality contained in cluster.R, especially iter_clust.

  1. regression/normalization
  2. Select High Variance Genes
  3. Dimension Reduction
  4. Filter known modes
  5. Clustering
  6. Merging
  7. Hierarchical Sorting (UPGMA)

Two notes:

  1. We are focusing on the most used parameter settings. This means that the MVP does not include WGCNA dimensionality reduction, Leiden or k-means clustering, regression, or Hierarchical Sorting.
  2. While many of the steps use standard, out-of-the-box algorithms, cluster merging is heavily customized. We think it will take the most time to translate: 5-10 days.
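
To make the MVP scope concrete, here is a minimal sketch of how the steps listed above could be chained as a pipeline, with the non-MVP steps skipped by default. Everything named here (`Stage`, `STAGES`, `run_pipeline`, the stage names, and the `in_mvp` flag) is hypothetical and not taken from scrattch.hicat or this repo. Each stage is only a placeholder that records that it ran, and splitting regression and normalization into two stages in this order is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # hypothetical container passed from stage to stage


def placeholder(name: str) -> Callable[[State], State]:
    """Return a stand-in stage that only records its name in state['trace']."""
    def run(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        return {**state, "trace": trace}
    return run


@dataclass(frozen=True)
class Stage:
    name: str
    in_mvp: bool  # False for steps the notes above exclude from the MVP
    run: Callable[[State], State]


def make_stage(name: str, in_mvp: bool) -> Stage:
    return Stage(name=name, in_mvp=in_mvp, run=placeholder(name))


# Order follows the step list in this issue; the real modules would replace
# the placeholders.
STAGES: List[Stage] = [
    make_stage("regression", in_mvp=False),
    make_stage("normalization", in_mvp=True),
    make_stage("select_high_variance_genes", in_mvp=True),
    make_stage("dimension_reduction", in_mvp=True),
    make_stage("filter_known_modes", in_mvp=True),
    make_stage("clustering", in_mvp=True),
    make_stage("merging", in_mvp=True),
    make_stage("hierarchical_sorting", in_mvp=False),
]


def run_pipeline(stages: List[Stage], state: State, mvp_only: bool = True) -> State:
    """Run stages in order, skipping non-MVP stages when mvp_only is True."""
    for stage in stages:
        if mvp_only and not stage.in_mvp:
            continue
        state = stage.run(state)
    return state


if __name__ == "__main__":
    result = run_pipeline(STAGES, {})
    print(" -> ".join(result["trace"]))
```

Keeping each step behind the same `State -> State` signature is one way to let the heavily customized merging step be developed and tested separately from the out-of-the-box steps.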


wbwakeman commented 3 years ago

Where this fits in the big picture (TimD meeting 1/29/2021)

Massive matrix Service (IDK) has every cell we know about partitioned into datasets

is an input to:

Transcriptomics clustering (this project)

is an input to:

Taxonomy Cluster service (created recently by Platform team) has an organization of clusters into taxonomies