Translate scrattch.hicat functionality into Python modules

The transcriptomics clustering functionality is currently bundled into a single R package called scrattch.hicat. To improve performance, reliability, maintainability, and extensibility, we will translate the critical parts of this code into a series of Python modules that run as a pipeline. The MVP will be an implementation of the functionality contained in cluster.R, especially iter_clust.

1. Regression/normalization
2. Select High Variance Genes
3. Dimension Reduction
4. Filter known modes
5. Clustering
6. Merging
7. Hierarchical Sorting (UPGMA)
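As a rough illustration of how these steps could chain together as a pipeline, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names (`normalize`, `select_high_variance_genes`, `run_pipeline`), the list-of-lists matrix layout, the log2(CPM + 1) normalization, and the plain-variance gene ranking are stand-ins, not the scrattch.hicat implementation or the planned module API. Only the first two steps are filled in.

```python
import math
from statistics import pvariance
from typing import Callable, List

# Assumed layout for this sketch: rows are cells, columns are genes.
Matrix = List[List[float]]


def normalize(counts: Matrix) -> Matrix:
    """Scale each cell to counts per million, then take log2(x + 1)."""
    out = []
    for cell in counts:
        total = sum(cell) or 1.0  # avoid dividing by zero for empty cells
        out.append([math.log2(c / total * 1e6 + 1) for c in cell])
    return out


def select_high_variance_genes(expr: Matrix, n_genes: int = 2) -> Matrix:
    """Keep the n_genes columns with the largest variance across cells.

    Plain variance is a simplification chosen for this sketch.
    """
    n_cols = len(expr[0])
    variances = [pvariance([row[j] for row in expr]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:n_genes])  # preserve the original gene order
    return [[row[j] for j in keep] for row in expr]


def run_pipeline(data: Matrix, steps: List[Callable[[Matrix], Matrix]]) -> Matrix:
    """Apply each step in order, feeding each output into the next step."""
    for step in steps:
        data = step(data)
    return data


# Toy input: 3 cells x 4 genes of raw counts.
counts = [[10, 0, 5, 85], [0, 20, 5, 75], [30, 0, 5, 65]]
result = run_pipeline(counts, [normalize, select_high_variance_genes])
```

Keeping each step as a function with the same input and output type makes it straightforward to swap in a different implementation for a step, or to drop a step that is out of scope for the MVP.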

Two notes:

We are focusing on the most commonly used parameter settings. This means that the MVP does not include WGCNA dimensionality reduction, Leiden or k-means clustering, regression, or Hierarchical Sorting.
While many of the steps use standard, out-of-the-box algorithms, cluster merging is heavily customized. We think it will take the most time to translate: 5-10 days.
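To make the shape of a merging step concrete, here is a minimal sketch of a generic iterative merge loop: find the closest pair of clusters, merge them if they are not separated enough, and repeat. This issue does not describe the actual merge criterion used in scrattch.hicat, so the centroid-distance test, the `min_separation` parameter, and the function names below are placeholders assumed for illustration only.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def merge_clusters(
    clusters: Dict[str, List[Point]], min_separation: float
) -> Dict[str, List[Point]]:
    """Merge the closest pair of clusters until all pairs are well separated.

    Placeholder criterion: Euclidean distance between centroids. A real
    implementation would substitute its own separation test here.
    """
    clusters = {name: list(members) for name, members in clusters.items()}
    while len(clusters) > 1:
        centroids = {name: centroid(members) for name, members in clusters.items()}
        # Find the pair of clusters whose centroids are closest together.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: dist(centroids[pair[0]], centroids[pair[1]]),
        )
        if dist(centroids[a], centroids[b]) >= min_separation:
            break  # even the closest pair is separated enough; stop merging
        clusters[a].extend(clusters.pop(b))  # fold b into a and re-evaluate
    return clusters


# Toy input: c1 and c2 are close together, c3 is far away.
toy = {
    "c1": [(0.0, 0.0), (0.0, 1.0)],
    "c2": [(0.5, 0.5)],
    "c3": [(10.0, 10.0), (11.0, 10.0)],
}
merged = merge_clusters(toy, min_separation=2.0)
```

Centroids are recomputed after every merge because folding one cluster into another changes which pair is closest. That recomputation is also where a customized criterion would add most of its cost.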
