Two options:

1. Downsample observations, e.g. `min(100, table(metadata$column))` per metadata group.
2. Precompute the clustering with fastcluster and pass the dendrograms to ComplexHeatmap:

```r
library(fastcluster)
library(ComplexHeatmap)

# example data: 1000 x 10 matrix
data <- matrix(rnorm(10000), nrow = 1000)

# precompute row and column clustering with fastcluster
dist_matrix <- dist(data)
row_hc <- fastcluster::hclust(dist_matrix, method = "complete")
col_hc <- fastcluster::hclust(dist(t(data)), method = "complete")

# pass the precomputed dendrograms to ComplexHeatmap
Heatmap(data,
        cluster_rows = as.dendrogram(row_hc),
        cluster_columns = as.dendrogram(col_hc))
```
- [x] check which implementation provides the most options in terms of `distance metrics` and `hierarchical clustering methods`.
Tried fastdist, but abandoned it due to an error when using the `correlation` metric (traceback below, followed by a scipy-based sketch). It is probably not as stable as scipy anyway, although faster.
```
Traceback (most recent call last):
  File "/research/home/sreichl/projects/unsupervised_analysis/.snakemake/scripts/tmpj83a3tj1.distance_matrix.py", line 43, in <module>
    dist_mtx = fastdist.matrix_pairwise_distance(data_np, metric_function, metric, return_matrix=True)
ZeroDivisionError: division by zero
```
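For reference, a minimal sketch of computing the full pairwise distance matrix with scipy instead. The variable names (`data_np`, `metric`) mirror the traceback, but the snippet itself is illustrative and assumes an observations x features matrix:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# illustrative data and config value (observations x features)
data_np = np.random.rand(100, 50)
metric = "correlation"

# condensed pairwise distances, expanded to a square distance matrix
dist_mtx = squareform(pdist(data_np, metric=metric))
```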
- Observation downsampling: random (with a fixed random seed), e.g. down to 1000?
- Feature cutoff by variability, e.g. down to the top 10k highly variable features?
- Or both as configurable parameters, e.g. `sample_proportion` and `highly_variable_feature_proportion`?
- Downsampling is done in the distance matrix step.
- Heatmap script: filter data & metadata for the downsampled observations/features.
- Both configs accept a float between 0 and 1 as a proportion, or an int as the absolute number of observations/features to downsample to (see the sketch below).
- Define "too large": e.g., >10,000 samples/cells?
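A minimal sketch of how the two config values could be resolved and applied in the distance matrix step, under the conventions above; the function and parameter names (`resolve_n`, `downsample`, `sample_proportion`, `feature_proportion`) are hypothetical:

```python
import numpy as np

def resolve_n(value, total):
    """Resolve a downsampling config value: a float in (0, 1] is treated as a
    proportion, an int as an absolute number; capped at the total available."""
    if isinstance(value, float) and 0 < value <= 1:
        return max(1, int(round(value * total)))
    return min(int(value), total)

def downsample(data_np, sample_proportion=1.0, feature_proportion=1.0, seed=42):
    """Randomly downsample observations and keep the most variable features."""
    rng = np.random.default_rng(seed)

    # random subset of observations (rows), reproducible via the seed
    n_obs = resolve_n(sample_proportion, data_np.shape[0])
    obs_idx = rng.choice(data_np.shape[0], size=n_obs, replace=False)

    # keep the top n_feat features (columns) ranked by variance
    n_feat = resolve_n(feature_proportion, data_np.shape[1])
    feat_idx = np.argsort(data_np.var(axis=0))[::-1][:n_feat]

    return data_np[np.ix_(obs_idx, feat_idx)], obs_idx, feat_idx
```

Resolving the proportion/absolute-number convention in one place would keep the heatmap script simple: it only needs the returned index vectors to subset data and metadata consistently.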
Ideas:

`min(100, table(metadata$column))`, i.e. downsample to at most 100 observations per metadata group.
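A hedged sketch of that per-group cap in Python/pandas, keeping at most 100 observations per metadata group; the column name and data are illustrative:

```python
import pandas as pd

# illustrative metadata with a grouping column (e.g. cell type / condition)
metadata = pd.DataFrame({"column": ["A"] * 500 + ["B"] * 50})

# downsample to at most 100 observations per group,
# i.e. the min(100, table(metadata$column)) idea
downsampled = (
    metadata
    .groupby("column", group_keys=False)
    .apply(lambda g: g.sample(n=min(100, len(g)), random_state=42))
)
```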