Closed antoine4ucsd closed 3 years ago
1) cluster() does a two-step clustering: i) FlowSOM clustering into xdim * ydim clusters; ii) ConsensusClusterPlus metaclustering into maxK clusters. So, if you want to increase the final number of clusters to > 80, I'd suggest also increasing the grid size, e.g. xdim = ydim = 20 (= 400 clusters), then maxK = 80 should work.
2) Yes, easy... just do table(sce$sample_id, cluster_ids(sce, k = "meta10")) (or specify any clustering you like for k)
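To see the shape of that table without loading any data, here is a toy sketch in base R. The vectors sample_id and cluster_id are made-up stand-ins for sce$sample_id and cluster_ids(sce, k = "meta10"); nothing here comes from the actual dataset.

```r
# toy stand-ins (hypothetical data, not from the thread)
sample_id  <- factor(c("s1", "s1", "s1", "s2", "s2", "s3"))
cluster_id <- factor(c("1", "2", "2", "1", "3", "3"))

# cell counts: one row per sample, one column per cluster
n_cells <- table(sample_id, cluster_id)
n_cells

# per-sample proportions (each row sums to 1)
prop.table(n_cells, margin = 1)

# long format (one row per sample/cluster pair), handy for plotting or export
as.data.frame(n_cells, responseName = "n_cells")
```

The same three calls should carry over to the real vectors, since table() only needs two equal-length vectors or factors.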
3) Yes, via downsampling: you could sample a fixed number of cells per sample or per cluster. For example,
# split cells by cluster
cells_by_cluster <- split(seq_len(ncol(sce)), cluster_ids(sce, k = "meta80"))
# keep at most 1k cells per cluster
# (indexing into cs avoids sample() treating a single cell index n as 1:n)
cells_to_keep <- lapply(cells_by_cluster,
    function(cs) cs[sample.int(length(cs), min(length(cs), 1000))])
# subset & plot
sub <- sce[, unlist(cells_to_keep)]
plotClusterExprs(sub, ...)
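The per-sample variant mentioned above follows the same pattern. Below is a minimal base R sketch on toy data: sample_id is a made-up stand-in for sce$sample_id, and n_max is a hypothetical name for the per-sample cap.

```r
set.seed(1)  # make the random draw reproducible

# toy stand-in for sce$sample_id (hypothetical: 3 samples of unequal size)
sample_id <- rep(c("s1", "s2", "s3"), times = c(5000, 300, 1200))

# split cell indices by sample
cells_by_sample <- split(seq_along(sample_id), sample_id)

# keep at most n_max cells per sample
n_max <- 1000
cells_to_keep <- lapply(cells_by_sample, function(cs) {
  # index into cs so that a length-1 cs is not expanded to 1:cs by sample()
  cs[sample.int(length(cs), min(length(cs), n_max))]
})

keep <- unlist(cells_to_keep, use.names = FALSE)
table(sample_id[keep])
```

With a real object, the subset would then be sce[, keep]. Capping per sample keeps small samples intact while shrinking the large ones, which is usually what you want when the plot is split or colored by sample.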
Perfect! Thank you again
a
Hello, sorry for bothering (and thanks again for the previous tips). I have 3 quick questions:
1. Compared to PhenoGraph, CATALYST::cluster requires an upper limit for the number of clusters. Is there a way to optimize this step? For example, I cannot go above 80 with the code below (I tried maxK = 100), but PhenoGraph outputs 124 clusters with the same data set. All suggestions are welcome!
CATALYST::cluster(sce, features = "type", xdim = 10, ydim = 10, maxK = 80, seed = 1234)
2. Is there a quick shortcut to summarize the number of cells by cluster AND by sample (i.e. not only by sample)?
3. Is there a way to circumvent the following memory error?
plotClusterExprs(sce_80, k = "meta80", features = "type")
Error: vector memory exhausted (limit reached?)
Thank you