HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

cluster num and ncell #167

Closed: antoine4ucsd closed this issue 3 years ago

antoine4ucsd commented 3 years ago

Hello, sorry for bothering you (and thanks again for the previous tips). I have 3 quick questions:

1. Compared to PhenoGraph, CATALYST::cluster() requires an upper limit on the number of clusters. Is there a way to optimize this step? For example, I cannot go above 80 with the code below (I tried maxK = 100), but PhenoGraph outputs 124 clusters on the same data. All suggestions are welcome!

CATALYST::cluster(sce, features = "type", xdim = 10, ydim = 10, maxK = 80, seed = 1234)

2. Is there a quick shortcut to summarize the number of cells (ncell) by cluster AND by sample (i.e., not only by sample)?

3. Is there a way to circumvent the following memory error?

plotClusterExprs(sce_80, k = "meta80", features = "type")
Error: vector memory exhausted (limit reached?)

Thank you!

HelenaLC commented 3 years ago

1) cluster() does a two-step clustering: i) FlowSOM clustering into xdim * ydim clusters; ii) ConsensusClusterPlus metaclustering into maxK clusters. So, if you want more than 80 final clusters, I'd suggest also increasing the grid size, e.g. xdim = ydim = 20 (= 400 clusters); then a maxK above 80 should work.
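To make the two-step idea concrete, here is a minimal base-R sketch that only mimics the structure of cluster(); it is not CATALYST's implementation. Assumptions: kmeans() stands in for the FlowSOM grid (n_nodes plays the role of xdim * ydim), hclust()/cutree() on the node centroids stand in for ConsensusClusterPlus (k plays the role of maxK), and two_step_cluster() is a hypothetical helper. Because metaclusters are formed by merging nodes, k can never exceed n_nodes, which is why the grid has to grow before maxK can.

```r
# Sketch only: kmeans() stands in for FlowSOM and hclust() for
# ConsensusClusterPlus; two_step_cluster() is a hypothetical helper.
two_step_cluster <- function(x, n_nodes, k, seed = 1234) {
  # metaclusters are merges of nodes, so there can be at most n_nodes of them
  stopifnot(k <= n_nodes)
  set.seed(seed)
  # step 1: over-cluster cells into n_nodes nodes (cf. xdim * ydim)
  km <- kmeans(x, centers = n_nodes, iter.max = 50)
  # step 2: merge the node centroids into k metaclusters (cf. maxK)
  hc <- hclust(dist(km$centers), method = "average")
  node_to_meta <- cutree(hc, k = k)
  # each cell inherits the metacluster of its node
  list(node = km$cluster, meta = unname(node_to_meta[km$cluster]))
}

# usage on simulated data: 2000 "cells" x 5 "markers"
set.seed(1)
x <- matrix(rnorm(2000 * 5), ncol = 5)
res <- two_step_cluster(x, n_nodes = 25, k = 8)
table(res$meta)
```

The same constraint is consistent with maxK = 100 failing on a 10 x 10 grid, which only has 100 nodes to merge.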

2) Yes, easy... just do table(sce$sample_id, cluster_ids(sce, k = "meta10")) (or specify any clustering you like for k)
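A self-contained sketch of the same table() call, using hypothetical sample_id and cluster_id vectors in place of sce$sample_id and cluster_ids(sce, k = ...). Naming the arguments labels the table dimensions, prop.table(..., margin = 1) turns counts into within-sample frequencies, and as.data.frame() gives a long table with one row per sample and cluster.

```r
# Hypothetical stand-ins for sce$sample_id and cluster_ids(sce, k = "meta80")
sample_id  <- factor(c("s1", "s1", "s1", "s2", "s2", "s3"))
cluster_id <- factor(c("c1", "c2", "c1", "c2", "c2", "c1"))

# number of cells per sample (rows) and cluster (columns)
counts <- table(sample = sample_id, cluster = cluster_id)
counts

# within-sample cluster frequencies: each row sums to 1
freqs <- prop.table(counts, margin = 1)

# long format: one row per sample-cluster pair, with the count in Freq
long <- as.data.frame(counts)
long
```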

3) Yes, via downsampling: you could sample a fixed number of cells per sample or per cluster. For example,

# split cells by cluster
cells_by_cluster <- split(seq(ncol(sce)), cluster_ids(sce, k = "meta80"))

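# keep at most 1k cells per cluster (seeded for reproducibility);
# index via sample.int() because sample(cs, n) draws from 1:cs when cs is a single cell
set.seed(1234)
cells_to_keep <- lapply(cells_by_cluster,
    function(cs) cs[sample.int(length(cs), min(length(cs), 1000))])

# subset & plot
sub <- sce[, unlist(cells_to_keep)]
plotClusterExprs(sub, ...)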
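The same downsampling can be wrapped in a small helper so that it works per cluster, per sample, or per sample-and-cluster combination. downsample_idx() is a hypothetical name, not a CATALYST function; it returns sorted cell indices and draws via sample.int() to avoid the sample() pitfall with single-cell groups.

```r
# Hypothetical helper (not part of CATALYST): keep at most n_max cells per group
downsample_idx <- function(groups, n_max, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # note: this resets the global RNG state
  idx_by_group <- split(seq_along(groups), groups)
  keep <- lapply(idx_by_group, function(cs)
    # sample.int() avoids sample()'s special case for length-one input
    cs[sample.int(length(cs), min(length(cs), n_max))])
  sort(unlist(keep, use.names = FALSE))
}

# usage on a made-up grouping with a large, a small and a singleton group
groups <- rep(c("a", "b", "c"), times = c(5000, 300, 1))
idx <- downsample_idx(groups, n_max = 1000, seed = 42)
table(groups[idx])
```

Here only group "a" exceeds the cap, so 1000, 300 and 1 cells are kept. For per-sample-and-cluster caps on real data, one option (untested here) is to pass interaction(sce$sample_id, cluster_ids(sce, k = "meta80")) as groups and subset with sce[, idx].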

antoine4ucsd commented 3 years ago

Perfect! Thank you again

a

(Replying by email to Helena L. Crowell's comment of Dec 10, 2020, 1:12 AM.)