HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

Putting two different mergings in the cluster codes. #312

Closed david-priest closed 1 year ago

david-priest commented 1 year ago

Say I cluster an sce with one set of type markers and save this clustering as a new column in the colData (merging1). Then I cluster the sce again with a different set of type markers and save this clustering in the colData as well (merging2). The second clustering will overwrite the cluster_ids and cluster_codes. Is it possible to put both clusterings back into the cluster codes?

It's relatively easy to subset an sce with sce2 <- sce[, sce$merging1 %in% "cell_type1"] and then use CATALYST functions to look at abundances of merging2 within cell_type1 of merging1. But my knowledge runs out when it comes to putting both merging1 and merging2 into the cluster codes, so that CATALYST functions can be used on both and not just on merging2.

HelenaLC commented 1 year ago

Good question. Unfortunately, you are right that cluster() will overwrite the cluster_ids and cluster_codes, so the above is not straightforward to do. Fortunately, there's a pretty simple workaround (see example below). In essence: i) cluster once; ii) keep track of cell identifiers and subset; iii) recluster; iv) write the new IDs into the original object using the cell IDs from ii). From what I understand, in your case, you would subset on merging1 and, in the 2nd round, extract from merging2.

> suppressPackageStartupMessages({
+     library(CATALYST)
+     library(patchwork)
+ })
> 
> # 1st round of clustering
> data(PBMC_fs, PBMC_panel, PBMC_md)
> sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
> sce <- cluster(sce, verbose = FALSE)
> 
> # track cell IDs, subset & re-cluster
> sce$cell_id <- seq_len(ncol(sce))
> sub <- filterSCE(sce, cluster_id %in% c(1,2,3), k = "meta6")
> sub <- cluster(sub, verbose = FALSE)
> 
> # put re-cluster IDs in original object
> sce$metaN[sub$cell_id] <- cluster_ids(sub, "meta8")
> sce$metaN <- factor(sce$metaN)
> 
> # some spot checks
> length(sub$cell_id) == sum(!is.na(sce$metaN))
[1] TRUE
> table(sce$metaN, cluster_ids(sce, "meta6"))

       1    2    3    4    5    6
  1 1379    2    2    0    0    0
  2  411    0    3    0    0    0
  3   25  137    8    0    0    0
  4   19  120    0    0    0    0
  5   22  171  201    0    0    0
  6    0  188    0    0    0    0
  7    0  197    0    0    0    0
  8    1  165    2    0    0    0
> 
> # visualization
> sce <- runDR(sce, cells = 200)
> plotDR(sce, color_by = "meta6") +
+ plotDR(sce, color_by = "metaN")

[image: the two plotDR panels from the code above, colored by meta6 and by metaN]
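The write-back in steps ii) to iv) can also be tried without CATALYST, on a plain data.frame standing in for colData(sce). This is only a minimal sketch under stated assumptions: the toy data, the labels "a", "b", "c" and the fake re-clustering are made up for illustration, and merging1/merging2 are just the column names from the question. One deliberate difference from the transcript is that the new column is pre-allocated with NA for every cell, so the assignment does not depend on which cells end up in the subset.

```r
# Toy stand-in for colData(sce): one row per cell (hypothetical data).
cells <- data.frame(
  cell_id  = seq_len(10),
  merging1 = rep(c("cell_type1", "cell_type2"), each = 5)
)

# ii) subset on the first merging, keeping the original cell IDs
sub <- cells[cells$merging1 %in% "cell_type1", ]

# iii) stand-in for re-clustering the subset (labels are made up)
sub$merging2 <- factor(c("a", "b", "a", "c", "b"))

# iv) pre-allocate the new column as NA for every cell, then fill in
# the subset by cell ID; going through character avoids storing the
# factor's integer codes instead of its labels
cells$merging2 <- NA_character_
cells$merging2[sub$cell_id] <- as.character(sub$merging2)
cells$merging2 <- factor(cells$merging2)

# spot checks analogous to the ones in the transcript
stopifnot(sum(!is.na(cells$merging2)) == nrow(sub))
print(table(cells$merging1, cells$merging2, useNA = "ifany"))
```

Cells outside the subset stay NA in merging2, which matches the spot check in the transcript where the number of non-NA entries equals the size of the subset.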