HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

How to combine correctly SCE objects #374

Closed MargolinGaby closed 8 months ago

MargolinGaby commented 9 months ago

Dear Helena, following issue #373, I have an additional question. A short introduction to what I want to accomplish: I clustered the main SCE and annotated it into 13 clusters by manual merging. Then I extracted one of the clusters, e.g. sce_monocytes, by filtering it out of the main SCE (I actually did this for more than one cluster). Next, I clustered sce_monocytes into sub-clusters and removed the unknown cells. Now I want to return the cleaned sce_monocytes object to the main SCE using sce_new <- cbind(sce, sce_monocytes). I think I am losing the identification of the cells as monocytes, and the metadata of the new object contains info for both objects (see attached screenshot).

Please advise how to do this correctly so that I can analyze the cleaned main SCE. Thank you, Gaby

HelenaLC commented 8 months ago
  1. regarding duplicate metadata: SingleCellExperiment's cbind method will do that (i.e., retain metadata from all objects), so that's to be expected. You'll have to fix that (remove redundant information) manually, e.g., by setting metadata(sce)[[i]] <- NULL, where i is the index of the entry you want to drop.
  2. unfortunately, I must admit this is a lot harder than it probably should be... the reason being that the reference cluster labels (1-100) used to match metaclusters won't be the same when re-clustering on a subset. I am providing a minimal example on how to achieve this below. Briefly, I am extracting a vector of cluster labels (across both objects) and assigning that to the joint object. Let me know if this does the trick in your case (if I understood it correctly).
# setup
library(CATALYST)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
colnames(sce) <- paste0("cell", seq_len(ncol(sce)))

# clustering & subclustering
sce <- cluster(sce, verbose=FALSE)
k1 <- "meta10"
ks <- sample(cluster_ids(sce, k1), 5)
sub <- filterSCE(sce, k=k1, cluster_id %in% ks)
sub <- cluster(sub, verbose=FALSE)

# relabel subclusters for clarity
tbl <- data.frame(
    old=seq_len(20),
    new=paste0("x", seq_len(20)))
k2 <- "subclustering"
sub <- mergeClusters(sub, 
    k="meta20", 
    table=tbl, 
    id=k2)

# get joint cluster labels
kids <- as.character(cluster_ids(sce, k1))
names(kids) <- colnames(sce)
kids[colnames(sub)] <- as.character(cluster_ids(sub, k2))
kids <- factor(kids)

# join objects
idx <- setdiff(colnames(sce), colnames(sub))
rowData(sub)$marker_class <- NULL
new <- cbind(sce[, idx], sub)

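# set cluster labels (indexed by cell name, because the column
# order of the joint object differs from that of 'sce')
new$cluster_id <- kids[colnames(new)]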
df <- data.frame(meta20=kids)
metadata(new)$cluster_codes <- df

plotExprHeatmap(new, by="cluster_id")
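The label-joining step can also be looked at in isolation with plain named vectors. The following is only a sketch in base R with made-up cell names and labels; it additionally assumes that cells removed while cleaning the subset should be left out of the joint object, which is not part of the example above. The joint object's column order is "remaining main cells, then subset cells", so the labels are indexed by name rather than by position.

```r
# Toy stand-ins for colnames(sce) and its cluster labels (all hypothetical).
main_labels <- c(cell1="B", cell2="mono", cell3="T", cell4="mono", cell5="mono")

# Cleaned subset: cell4 was dropped as "unknown", the others were subclustered.
sub_labels <- c(cell2="mono_a", cell5="mono_b")

# Cells taken from the main object: everything outside the subclustered cluster.
# (Assumption: cells cleaned out of the subset should not come back.)
keep_main <- names(main_labels)[main_labels != "mono"]

# Column order of the joint object, as in cbind(sce[, keep_main], sub).
joint_cells <- c(keep_main, names(sub_labels))

# Override the main labels by name, then reorder by name to match the joint object.
kids <- main_labels
kids[names(sub_labels)] <- sub_labels
kids <- factor(kids[joint_cells])
```

Here cell4 and the parent label "mono" no longer appear in kids, and the labels line up with joint_cells regardless of where the subset cells sat in the original object.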

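For point 1, a minimal sketch of removing redundant metadata after cbind, using two tiny toy objects. The object names, marker names and metadata values are all hypothetical, and identifying redundant entries by repeated name is an assumption; entries are dropped by index, highest first, so that the remaining indices stay valid.

```r
library(SingleCellExperiment)

# Helper building a tiny toy object; names and values are hypothetical.
make_sce <- function(cells, info) {
    counts <- matrix(seq_len(2 * length(cells)), nrow=2,
        dimnames=list(c("CD3", "CD14"), cells))
    x <- SingleCellExperiment(assays=list(counts=counts))
    metadata(x) <- list(
        experiment_info=info,
        cluster_codes=data.frame(som100=1:2))
    x
}
main <- make_sce(c("cell1", "cell2"), "main")
sub <- make_sce(c("cell3", "cell4"), "subset")

both <- cbind(main, sub)
names(metadata(both))  # inspect which entries are present after cbind

# Drop every entry whose name repeats an earlier one,
# going from the highest index down.
dup <- which(duplicated(names(metadata(both))))
for (i in rev(dup)) metadata(both)[[i]] <- NULL
```

This keeps the first occurrence of each name, i.e. the entry from the object passed first to cbind. Whether that is the right copy to keep (for cluster_codes in particular) depends on the analysis, so it is worth inspecting the entries before dropping any.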