cluster orders in plotClusterExprs

amoyguang1 commented 1 year ago

Dear there. The package is really powerful for cyToF data. May i ask about a plotting question, please?

In plotClusterExprs, how is the cluster ranked in the plot? Is there a way to keep the same order in sce and its subest sce file? I am trying to plot one marker expression in 2 groups of the same sce file and compare its expression in all clusters. thanks a lot.

HelenaLC commented 1 year ago

clusters are ordered according to hierarchical clustering of (Euclidean) distances between median expression profiles (across the markers included in the plot).
you can overwrite the y-axis ordering manually, although I admit this is more complicated than it probably should be... minimal reproducible example below:

library(ggplot2)
library(CATALYST)

data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose = FALSE)
(plt <- plotClusterExprs(sce, k = (k <- "meta8")))

# remote percentages from y-labels
# to match with original cluster IDs
o1 <- plt$data$cluster_id
o1 <- gsub("\\s\\(.*", "", o1)
plt$data$cluster_id <- o1

# reorder y-axis
o2 <- levels(cluster_ids(sce, k))
o2 <- c(o2, "average")
plt + scale_y_discrete(limits = o2)

amoyguang1 commented 1 year ago

Thanks a lot for the quick reply!

amoyguang1 commented 1 year ago

Hi Helena. May i ask whether it is possible to overlay the expression profiles from 2 conditions in the same graph? It would look better.

HelenaLC commented 1 year ago

Not with how the function is currently implemented, unfortunately. But here's a slightly different version that can do this (with some modifications to plotClusterExprs' code...):

library(ggplot2)
library(ggridges)
library(CATALYST)

data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose = FALSE)

.plotClusterExprs <- function(x, k = "meta20", features = "type") {
    CATALYST:::.check_sce(x, TRUE)
    k <- CATALYST:::.check_k(x, k)
    x$cluster_id <- cluster_ids(x, k)
    features <- CATALYST:::.get_features(x, features)
    ms <- t(CATALYST:::.agg(x[features, ], "cluster_id", "median"))
    d <- dist(ms, method = "euclidean")
    o <- hclust(d, method = "average")$order
    cd <- colData(x)
    es <- assay(x[features, ], "exprs")
    df <- data.frame(t(es), cd, check.names = FALSE)
    df <- reshape2::melt(df, 
        id.vars = names(cd), 
        variable.name = "antigen", 
        value.name = "expression")
    df$avg <- "no"
    avg <- df
    avg$cluster_id <- "avg"
    avg$avg <- "yes"
    df <- rbind(df, avg)
    fq <- tabulate(x$cluster_id)/ncol(x)
    fq <- round(fq * 100, 2)
    names(fq) <- levels(x$cluster_id)
    df$cluster_id <- factor(df$cluster_id, 
        levels = rev(c("avg", levels(x$cluster_id)[o])), 
        labels = rev(c("average", paste0(names(fq), " (", fq, "%)")[o])))
    ggplot(df, aes_string(
        x = "expression", y = "cluster_id", col = "condition")) + 
        scale_color_manual(values = c("red", "blue")) +
        facet_wrap(~antigen, scales = "free_x", nrow = 2) + 
        geom_density_ridges(alpha = 0.2) + 
        theme_ridges() + theme(
            legend.position = "none", 
            strip.background = element_blank(), 
            strip.text = element_text(face = "bold"))
}

.plotClusterExprs(sce, k = "meta8")

amoyguang1 commented 1 year ago

Thanks a lot, Helena. It would be great to see more plotting function like this one in future versions. It is much powerful than commercial software Flowjo. However, the visualisation is a bit behind. Flowjo charges more than 200pound each year for 1 user :)

HelenaLC / CATALYST

cluster orders in plotClusterExprs #324