Closed nilseling closed 2 weeks ago
Hi! Thanks for bringing this up. I believe this was done intentionally (although that is debatable). I'll try to explain: in both cases (by cluster/sample), the question of interest is how cluster abundances compare across samples/conditions. So in either case, the CLR is computed to reflect sample compositions. The by argument in this case affects visualization only. Let me give an example: ... I guess the question is whether it is ever of interest to ask how cluster composition (in terms of samples) changes, which sounds strange to me to begin with.
by="cluster_id": PC1 = cluster sizes (e.g., 3,5 on the left are smallest, 7,8 on the right are largest), and PC2 = condition (e.g., 5 more frequent in Ref, 6 more frequent in BCRXL; see also loadings = arrows). -> This tells us which clusters are changing.
by="sample_id": PCs reflect compositional changes (as is more intuitive, I guess; e.g., there's visible pairing of 1/2 and 3/4 also in the heatmap, and a split between Ref/BCRXL). -> This tells us which samples are changing.

library(CATALYST)
library(patchwork)
library(ComplexHeatmap)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose=FALSE)
wrap_elements(grid.grabExpr(draw(plotFreqHeatmap(sce, k=k <- "meta8")))) /
(clrDR(sce, by="cluster_id", k=k, label_by="cluster_id") |
clrDR(sce, by="sample_id", k=k, label_by="sample_id"))
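
To make the "same CLR, different orientation" point concrete, here is a minimal base-R sketch. It is not CATALYST's internal code: the count table is made up for illustration, and prcomp stands in for whatever DR routine is actually used. It assumes no zero counts (log of zero would give -Inf).

```r
# Toy counts (made up for illustration): rows = clusters, columns = samples.
ncs <- matrix(
  c(10, 40, 50,
    20, 30, 50,
    15, 35, 50,
    30, 20, 50),
  nrow = 3,
  dimnames = list(paste0("k", 1:3), paste0("s", 1:4)))

# Cluster proportions within each sample (each column sums to 1).
fqs <- prop.table(ncs, margin = 2)

# CLR per sample: log proportion minus the mean log proportion over clusters.
clr <- apply(fqs, 2, function(p) log(p) - mean(log(p)))

# The same CLR values, fed to PCA in two orientations:
pca_by_sample  <- prcomp(t(clr))  # observations = samples, variables = clusters
pca_by_cluster <- prcomp(clr)     # observations = clusters, variables = samples
```

In this sketch the transform itself never changes; only the choice of which dimension supplies the points in the plot does.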
Perfect, thanks for the explanation! Makes sense.
Hi @HelenaLC,
thanks again for providing this package. I now had a closer look at the clrDR function and I'm a bit confused reading the documentation. Looking at the code, when setting by = "sample_id", the computations look good to me and match the results of calling compositions::clr after setting base = exp(1). In this setting you basically compute the cluster frequencies per sample and perform the clr transform. You then transpose the matrix to use the scater functions for DR. However, when setting by = "cluster_id" the computations are still done as before but you don't transpose the matrix. I believe when by = "cluster_id" you would need to calculate the sample frequencies per cluster (ncs <- table(sce$cluster_id, sce$sample_id)) before clr transformation. Maybe I missed something? Also the equation in the docs should be updated to clr(sk) = log p(s,k) - ∑ log p(s,k) / K
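
The two computations being contrasted here can be sketched in base R as follows. This is only an illustration under assumptions: the counts are made up, clr_vec is a hypothetical helper (not a CATALYST or compositions function), and zero counts are not handled.

```r
# Toy counts (made up for illustration): rows = clusters, columns = samples.
ncs <- matrix(c(10, 40, 50, 20, 30, 50), nrow = 3,
  dimnames = list(paste0("k", 1:3), paste0("s", 1:2)))

# Hypothetical helper: natural-log CLR of one composition vector.
clr_vec <- function(p) log(p) - mean(log(p))

# (a) Cluster frequencies per sample, CLR across the K clusters of each sample.
clr_sample <- apply(prop.table(ncs, margin = 2), 2, clr_vec)

# (b) Sample frequencies per cluster, CLR across the S samples of each cluster.
#     apply() over rows returns results as columns, hence the transpose.
clr_cluster <- t(apply(prop.table(ncs, margin = 1), 1, clr_vec))

# (a) is centered within each sample, (b) within each cluster.
colSums(clr_sample)
rowSums(clr_cluster)
```

Both results are cluster-by-sample matrices, but they hold different values, so which one feeds the DR does matter for interpretation.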