Clarifications for clrDR function

nilseling commented 2 weeks ago

Hi @HelenaLC,

Thanks again for providing this package. I had a closer look at the clrDR function and am a bit confused by the documentation. Looking at the code, with by = "sample_id" the computations look good to me and match the results of calling compositions::clr after setting base = exp(1). In this setting you compute the cluster frequencies per sample, apply the CLR transform, and then transpose the matrix so that the scater functions can be used for DR. With by = "cluster_id", however, the computations are the same as before, except that the matrix is not transposed. I believe that with by = "cluster_id" you would need to calculate the sample frequencies per cluster (ncs <- table(sce$cluster_id, sce$sample_id)) before the CLR transformation. Maybe I missed something? Also, the equation in the docs should be updated to clr(s,k) = log p(s,k) - (1/K) ∑_k log p(s,k), i.e., the log frequency minus the mean log frequency over the K clusters.
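To make the two alternatives concrete, here is a minimal base-R sketch. The count matrix is made up for illustration, and none of this is taken from the CATALYST source; it only contrasts "cluster frequencies per sample, then CLR" with "sample frequencies per cluster, then CLR". It assumes no zero counts (otherwise log() returns -Inf and a pseudocount would be needed).

```r
# Toy cell counts (made-up numbers): rows = clusters, columns = samples.
# matrix() fills column-wise, so each triple below is one sample.
ncs <- matrix(
    c(120, 30, 50,
       80, 60, 60,
      200, 10, 90,
       40, 40, 20),
    nrow = 3,
    dimnames = list(
        cluster_id = c("c1", "c2", "c3"),
        sample_id = c("s1", "s2", "s3", "s4")))

# CLR with natural log: log value minus the mean of the logs.
clr <- function(p) log(p) - mean(log(p))

# Option A: cluster frequencies per sample (columns sum to 1),
# CLR taken over the K clusters within each sample.
fq_sample <- prop.table(ncs, margin = 2)
clr_by_sample <- apply(fq_sample, 2, clr)        # clusters x samples

# Option B: sample frequencies per cluster (rows sum to 1),
# CLR taken over the S samples within each cluster.
fq_cluster <- prop.table(ncs, margin = 1)
clr_by_cluster <- t(apply(fq_cluster, 1, clr))   # clusters x samples

# Same shape, different values.
print(round(clr_by_sample, 3))
print(round(clr_by_cluster, 3))
```

In option A every sample's CLR values sum to zero; in option B every cluster's do. Transposing the option-A matrix changes which dimension is plotted but not the values, so it is not equivalent to option B.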

HelenaLC commented 2 weeks ago

Hi! Thanks for bringing this up. I believe this was done intentionally (although that is debatable). I'll try to explain: in both cases (by cluster or by sample), the question of interest is how cluster abundances compare across samples/conditions. So in either case, the CLR is computed to reflect sample compositions; the by argument affects the visualization only. Let me give an example: ... I guess the question is whether it is ever of interest to ask how cluster composition (in terms of samples) changes, which sounds strange to me to begin with.

When by="cluster_id", PC1 reflects cluster sizes (e.g., 3 and 5 on the left are smallest; 7 and 8 on the right are largest), and PC2 reflects condition (e.g., 5 is more frequent in Ref, 6 is more frequent in BCRXL; see also the loadings, drawn as arrows). -> This tells us which clusters are changing.
When by="sample_id", the PCs reflect compositional changes (which is more intuitive, I guess; e.g., there's visible pairing of 1/2 and 3/4, also in the heatmap, and a split between Ref/BCRXL). -> This tells us which samples are changing.

library(CATALYST)
library(patchwork)
library(ComplexHeatmap)

data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose=FALSE)

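# use the 8-cluster metaclustering throughout
k <- "meta8"

# top: cluster frequency heatmap; bottom: CLR-based DR by cluster (left) and by sample (right)
wrap_elements(grid.grabExpr(draw(plotFreqHeatmap(sce, k=k)))) /
(clrDR(sce, by="cluster_id", k=k, label_by="cluster_id") |
clrDR(sce, by="sample_id", k=k, label_by="sample_id"))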
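The "same CLR, different view" idea can also be sketched without CATALYST. The snippet below is a base-R illustration on simulated counts (the cluster sizes and sample names are hypothetical), and it follows the description in this thread rather than the package internals: the CLR is computed once per sample, and PCA is then run either on samples (clusters as features) or on clusters (samples as features).

```r
set.seed(1)

# Simulated counts (hypothetical): 8 clusters of very different sizes, 6 samples.
size <- c(20, 40, 80, 150, 300, 500, 800, 1200)
ncs <- matrix(
    rpois(8 * 6, lambda = rep(size, times = 6)) + 1,  # +1 avoids log(0)
    nrow = 8,
    dimnames = list(paste0("c", 1:8), paste0("s", 1:6)))

# CLR of cluster frequencies within each sample (natural log).
fq <- prop.table(ncs, margin = 2)
clr <- apply(fq, 2, function(p) log(p) - mean(log(p)))  # clusters x samples

# View 1 (analogous to by = "sample_id"): one point per sample.
pca_samples <- prcomp(t(clr))

# View 2 (analogous to by = "cluster_id"): one point per cluster,
# using the very same CLR matrix, just not transposed.
pca_clusters <- prcomp(clr)

# In view 2, PC1 is expected to track overall cluster abundance.
pc1_vs_abundance <- cor(pca_clusters$x[, 1], rowMeans(clr))
print(pc1_vs_abundance)
```

Because the simulated clusters differ mostly in size, PC1 of the cluster view lines up closely with mean CLR abundance (up to sign), which is consistent with the "PC1 = cluster sizes" reading above.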

nilseling commented 2 weeks ago

Perfect, thanks for the explanation! Makes sense.

HelenaLC / CATALYST

Clarifications for clrDR function #410