HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

ratio between subsets #334

Closed france-hub closed 1 year ago

france-hub commented 1 year ago

Hello Helena, I have been asked to understand whether the ratio between two subsets is significantly enriched in one or more of the conditions in my dataset.

I tried to get the ratio using Catalyst, but I do not know if what I am doing is correct:

library(dplyr) # for %>%, filter(), mutate()
library(tidyr) # for spread()
sce$cluster_annotation <- cluster_ids(sce, "cluster_annotation") # add cluster annotation
p <- plotCounts(sce, group_by = "sample_id", color_by = "cluster_annotation")
df <- p$data
# StL and SL are the subsets of interest in cluster_annotation
df <- df %>% dplyr::filter(cluster_annotation %in% c("StL", "SL"))
df <- df %>% spread(cluster_annotation, value) %>% mutate(ratio = StL/SL)

Does this sound right?

Thanks! Francesco

HelenaLC commented 1 year ago

To compute the ratio you want, yes, that looks right. Though I'd use base R for this (simpler than plotting, pulling out data, and using fancy tidy stuff)...

> # setup from package example data
> library(CATALYST)
> data(PBMC_fs, PBMC_panel, PBMC_md)
> sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
> sce <- cluster(sce, verbose = FALSE)
> 
> # specify clustering of interest
> k <- "meta10" 
> kids <- cluster_ids(sce, k)
> # cell counts by sample-cluster
> ns <- table(sce$sample_id, kids) 
> # get proportions
> fq <- prop.table(ns, 1)
> fq[seq(6), seq(4)]
        kids
                   1           2           3           4
  Ref1   0.253121453 0.177071510 0.007945516 0.047673099
  Ref2   0.292237443 0.228310502 0.015981735 0.020547945
  Ref3   0.180303030 0.198484848 0.019696970 0.113636364
  Ref4   0.198945982 0.125164690 0.005270092 0.047430830
  BCRXL1 0.229166667 0.035984848 0.013257576 0.049242424
  BCRXL2 0.215037594 0.085714286 0.007518797 0.019548872
> # each row sums to 1, i.e., these are
> # relative cluster abundances within each sample
> all(rowSums(fq) == 1)
[1] TRUE
> # ratio between clusters 3 & 8
> fq[, "3"]/fq[, "8"]
      Ref1       Ref2       Ref3       Ref4     BCRXL1 
0.02916667 0.07692308 0.19402985 0.03846154 0.03608247 
    BCRXL2     BCRXL3     BCRXL4 
0.02272727 0.04819277 0.03053435 
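
A side note on the ratio itself: both proportions in a sample share the same denominator (that sample's total cell count), so the ratio of proportions is the same as the ratio of raw counts. A minimal base-R sketch with made-up numbers (the sample names, the "other" cluster, and all counts are purely illustrative, not taken from any dataset):

```r
# toy sample-by-cluster count table (all values are made up)
ns <- as.table(matrix(
    c(120, 60, 820,
       45, 90, 865),
    nrow = 2, byrow = TRUE,
    dimnames = list(
        sample_id = c("s1", "s2"),
        cluster_id = c("StL", "SL", "other"))))

# per-sample proportions (each row sums to 1)
fq <- prop.table(ns, 1)

# ratio computed from proportions vs. from raw counts
r_fq <- fq[, "StL"] / fq[, "SL"]
r_ns <- ns[, "StL"] / ns[, "SL"]

# the two agree up to floating-point tolerance
all.equal(r_fq, r_ns)
```

Samples where the denominator cluster has zero cells would give `Inf` (or `NaN` if both clusters are empty), so it is worth checking for those before doing anything downstream.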

Whether this is a sound approach to test "whether the ratio between two subsets is significantly enriched in one or more of the conditions", though, I am not sure.

The typical approach here would be differential abundance (DA) analysis across all subpopulations that also accounts for composition effects. E.g., if another subpopulation (not considered in the ratio) drastically changes in abundance, this shifts the relative abundance of the other subpopulations. Now, if the two subpopulations you're considering (or different samples) are affected differently, their ratio might change, even though that's just a side-effect of the "bigger picture".
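
To make the composition effect concrete, here is a toy base-R sketch. The populations A, B, C, the two hypothetical samples, and all cell numbers are invented for illustration; only C differs between the two samples.

```r
# made-up absolute cell numbers; only population C changes
abs_n <- rbind(
    before = c(A = 100, B = 200, C = 700),
    after  = c(A = 100, B = 200, C = 1700))

# relative abundances within each sample (rows sum to 1)
fq <- prop.table(abs_n, 1)
round(fq, 3)

# A and B are unchanged in absolute terms,
# yet their relative abundances are halved
fc <- fq["after", c("A", "B")] / fq["before", c("A", "B")]
fc
```

In this particular toy case the A-to-B ratio itself stays the same; it would shift only if A and B responded differently, which is the scenario described above.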

I admit this is a vague and not at all statsy argument, but my intuition is that one wants to model the whole data to draw any conclusions, rather than pulling out only a comparison of interest. Perhaps @lmweber, author of the diffcyt package that implements DA analysis in this framework, has some better thoughts?
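
For reference, a DA analysis along these lines could look roughly like the sketch below, reusing the package example data from above. It assumes the diffcyt package is installed and picks edgeR as the DA method. The choice of `meta10`, the design with `condition` as the only covariate, and the contrast are illustrative assumptions, not a recommendation for your data.

```r
library(CATALYST)
library(diffcyt)

# setup from package example data (as above)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, verbose = FALSE)

# design matrix from the experimental info table:
# intercept + one coefficient for the condition effect
design <- createDesignMatrix(ei(sce), cols_design = "condition")
# contrast testing the condition coefficient (assumed comparison)
contrast <- createContrast(c(0, 1))

# DA testing across all clusters of the chosen clustering
res <- diffcyt(sce,
    clustering_to_use = "meta10",
    analysis_type = "DA",
    method_DA = "diffcyt-DA-edgeR",
    design = design,
    contrast = contrast,
    verbose = FALSE)

# per-cluster results, including p-values and adjusted p-values
tbl <- SummarizedExperiment::rowData(res$res)
tbl
```

With more than two conditions, or with paired samples, the design and contrast would need to be adapted accordingly.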

france-hub commented 1 year ago

Ok got it! Thanks for your answer. I’ll wait then for Weber’s opinion too.

france-hub commented 1 year ago

Hello Helena, Feel free to close this if you need to, you gave me a really satisfactory answer. If needed I'll open a diffcyt issue.

Thanks! Francesco