Raw or normalized counts for pseudobulk?

HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

Raw or normalized counts for pseudobulk? #22

tkapello closed this issue 4 years ago

tkapello commented 4 years ago

This is more of a question than an issue. Do you have any thoughts on whether raw or already-normalized single-cell data should be used when preparing pseudobulk samples with aggregateData()? If raw counts are used (as is the default), would you recommend normalizing for the downstream analysis?

markrobinsonuzh commented 4 years ago

@tkapello Definitely raw. The pseudobulk counts get normalized in the downstream analysis, e.g., TMM normalization via edgeR::calcNormFactors if pbDS(pb, method="edgeR") is used. The other methods (limma-voom, DESeq2, limma-trend) also apply a default normalization.
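To make the order of operations concrete, here is a minimal sketch in plain Python (not muscat or edgeR code). It sums raw per-cell counts within each sample and only then applies a library-size normalization to the aggregates. The helper names `pseudobulk_sum` and `cpm` and the toy data are hypothetical, and counts-per-million is used as a simplified stand-in for TMM, which additionally estimates a scaling factor per sample.

```python
from collections import defaultdict

def pseudobulk_sum(counts, sample_ids):
    """Sum raw per-cell count vectors within each sample (hypothetical helper)."""
    n_genes = len(counts[0])
    pb = defaultdict(lambda: [0] * n_genes)
    for cell, sample in zip(counts, sample_ids):
        for g, c in enumerate(cell):
            pb[sample][g] += c
    return dict(pb)

def cpm(vec):
    """Counts per million: a simple library-size normalization (stand-in for TMM)."""
    total = sum(vec)
    return [1e6 * c / total for c in vec]

# Toy data: 4 cells x 3 genes, two cells per sample.
counts = [[10, 0, 5], [20, 5, 0], [3, 1, 1], [6, 2, 2]]
sample_ids = ["s1", "s1", "s2", "s2"]

# Step 1: aggregate the raw counts per sample.
pb = pseudobulk_sum(counts, sample_ids)

# Step 2: normalize the aggregates, after aggregation rather than before.
pb_cpm = {sample: cpm(vec) for sample, vec in pb.items()}
```

In this sketch the difference in depth between samples is handled at the pseudobulk level, which is where the downstream methods mentioned above apply their own normalization.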

tkapello commented 4 years ago

Thanks @markrobinsonuzh for the prompt answer. I would have thought that the cells need to be normalized before aggregation to account for differences in sequencing depth. Could you elaborate on your rationale so that I understand?

markrobinsonuzh commented 4 years ago

@tkapello We also wondered whether normalizing first would help the inference on the aggregates, but the (simulation) results suggest that it's not that important. You can read all about this in the muscat paper (preprint): https://www.biorxiv.org/content/10.1101/713412v1 (Figure 2, in particular); compare edgeR.sum(scalecpm) to edgeR.sum(counts).
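The two aggregation orders discussed in this thread can be illustrated with a toy example in plain Python. This is only a sketch under stated assumptions: it does not reproduce the paper's edgeR.sum(scalecpm) versus edgeR.sum(counts) benchmark, and the function names and data are hypothetical. It shows where the two approaches differ: summing raw counts weights each cell by its sequencing depth, while normalizing each cell first gives every cell equal weight.

```python
def proportions(vec):
    """Scale a vector so that it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]

def sum_columns(rows):
    """Sum a list of equal-length vectors gene by gene."""
    return [sum(col) for col in zip(*rows)]

# Two cells from one sample, two genes: one deeply and one shallowly sequenced.
cells = [[90, 10], [1, 1]]

# Option A: sum raw counts, then normalize the aggregate.
raw_prop = proportions(sum_columns(cells))

# Option B: normalize each cell first, then sum and renormalize.
norm_prop = proportions(sum_columns([proportions(c) for c in cells]))

print(raw_prop)   # dominated by the deep cell
print(norm_prop)  # both cells contribute equally
```

With an extreme depth imbalance like this one, the two options give visibly different gene proportions. How much that affects differential state inference on real or simulated data is the question the preprint's Figure 2 addresses.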