tkapello closed this issue 4 years ago.
@tkapello Definitely raw. The pseudobulk counts get normalized in the downstream analysis, e.g., by TMM normalization via edgeR::calcNormFactors when pbDS(pb, method = "edgeR") is used. The other methods (limma-voom, DESeq2, limma-trend) also apply a default normalization.
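To make the "raw counts in, normalization downstream" workflow concrete, here is a minimal Python/NumPy sketch. It is not muscat or edgeR code (both are R): the hypothetical aggregate_sum sums raw counts per cluster-sample pair, as aggregateData does by default, and the hypothetical tmm_factors is a simplified trimmed mean of M-values in the spirit of edgeR::calcNormFactors. The trim fractions (30% on M, 5% on A) follow edgeR's defaults; the other simplifications are assumptions: the reference is fixed to the first pseudobulk (edgeR picks it from upper quartiles), ties in the trimming ranks are broken arbitrarily, and there is no A-value cutoff or guard against pseudobulks that share no non-zero genes.

```python
import numpy as np


def aggregate_sum(counts, cluster_ids, sample_ids):
    """Sum raw counts (genes x cells) into one pseudobulk column per (cluster, sample)."""
    pairs = list(zip(cluster_ids, sample_ids))
    keys = sorted(set(pairs))
    pb = np.zeros((counts.shape[0], len(keys)))
    for j, key in enumerate(keys):
        mask = np.array([p == key for p in pairs])
        pb[:, j] = counts[:, mask].sum(axis=1)
    return keys, pb


def tmm_factors(pb, ref=0, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM normalization factors for the columns of a pseudobulk matrix.

    Each column is compared with column `ref`: genes with a zero count in either
    column are dropped, the most extreme log-ratios (M) and average
    log-abundances (A) are trimmed, and the remaining M-values are averaged with
    inverse-variance weights. Factors are rescaled to a geometric mean of 1.
    """
    lib = pb.sum(axis=0)
    r, n_r = pb[:, ref], lib[ref]
    raw = np.ones(pb.shape[1])
    for j in range(pb.shape[1]):
        x, n_x = pb[:, j], lib[j]
        ok = (x > 0) & (r > 0)
        xs, rs = x[ok], r[ok]
        m = np.log2((xs / n_x) / (rs / n_r))        # log-ratio of proportions
        a = 0.5 * np.log2((xs / n_x) * (rs / n_r))  # average log-proportion
        # Inverse of the approximate variance of each M-value
        w = 1.0 / ((n_x - xs) / (n_x * xs) + (n_r - rs) / (n_r * rs))
        n = m.size
        lo_m, lo_a = int(n * logratio_trim), int(n * sum_trim)
        rank_m = np.argsort(np.argsort(m))          # 0-based ordinal ranks
        rank_a = np.argsort(np.argsort(a))
        keep = ((rank_m >= lo_m) & (rank_m < n - lo_m)
                & (rank_a >= lo_a) & (rank_a < n - lo_a))
        raw[j] = 2.0 ** (np.sum(w[keep] * m[keep]) / np.sum(w[keep]))
    return raw / np.exp(np.mean(np.log(raw)))
```

The effective library size of each pseudobulk is its column sum times its factor. If one gene is boosted 100-fold in a single pseudobulk, that gene is trimmed away and the factor drops below 1, so the unchanged genes are not mistaken for down-regulated ones.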
Thanks @markrobinsonuzh for the prompt answer. I would have thought that the cells need to be normalized before aggregation to account for differences in sequencing depth. Could you elaborate on your rationale so that I understand it?
@tkapello we also wondered whether normalizing first would help the inference on the aggregates, but the (simulation) results suggest that it's not that important. You can read all about this in the muscat paper (preprint): https://www.biorxiv.org/content/10.1101/713412v1 (Figure 2, in particular). Compare edgeR.sum(scalecpm) to edgeR.sum(counts).
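The two edgeR.sum variants differ in whether cells are normalized before they are summed. The Python/NumPy sketch below contrasts summing raw counts, where deeply sequenced cells dominate the pseudobulk, with a normalize-first variant that converts each cell to CPM, averages the CPMs so every cell counts equally, and rescales the profile to the pseudobulk's raw total. Both function names are hypothetical, and the rescaling step is an illustrative assumption rather than muscat's exact scalecpm definition.

```python
import numpy as np


def sum_counts(counts):
    """Aggregate raw counts (genes x cells) by summing: deeper cells weigh more."""
    return counts.sum(axis=1)


def normalize_then_aggregate(counts):
    """Hypothetical normalize-first variant: convert each cell to CPM, average the
    CPMs so every cell has equal weight, then rescale to the pseudobulk's total
    raw count so count-based tools can still be applied downstream."""
    cell_cpm = counts / counts.sum(axis=0) * 1e6  # per-cell counts per million
    return cell_cpm.mean(axis=1) / 1e6 * counts.sum()
```

For one deep cell (100 counts, 90% gene 1) and one shallow cell (10 counts, 90% gene 2), i.e. counts = [[90, 1], [10, 9]], the raw sum follows the deep cell ([91, 19]) while the normalize-first profile is balanced ([55, 55]); both keep the same total of 110 counts.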
More of a question than an issue: do you have any thoughts on whether raw or already-normalized single-cell data should be used to prepare the pseudobulk samples with aggregateData()? If raw counts are used (the default), would you recommend normalizing them for the downstream analysis?