MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html

sizeFactors for mixed patient samples #113

Closed by flde 8 months ago

flde commented 11 months ago

Hello Scran-Team,

Many thanks for your great tools! I was wondering if you could give me a recommendation on how to use sizeFactors with different patients and time points (the number of cells per sample differs a lot due to therapy). Right now, I follow the approach from Luecken et al. https://www.nature.com/articles/s41592-021-01336-8 where they run sizeFactor on pooled samples from different sequencing runs. But if I understand correctly, that only accounts for technical variability.

Do you have a recommendation on best practice for mixed patient samples or different stimulation conditions (infected / not infected)? Should I run sizeFactor on all data combined, per group, or per sample (which would not always be possible in my specific case due to low cell numbers)?

Highly appreciate your feedback!

All the very best, Florian

LTLA commented 11 months ago

Which size factor calculation are you using?

If it's computeLibraryFactors, it doesn't matter much, as it's just using the library size. Nonetheless, I've found that downscaling to the lowest-coverage batch avoids a lot of difficult batch-to-batch heteroskedasticity and helps out the batch correction, at the cost of reducing signal, as information is implicitly discarded from the higher-coverage batches.
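To make the downscaling idea concrete, here is a minimal, language-agnostic sketch in plain Python. It is not the scran implementation: the function names, batch labels, and library sizes are all hypothetical, and it assumes size factors are derived purely from per-cell library sizes. Factors are first centred within each batch, then each batch is scaled up by the ratio of its mean library size to that of the lowest-coverage batch.

```python
from statistics import mean


def library_size_factors(lib_sizes):
    """Scale library sizes so that the factors average to 1 (hypothetical helper)."""
    centre = mean(lib_sizes)
    return [size / centre for size in lib_sizes]


def downscale_to_lowest_batch(lib_sizes_by_batch):
    """Rescale per-batch size factors so all batches match the lowest-coverage batch."""
    batch_means = {b: mean(sizes) for b, sizes in lib_sizes_by_batch.items()}
    lowest = min(batch_means.values())
    factors = {}
    for batch, sizes in lib_sizes_by_batch.items():
        within = library_size_factors(sizes)  # centred at 1 within the batch
        scale = batch_means[batch] / lowest   # > 1 for higher-coverage batches
        factors[batch] = [f * scale for f in within]
    return factors


# Hypothetical total counts per cell, grouped by sample (batch).
lib_sizes_by_batch = {
    "patient1_day0": [2000, 4000, 6000],
    "patient1_day14": [500, 1000, 1500],
}
factors = downscale_to_lowest_batch(lib_sizes_by_batch)
```

Dividing each cell's counts by its factor then brings every cell to the mean library size of the lowest-coverage batch (1000 in this toy input), so the higher-coverage batch is scaled down rather than the lower one being scaled up.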

If it's computeSumFactors, I believe it would be better to run it within the stimulation condition, so as to avoid introducing strong DE within the pools. However, the best approach would be to run quickCluster beforehand so as to ensure that the pooling is only done within reasonably related cells, and in such cases the clustering supersedes any experimental design considerations. (Group- or sample-specific batches are fine for the normalization step, as it's only used for scaling here.)

flde commented 8 months ago

Dear @LTLA,

Sorry for my late reply, and many thanks for the insights!

I used computeSumFactors. The normalized counts were used for dispersion-based HVG selection and as Scanorama input. Overall, I was under the impression that the size factor normalization yields better results than other methods.

However, due to the complexity of the data set, I later switched to SCVI, also because Scanorama conserved too many patient-specific effects even after thoughtful pre-processing.

That said, I am now running DEA and trying to incorporate scran, since normalization really makes a difference.

Best wishes, Florian