Bioconductor / OrchestratingSingleCellAnalysis

Content for the OSCA Book.
http://bioconductor.org/books/devel/OSCA/

DEGs between two scRNA-seq datasets #62

Open kikegoni opened 2 years ago

kikegoni commented 2 years ago

Hey,

Sorry for asking here, don't know if this is the place to ask. First of all thanks a lot for this amazing book! OSCA is really helpful!!

I have a question regarding the section 3.4 OSCA Multisample.

So I have two SingleCellExperiment objects, let's say sce1 and sce2 from different batches and I want to compare the expression of a particular gene between both datasets. So I agree that using the corrected values obtained after applying MNN (or any other integration algorithm) could give results which do not reflect the true biology.

My concern is mostly about your suggestion, which I am not sure I fully understood: "We suggest performing cross-batch comparisons on the original expression values wherever possible". I think I should apply logNormCounts to sce1 and sce2 individually to normalize for cell-specific effects, and then apply multiBatchNorm to correct the between-batch effect between sce1 and sce2. After that, I can make violin plots and look at the graphical differences for a particular gene, but how should I statistically model these differences, both for a particular gene and for all annotated genes?

Thanks a lot for any help you could provide about this,

Best,

Kike

LTLA commented 2 years ago

Comparisons across batches are most easily handled with a pseudo-bulk approach, which is described in Chapter 4. (That particular comment in 3.4 was meant to segue into the next chapter, but the link must have disappeared over time.)

I'll assume you've read that chapter, in which case I'll jump straight to your situation. If each of your batches corresponds to a single sample, then you're in a pickle, because you don't have any replicates. (Keep in mind that cells should not be treated as experimental replicates.) If each batch contains multiple samples, then you're in a better position, because then you can treat it as a regular bulk RNA-seq experiment after aggregating counts within each sample. Of course, this assumes that you don't have any confounding batch effects between your two batches, but that would be a fundamental error in experimental design.
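As a rough illustration of "aggregating counts within each sample", here is a minimal sketch using aggregateAcrossCells from the scuttle package. The count matrix is simulated, and the `sample` and `condition` labels (six samples, three per condition) are hypothetical; none of this comes from the poster's data.

```r
library(SingleCellExperiment)
library(scuttle)

set.seed(100)

# Simulated counts, for illustration only: 500 genes x 120 cells.
counts <- matrix(rpois(500 * 120, lambda = 5), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("cell", 1:120)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# Hypothetical design: 6 samples of 20 cells each, 3 samples per condition.
sce$sample <- rep(paste0("S", 1:6), each = 20)
sce$condition <- ifelse(sce$sample %in% c("S1", "S2", "S3"), "A", "B")

# Sum the counts across all cells of each sample to get pseudo-bulk profiles.
pb <- aggregateAcrossCells(sce, ids = sce$sample)

dim(pb)               # genes x samples, one column per sample
table(pb$condition)   # the replicates that a bulk-style analysis would use
```

The columns of `pb` are now samples rather than cells, so the replication is at the sample level, which is what a bulk RNA-seq tool such as edgeR expects.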

kikegoni commented 2 years ago

Yes, I read that chapter and I like it a lot, but as you said, each of my sce1 and sce2 consists of only one sample, so I cannot apply the suggestions in Chapter 4. So the approach I thought about was to normalize each sample individually using logNormCounts, then correct the batch effect between the samples with multiBatchNorm (after selecting the universe of genes that are common to sce1 and sce2), then concatenate sce1 and sce2 and apply findMarkers between both batches. Would that be correct?

Again, thanks a lot for your help and for contributing to the scRNA-seq community.

Best,

Kike

LTLA commented 2 years ago

"Would that be correct?"

Well, not really. If you have n = 1 for both groups, then nothing is going to be correct. The considerations that go into designing an experiment for bulk RNA-seq are still applicable here, and no combination of logNormCounts, multiBatchNorm or any method will be able to get around the absence of any information about sample-to-sample variation in your dataset.

Of course, you can still follow your planned approach, but the magnitude of the p-values will be meaningless. Cells are not experimental replicates, and treating them as such would be inappropriate. The edgeR user's guide has some relevant comments about the scenario where you don't have any replicates in a bulk RNA-seq experiment; these are worth reading.
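The point about meaningless p-values can be illustrated with a small base-R simulation. This is only a sketch under assumed values: one gene, two samples, no true difference between conditions, and a made-up sample-to-sample standard deviation (`sample_sd`). Cells are then (incorrectly) treated as replicates in a t-test.

```r
set.seed(42)

# One simulated comparison: two samples with NO true condition effect.
# Each sample gets its own random offset, standing in for sample-to-sample
# variation (all parameter values here are assumptions for illustration).
sim_pvalue <- function(n_cells = 2000, sample_sd = 0.5, cell_sd = 1) {
  offset <- rnorm(2, mean = 0, sd = sample_sd)
  x1 <- rnorm(n_cells, mean = 5 + offset[1], sd = cell_sd)
  x2 <- rnorm(n_cells, mean = 5 + offset[2], sd = cell_sd)
  # Treating cells as replicates: the test only sees cell-to-cell variance.
  t.test(x1, x2)$p.value
}

pvals <- replicate(200, sim_pvalue())

# Fraction of null simulations called "significant" at the 5% level.
mean(pvals < 0.05)
```

Under these assumptions most of the null comparisons come out "significant", because the test cannot separate a condition effect from ordinary variation between two samples when there is only one sample per group.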

kikegoni commented 2 years ago

Thanks a lot for your response!! Yes, I understand that no statistical test is going to work for DE genes if n = 1 in both groups. But for visually comparing the expression of a particular gene in both SCE datasets, is the multiBatchNorm approach the proper method to apply? I mean, if I just want to visually inspect the violin plots of the corrected counts of a particular gene for sce1 and sce2.

Thanks a lot for your help!!

Best,

Kike

LTLA commented 2 years ago

I guess it's okay, if you just want to look at it.
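For visual inspection only, the workflow discussed in this thread might look roughly like the sketch below. The two objects are simulated stand-ins for sce1 and sce2, and the `make_sce` helper, the gene names, and the `batch` column are hypothetical. The functions used are computeLibraryFactors (scuttle), multiBatchNorm (batchelor), and plotExpression (scater).

```r
library(SingleCellExperiment)
library(scuttle)
library(batchelor)
library(scater)

set.seed(1)

# Hypothetical helper that simulates a small SingleCellExperiment.
make_sce <- function(genes, n_cells, lambda, prefix) {
  counts <- matrix(rpois(length(genes) * n_cells, lambda = lambda),
                   nrow = length(genes),
                   dimnames = list(genes, paste0(prefix, "_cell", seq_len(n_cells))))
  SingleCellExperiment(assays = list(counts = counts))
}

# Two batches with partly overlapping genes and different depth.
sce1 <- make_sce(paste0("gene", 1:300), n_cells = 100, lambda = 5, prefix = "b1")
sce2 <- make_sce(paste0("gene", 51:350), n_cells = 150, lambda = 10, prefix = "b2")

# Restrict both objects to the common universe of genes.
universe <- intersect(rownames(sce1), rownames(sce2))
sce1 <- computeLibraryFactors(sce1[universe, ])
sce2 <- computeLibraryFactors(sce2[universe, ])

# Rescale size factors across batches and compute log-normalized values.
normed <- multiBatchNorm(sce1, sce2)
sce1 <- normed[[1]]
sce2 <- normed[[2]]

sce1$batch <- "sce1"
sce2$batch <- "sce2"
combined <- cbind(sce1, sce2)

# Violin plot of one gene, split by batch (for looking at, not for testing).
p <- plotExpression(combined, features = "gene100", x = "batch")
p
```

This only adjusts for differences in sequencing depth between the batches, so any difference seen in the plot may still reflect a batch effect rather than biology.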

kikegoni commented 2 years ago

Perfect! Understood! Thanks a lot for your help!!

Best,

Kike