broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Mitochondrial genes are not removed for NucSeq in v0.3 #310

Open schmeing opened 11 months ago

schmeing commented 11 months ago

Hi,

For NucSeq data I would expect mitochondrial genes to be caused to a large extent by ambient RNA, and in previous versions they were removed well for most samples. In v0.3, however, this is no longer the case. I understand that this is likely caused by the changes described in this issue: https://github.com/broadinstitute/CellBender/issues/271 It is also closely related to this issue: https://github.com/broadinstitute/CellBender/issues/277

I tried a wide range of parameters, especially FPR, but did not find a way to reduce the mitochondrial content. In fact, it increases after the correction, and a higher FPR (different colors) increases this effect even more:

image

The x axis is the percent mitochondrial without CellBender ambient removal and the y axis is the percent mitochondrial after CellBender ambient removal.
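For readers who want to reproduce this kind of before/after comparison, here is a minimal sketch of how the two axes could be computed. It is not the code behind the plot above. The function name `percent_mito`, the `MT-` gene prefix (human naming), and the toy matrices are all assumptions for illustration. The toy numbers also show one way the percentage can rise: if a correction removes non-mitochondrial counts but leaves mitochondrial counts untouched, the mitochondrial share of the remaining counts goes up.

```python
import numpy as np

def percent_mito(counts, gene_names, prefix="MT-"):
    """Percent of each droplet's UMIs that come from mitochondrial genes.

    counts:     (n_droplets, n_genes) array of UMI counts.
    gene_names: gene symbols, one per column of `counts`.
    prefix:     ASSUMED naming convention for mitochondrial genes (human "MT-").
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([g.upper().startswith(prefix.upper()) for g in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Droplets with zero total counts get NaN instead of a division error.
    return np.divide(100.0 * mito, total,
                     out=np.full_like(total, np.nan), where=total > 0)

# Toy data (hypothetical, not from the samples in this issue).
genes = ["MT-CO1", "MT-ND1", "CD3E", "MS4A1"]
raw = np.array([[10, 10, 60, 20],
                [5, 0, 95, 0],
                [0, 0, 0, 0]])
# Same droplets after a correction that only removed non-mitochondrial counts.
corrected = np.array([[10, 10, 30, 10],
                      [5, 0, 90, 0],
                      [0, 0, 0, 0]])

pct_before = percent_mito(raw, genes)        # x axis of the scatter plot
pct_after = percent_mito(corrected, genes)   # y axis of the scatter plot
```

Plotting `pct_before` against `pct_after` per droplet gives a scatter like the one described; points above the diagonal are droplets whose mitochondrial percentage rose after correction.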

The knee plot for this sample looks as follows (it is just an example; I have the issue for all samples):

image

The solid red line is where I applied the low count threshold. The dotted red line is where I set total_droplets_included and the dashed red line is where I set the expected cell count, but I also tried without setting those parameters (and with different settings for them). The solid blue line is the cellranger count estimate.

Keeping the log10(rank) of UMI counts on the x axis and plotting the average mitochondrial percent on the y axis, we get:

image

Here we see elevated levels of mitochondrial genes in the ambient plateau, which gives me even more confidence that the mitochondrial genes should be removed.
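A rank-versus-mitochondrial-percent curve of this kind could be computed roughly as sketched below. This is an illustration under assumptions, not the code used for the plot: the function name `mito_percent_by_rank`, the equal-width binning in log10(rank), and the toy values are all hypothetical choices.

```python
import numpy as np

def mito_percent_by_rank(total_umis, pct_mito, n_bins=20):
    """Average mitochondrial percent in equal-width bins of log10(rank).

    Droplets are ranked by total UMI count, highest first (rank 1),
    which is the same ordering a knee plot uses.
    """
    total_umis = np.asarray(total_umis, dtype=float)
    pct_mito = np.asarray(pct_mito, dtype=float)
    order = np.argsort(-total_umis, kind="stable")
    log_rank = np.log10(np.arange(1, order.size + 1))
    pct_sorted = pct_mito[order]

    edges = np.linspace(0.0, log_rank[-1], n_bins + 1)
    # digitize returns 1-based bin numbers; clip so the last edge falls in the last bin.
    bin_idx = np.clip(np.digitize(log_rank, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])

    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        vals = pct_sorted[bin_idx == b]
        vals = vals[~np.isnan(vals)]  # ignore droplets with undefined percent
        if vals.size:
            means[b] = vals.mean()
    return centers, means

# Toy data (hypothetical): two high-count droplets, then a low-count plateau
# with a higher mitochondrial percent.
umis = np.array([1000, 900, 10, 9, 8, 7, 6, 5, 4, 3])
pct = np.array([1.0, 1.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0])
centers, means = mito_percent_by_rank(umis, pct, n_bins=2)
```

Plotting `centers` against `means` gives the curve; a rise toward high ranks corresponds to an elevated mitochondrial fraction in the ambient plateau.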

In other samples with a higher mitochondrial percent, this is even more visible:

image

Otherwise the run looks good:

image

image

Is there anything I can still try? I understand that getting false positives in your DGE due to the correction is bad, but in our case we have very different cell type composition between conditions and thus also get a lot of FPs when we do not correct (enough).

Thank you and best regards, Stephan

schmeing commented 8 months ago

Just to add a plot supporting my last statement on why CellBender v0.3 is not usable for our samples (and most likely many others). Here I plot the percent of B cells in the sample (x) vs. the percent of CD4 T cells expressing the given B cell gene:

image

We clearly see that v0.3 does not remove anything, while v0.2 removes a lot of the ambient RNA, especially the strong correlation between the B cell fraction in a sample and B cell marker expression in other cell types. This correlation would cause false differentially expressed genes between groups that differ roughly by B cell fraction in the sample.
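As a rough illustration of how such a per-sample comparison could be built (not the code behind the plot above), the sketch below computes one point per sample: the percent of B cells in the sample and the percent of CD4 T cells with at least one count of a B cell marker. The function name `marker_leakage_points`, the cell type labels, and the toy data are assumptions.

```python
import numpy as np

def marker_leakage_points(sample_ids, cell_types, marker_counts,
                          source_type="B", target_type="CD4 T"):
    """One (x, y) point per sample.

    x: percent of cells in the sample labeled `source_type`.
    y: percent of `target_type` cells with a nonzero count of the marker gene.
    Samples without any `target_type` cells are skipped.
    """
    sample_ids = np.asarray(sample_ids)
    cell_types = np.asarray(cell_types)
    marker_counts = np.asarray(marker_counts)
    xs, ys = [], []
    for s in np.unique(sample_ids):
        in_sample = sample_ids == s
        target = in_sample & (cell_types == target_type)
        if not target.any():
            continue
        xs.append(100.0 * np.mean(cell_types[in_sample] == source_type))
        ys.append(100.0 * np.mean(marker_counts[target] > 0))
    return np.array(xs), np.array(ys)

# Toy data (hypothetical): three samples with 4 cells each.
samples = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
types = ["B", "B", "CD4 T", "CD4 T",
         "B", "CD4 T", "CD4 T", "CD4 T",
         "B", "B", "B", "CD4 T"]
marker = [5, 3, 1, 0,
          4, 0, 0, 0,
          2, 2, 2, 1]

xs, ys = marker_leakage_points(samples, types, marker)
r = np.corrcoef(xs, ys)[0, 1]  # Pearson correlation across samples
```

A strong positive `r` across samples for a B cell marker in a non-B cell type is the pattern one would expect from ambient contamination that scales with the B cell fraction; running the same computation on corrected counts shows how much of it the correction removes.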