emptyDrops / over-calling cells?

MarioniLab / DropletUtils

Clone of the Bioconductor repository for the DropletUtils package.

https://bioconductor.org/packages/devel/bioc/html/DropletUtils.html


emptyDrops / over-calling cells? #84

Closed bbimber closed 2 years ago

bbimber commented 2 years ago

Hello,

This question might not have a quick answer. We're running emptyDrops on 10x V2 input data, with what I think are fairly default params:

e.out <- DropletUtils::emptyDrops(seuratRawData, niters = 10000, lower = 100)

In the plot below, you can see where emptyDrops placed the threshold, which appears to be 100 (the default value of lower), and it is likely over-calling cells. Are there any parameters in emptyDrops() I could explore that might help it pick a better setting automatically? Do you have any docs or guidance on tuning this?

I realize I can pick a manual threshold per dataset; however, I'm hoping to create a process we can execute automatically across all our 10x data.

Thanks for any ideas.

LTLA commented 2 years ago

I don't see anything in the plot that indicates overcalling. The red line is the fitted spline; it doesn't mark where the called cells are. Besides, emptyDrops doesn't call cells by placing a threshold on the total counts.
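A minimal sketch of how the calls can be read from the output rather than from a count threshold. It assumes the result has the Total and FDR columns that emptyDrops documents, and that barcodes at or below lower are reported with NA. The 0.001 FDR cutoff is an assumption borrowed from common usage, not something stated in this thread, and the toy values are invented purely for illustration.

```r
# Toy stand-in for the table returned by emptyDrops(); all values are invented.
e.out <- data.frame(
  Total = c(50, 120, 150, 900, 2500, 80),
  FDR   = c(NA, 0.0004, 0.35, 0.0001, 0.00001, NA)
)

# Barcodes with Total <= lower are never tested and carry NA, so exclude them.
# The 0.001 cutoff is an assumed choice; adjust to taste.
is.cell <- !is.na(e.out$FDR) & e.out$FDR <= 0.001

n.cells <- sum(is.cell)                      # number of called barcodes
called.range <- range(e.out$Total[is.cell])  # total counts spanned by the calls
```

In this toy table the 120-count barcode is called while the 150-count one is not, which is the sense in which the calls do not correspond to a single cutoff on total counts.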

bbimber commented 2 years ago

Yes, I didn't give a great representation of the cells. This plot marks the knee and inflection points correctly. I'm not sure how best to plot this, but in this dataset barcodes with ~100 UMIs are being called as cells. I would have expected the inflection point to be more in the 800-900 UMI count range; however, I appreciate there's sort of a two-hump shape to it. In this instance, the cellranger algorithm places the cutoff at around 800-900. Are there any plots that would be more informative in interpreting emptyDrops behavior here?

LTLA commented 2 years ago

If you are convinced that the second plateau does not correspond to real cells, you could turn up lower to, e.g., 500 or something. Or you could use the ignore argument. Or you could use emptyDropsCellRanger, which makes some different assumptions about which barcodes to use to define the ambient solution; the behavior is similar to setting by.rank=.
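A rough sketch of how these options might be compared on one dataset. The helper name count_calls_by_lower, the candidate values of lower, the fixed seed and the 0.001 FDR cutoff are all illustrative assumptions; m stands for a raw genes-by-barcodes count matrix such as seuratRawData from the original post.

```r
# Hypothetical helper: rerun emptyDrops at several values of `lower` and
# count how many barcodes pass the FDR cutoff each time.
# `m` is a genes-by-barcodes count matrix of raw (unfiltered) 10x counts.
# Extra arguments in `...` (e.g. ignore=) are passed on to emptyDrops().
count_calls_by_lower <- function(m, lowers = c(100, 200, 500),
                                 fdr = 0.001, niters = 10000, ...) {
  vapply(lowers, function(lw) {
    set.seed(1)  # p-values are Monte Carlo estimates; fix the seed per run
    e.out <- DropletUtils::emptyDrops(m, lower = lw, niters = niters, ...)
    sum(e.out$FDR <= fdr, na.rm = TRUE)  # untested barcodes have NA FDR
  }, integer(1))
}

# Usage (not run here; needs DropletUtils installed and real data):
# calls <- count_calls_by_lower(seuratRawData)
# Keep lower = 100 for the ambient estimate but skip barcodes at or below 500:
# count_calls_by_lower(seuratRawData, lowers = 100, ignore = 500)
# Alternative with CellRanger-like assumptions, default settings:
# cr.out <- DropletUtils::emptyDropsCellRanger(seuratRawData)
# sum(cr.out$FDR <= 0.001, na.rm = TRUE)
```

Comparing the counts across settings gives a quick sense of how sensitive the calls are to the choice of lower before committing to one value for an automated pipeline.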

bbimber commented 2 years ago

Thanks - this gave me a few things to look into. In this instance, setting lower to 200 (as opposed to 100) was enough, and dropped the detected cells from 60K to more like 16K (which is about right). I didn't know emptyDropsCellRanger existed, and it also performs pretty well here with default settings.