SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Use SingleR to detect doublets? #131

Closed akramdi closed 4 years ago

akramdi commented 4 years ago

Hello,

I was wondering if singleR can be used to detect doublets in a single cell experiment.

SingleR returns the best label from a reference. Is there a way to get the second-best label for a given cell, to confirm a suspected doublet (the suspicion being based on observed gene co-expression that indicates two cell types under the same barcode)? More broadly, how can we use SingleR to get a feeling for whether a cell is potentially a doublet?

This may not be what singleR was meant for but I'd love to hear your thoughts about this.

Thanks a lot, Amira

LTLA commented 4 years ago

It's possible to get the second-best label from digging around in the scores matrix of the output. However, there is no guarantee that it will give you anything meaningful with respect to doublets. I can already think of a few counterexamples off the top of my head:

Rather, I would suggest you use dedicated tools for removing doublets. A survey of some approaches is available in the OSCA book; similarly, you can also use scDblFinder for doublet simulation.
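To illustrate what "digging around in the scores matrix" could look like, here is a minimal language-agnostic sketch in plain Python. It assumes the scores have been exported as a cells-by-labels table; the function name `top_two_labels`, the label names, and the score values are all made up for illustration and are not SingleR output. As noted above, a small gap between the best and second-best score is not, by itself, evidence of a doublet.

```python
def top_two_labels(scores, labels):
    """For each cell (one row of scores), return the best label,
    the second-best label, and the score gap between them."""
    results = []
    for row in scores:
        # Rank (score, label) pairs from highest to lowest score.
        ranked = sorted(zip(row, labels), reverse=True)
        (s1, l1), (s2, l2) = ranked[0], ranked[1]
        results.append((l1, l2, round(s1 - s2, 3)))
    return results


# Hypothetical labels and scores, one row per cell.
labels = ["Neurons", "T_cells", "Macrophage"]
scores = [
    [0.62, 0.21, 0.25],  # clear winner
    [0.48, 0.45, 0.20],  # two labels score almost equally
]

for cell, res in zip(["cell_1", "cell_2"], top_two_labels(scores, labels)):
    print(cell, res)
```

In this toy input the second cell has two near-tied labels, which is the kind of pattern one might look at more closely, though related cell types in the reference can produce the same picture without any doublet being involved.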

akramdi commented 4 years ago

Thank you, I really appreciate your detailed response.

I actually tried a couple of tools dedicated to doublet detection (DoubletFinder, Scrublet), with which I'm able to remove some but not all doublets. This is why I thought of SingleR to help me detect/remove the remaining doublets. I didn't know about scDblFinder, I'll give it a go.

In my case, my doublet suspicion is very precise and I'm looking to confirm/refute it with SingleR. I think I have doublets made up of noradrenergic cells (tumor cells, which make up the majority of the sample) and normal cells from the microenvironment. I dug around the scores matrix to get a feeling and I'm getting interesting results:

Does this way of exploring/interpreting the results make sense?

You've mentioned interesting points about the score threshold to consider and I'm also thinking that the results might be influenced by the diversity of the chosen reference (I am working with HumanPrimaryCellAtlasData()).

LTLA commented 4 years ago

Does this way of exploring/interpreting the results make sense?

Maybe. As I said before, I could see how it could work, but I could also see how it might not work, and so it's hard to say. I think you would be better off using dedicated doublet detection approaches.

If you've got good enough clusters that your doublets fall into their separate cluster, consider using scran::doubletCluster; this will assemble evidence that a cluster does not consist of doublets of two other populations, and if you don't have strong evidence, well, it's probably doublets. This is a lot easier to interpret and has fewer assumptions than the simulation-based methods, but it assumes that you have reasonable clusters that distinguish doublets and their parents.
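The reasoning behind the cluster-based approach can be sketched on toy data. The snippet below is not the scran implementation; it is a simplified illustration that assumes per-cluster mean log-expression values are available, and the function name `unique_gene_counts`, the `margin` parameter, and all numbers are hypothetical. For a query cluster and every pair of other clusters, it counts genes whose expression in the query falls outside the range spanned by the two candidate parents. Many such genes argue against the query being a mixture of that pair; very few leave the doublet explanation open.

```python
from itertools import combinations


def unique_gene_counts(means, query, margin=0.5):
    """For each pair of non-query clusters, count genes whose mean
    expression in the query lies outside the pair's range by > margin."""
    q = means[query]
    others = [c for c in means if c != query]
    results = {}
    for a, b in combinations(others, 2):
        n = 0
        for qg, ag, bg in zip(q, means[a], means[b]):
            lo, hi = min(ag, bg), max(ag, bg)
            # A mixture of a and b should sit between the two parents.
            if qg > hi + margin or qg < lo - margin:
                n += 1
        results[(a, b)] = n
    return results


# Hypothetical mean log-expression per cluster for four genes.
means = {
    "tumor":   [5.0, 0.1, 0.2, 1.0],
    "t_cell":  [0.1, 4.0, 0.3, 1.1],
    "stroma":  [0.2, 0.1, 4.5, 0.9],
    "suspect": [2.8, 2.1, 0.2, 1.0],
}

counts = unique_gene_counts(means, "suspect")
best_pair = min(counts, key=counts.get)
print(best_pair, counts[best_pair])
```

Here the "suspect" cluster has no gene that sets it apart from the tumor and T cell pair, so there is no evidence against it being a mixture of those two populations.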

(Incidentally, it is not surprising that you cannot remove all doublets with simulation-based methods. They make so many assumptions about how doublets form that it's a wonder that they "work" at all.)

But honestly, if you already know the offending cluster, just look for two mutually exclusive markers for the putative contributing populations and show that they are co-expressed in the doublets. If your neurons are expressing the T cell receptor, I think that's a pretty strong case for being a doublet.
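The marker check described above can be sketched as follows. This is a minimal illustration in plain Python on made-up counts; the function name `coexpressing_cells`, the `min_count` threshold, and the choice of PHOX2B and CD3E as stand-ins for a noradrenergic marker and a T cell marker are assumptions for the example, not something prescribed in this thread.

```python
def coexpressing_cells(counts, marker_a, marker_b, min_count=1):
    """Return cells in which both markers reach min_count.

    counts maps each cell barcode to a dict of gene -> count.
    """
    return [
        cell
        for cell, genes in counts.items()
        if genes.get(marker_a, 0) >= min_count
        and genes.get(marker_b, 0) >= min_count
    ]


# Hypothetical counts for three barcodes.
counts = {
    "cell_1": {"PHOX2B": 12, "CD3E": 0},  # tumor-like
    "cell_2": {"PHOX2B": 0, "CD3E": 7},   # T-cell-like
    "cell_3": {"PHOX2B": 9, "CD3E": 5},   # both markers: doublet candidate
}

suspects = coexpressing_cells(counts, "PHOX2B", "CD3E")
print(suspects)
```

With sparse single-cell data, a threshold of one count may be too permissive because of ambient RNA, so `min_count` is left as a parameter to tune.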

akramdi commented 4 years ago

I think you would be better off using dedicated doublet detection approaches.

I think so too. Looking at co-expression patterns also sounds very reasonable in my case; I'll explore this.

btw, thanks a lot for the link to the OSCA book, what a gold mine!

Best,