akramdi closed 4 years ago
It's possible to get the second-best label by digging around in the scores matrix of the output. However, there is no guarantee that it will give you anything meaningful with respect to doublets. I can already think of a few counterexamples off the top of my head:
SingleR() could be used to identify the contributors, but this is not of much interest because we just want to get rid of the doublets.
Rather, I would suggest you use dedicated tools for removing doublets. A survey of some approaches is available in the OSCA book; similarly, you can also use scDblFinder for doublet simulation.
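For reference, reading the second-best label out of a per-cell scores matrix could look like the sketch below. It is plain Python on a toy matrix rather than code against the actual SingleR output; the label names, the score values, and the helper `top_two_labels` are made up for illustration. A small gap between the top two scores only says that the cell is ambiguous between two labels, which, as noted above, is not in itself evidence of a doublet.

```python
def top_two_labels(scores, labels):
    """Return (best, second_best, gap) for each row of a cells x labels score matrix."""
    results = []
    for row in scores:
        # Rank label indices by score, highest first.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        best, second = order[0], order[1]
        results.append((labels[best], labels[second], row[best] - row[second]))
    return results


# Hypothetical labels and scores (one row per cell), for illustration only.
labels = ["Neurons", "T_cells", "Macrophage"]
scores = [
    [0.62, 0.20, 0.25],  # clear winner
    [0.48, 0.45, 0.21],  # two labels score almost equally
]

top_two = top_two_labels(scores, labels)
for best, second, gap in top_two:
    print(f"best={best} second={second} gap={gap:.2f}")
```

Any cutoff on the gap would be an arbitrary choice that depends on the reference, so it is left out of the sketch on purpose.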
Thank you, I really appreciate your detailed response.
I actually tried a couple of tools dedicated to doublet detection (DoubletFinder, Scrublet), with which I'm able to remove some but not all doublets. This is why I thought of SingleR to help me detect/remove the remaining doublets. I didn't know about scDblFinder; I'll give it a go.
In my case, my doublet suspicion is very precise and I'm looking to confirm/refute it with SingleR. I think I have doublets made up of noradrenergic cells (tumor cells, which make up the majority of the sample) and normal cells from the microenvironment. I dug around the scores matrix to get a feeling and I'm getting interesting results:
NA value in pruned.labels field), I would be tempted to consider these as potential doublets too, along with the ones found in the previous point. Does this way of exploring/interpreting the results make sense?
You've mentioned interesting points about the score threshold to consider, and I'm also thinking that the results might be influenced by the diversity of the chosen reference (I am working with HumanPrimaryCellAtlasData()).
Does this way of exploring/interpreting the results make sense?
Maybe. As I said before, I could see how it could work, but I could also see how it might not work, and so it's hard to say. I think you would be better off using dedicated doublet detection approaches.
If you've got good enough clusters that your doublets fall into their own separate cluster, consider using scran::doubletCluster; this will assemble evidence that a cluster does not consist of doublets of two other populations, and if you don't have strong evidence, well, it's probably doublets. This is a lot easier to interpret and has fewer assumptions than the simulation-based methods, but it assumes that you have reasonable clusters that distinguish doublets and their parents.
(Incidentally, it is not surprising that you cannot remove all doublets with simulation-based methods. They make so many assumptions about how doublets form that it's a wonder that they "work" at all.)
But honestly, if you already know the offending cluster, just look for two mutually exclusive markers for the putative contributing populations and show that they are co-expressed in the doublets. If your neurons are expressing the T cell receptor, I think that's a pretty strong case for being a doublet.
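The marker check described above can be sketched on a toy count table. This is plain Python with made-up numbers; the gene names (PHOX2B standing in for a noradrenergic marker, CD3E for a T cell marker), the barcodes, and the helper `coexpressing_cells` are illustrative assumptions and are not taken from the dataset discussed in this thread.

```python
def coexpressing_cells(counts, marker_a, marker_b, min_count=1):
    """Return barcodes whose counts for both markers reach min_count."""
    return [
        barcode
        for barcode, genes in counts.items()
        if genes.get(marker_a, 0) >= min_count
        and genes.get(marker_b, 0) >= min_count
    ]


# Hypothetical per-cell counts for two markers expected to be mutually exclusive.
counts = {
    "cell_1": {"PHOX2B": 12, "CD3E": 0},  # looks like a tumor cell
    "cell_2": {"PHOX2B": 0, "CD3E": 7},   # looks like a T cell
    "cell_3": {"PHOX2B": 9, "CD3E": 5},   # expresses both: doublet candidate
}

suspects = coexpressing_cells(counts, "PHOX2B", "CD3E")
print(suspects)
```

With real data, a single stray count can come from ambient RNA, so raising `min_count` above 1 may be a sensible precaution.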
Hello,
I was wondering if SingleR can be used to detect doublets in a single-cell experiment.
SingleR returns the best label from a reference. Is there a way to get the second-best label for a given cell, to confirm a suspicion of a doublet (a suspicion based on observed gene co-expression indicating the presence of two cell types under the same barcode)? More broadly, how can we take advantage of SingleR to get a feeling for whether a cell is potentially a doublet?
This may not be what SingleR was meant for, but I'd love to hear your thoughts about this.
Thanks a lot, Amira