Closed by acv21 1 year ago
Hi @acv21,
Answers/Comments below:
As an aside -- elevated doublet calls from HTODemux have been observed previously (in my own experience, and also here: https://academic.oup.com/bioinformatics/article-abstract/38/10/2791/6565315), so I'd go with the MULTIseqDemux results (although I'm biased, obviously). If you use the deMULTIplex tool, you can also do semi-supervised negative cell reclassification to further boost your singlet calls (Seurat doesn't incorporate this feature).
And as for excluding intra-sample doublets, I have two suggestions.
First, you could just identify clusters in GEX space that are enriched for hashtag-defined doublets and exclude those clusters entirely from your downstream analyses. These clusters correspond to heterotypic doublets (doublets formed from transcriptionally distinct cell types) and can screw up your downstream interpretations BUT are easy to pick out using the hashtag calls (e.g., if a cluster is 80% hashtag doublets, it's pretty safe to assume the other cells are your intra-sample heterotypic doublets). What you'll be left with after that are intra-sample homotypic doublets (doublets formed from transcriptionally-similar cells) which don't really mess with interpretation too much because they 'look' like singlets.
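The cluster-level check in this first suggestion can be sketched as a simple per-cluster tally. This is a minimal, language-agnostic illustration in plain Python, not code from Seurat or deMULTIplex: the function names, the `"Doublet"` label, and the 0.5 cutoff are all assumptions made for the example. In practice the cluster IDs and hashtag calls would come from your per-cell metadata, and the cutoff is something to pick after looking at the per-cluster fractions.

```python
from collections import Counter


def doublet_fraction_per_cluster(clusters, calls, doublet_label="Doublet"):
    """Return {cluster: fraction of its cells whose hashtag call is a doublet}."""
    if len(clusters) != len(calls):
        raise ValueError("clusters and calls must have one entry per cell")
    totals = Counter(clusters)
    doublets = Counter(
        cluster for cluster, call in zip(clusters, calls) if call == doublet_label
    )
    return {cluster: doublets[cluster] / n for cluster, n in totals.items()}


def doublet_enriched_clusters(clusters, calls, min_fraction=0.5,
                              doublet_label="Doublet"):
    """Return the clusters whose hashtag-doublet fraction is >= min_fraction.

    min_fraction is a hypothetical cutoff, not a recommended value.
    """
    fractions = doublet_fraction_per_cluster(clusters, calls, doublet_label)
    return sorted(c for c, frac in fractions.items() if frac >= min_fraction)


# Toy data: one entry per cell (cluster ID in GEX space, hashtag classification).
clusters = ["0", "0", "0", "0", "1", "1", "1", "1", "1"]
calls = ["HTO1", "HTO2", "Doublet", "HTO1",
         "Doublet", "Doublet", "Doublet", "Doublet", "HTO3"]

fractions = doublet_fraction_per_cluster(clusters, calls)
flagged = doublet_enriched_clusters(clusters, calls)

# Drop every cell in a flagged cluster, including cells the hashtags called
# singlets (those are the presumed intra-sample heterotypic doublets).
keep = [i for i, cluster in enumerate(clusters) if cluster not in flagged]
```

In the toy data, cluster "1" is 80% hashtag doublets and gets flagged, so its one "singlet" is removed along with the rest. Hashtag-called doublets sitting in unflagged clusters (cell index 2 here) would still need to be removed by their own classification.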
Second, you could use doublet prediction algorithms -- there are a ton now, but you could check out my package DoubletFinder -- which will identify heterotypic doublets regardless of their sample of origin. You'll still be left with the intra-sample homotypic doublets (assuming DoubletFinder works well -- hard to know without knowing the details of your dataset), so I would just suggest the 1st option, but you could do both to make sure you're dealing with high-quality data.
Chris
Hello @chris-mcginnis-ucsf, thank you very much for your exhaustive explanation, it is much appreciated. I will go ahead with deMULTIplex only, and maybe try DoubletFinder down the line, once I have finished my preliminary analysis.
Hello,
I am analysing a 10x scRNA-seq dataset where 6 cell populations were multiplexed by antibody-based cell hashing (with TotalSeq-B antibodies). I have a few questions about demultiplexing:
Is it possible to use the deMULTIplex package to demultiplex datasets with antibody-based cell hashing, as opposed to lipid-based as described in the paper? I tried running it on the normalised Seurat matrices, comparing it with HTODemux (described here), and found that MULTIseqDemux returns a % of doublets much closer to what I would expect, whilst HTODemux classifies a lot more cells as doublets (almost 3x as many as expected). However, I am not sure how to judge which algorithm gives the better results without relying on an expected multiplet rate (from the 10x table). Here's the code I used:
Does the algorithm identify doublets AND multiplets? Related to this, does it identify doublets/multiplets based solely on detection of 2 or more HTOs for a single cell barcode (inter-sample doublets)? I would like to also exclude intra-sample doublets (i.e. doublets that share a cell barcode and carry only one HTO); could this be achieved by manually filtering cell barcodes having an abnormally high gene count or UMI count (similarly to what is described here), prior to running MULTIseqDemux? Are there algorithms incorporating this step to avoid relying on an arbitrary threshold?
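For context on the inter- versus intra-sample distinction: hashing can only flag doublets whose two cells carry different HTOs, so an expected multiplet rate has to be scaled down before it is compared with the demultiplexing output. Here is a rough sketch of that arithmetic, assuming doublets form by random pairing of cells from the pool. The function name and the 8% total doublet rate are made up for illustration and are not taken from the 10x table.

```python
def intra_sample_doublet_fraction(proportions):
    """Probability that two randomly paired cells come from the same sample.

    proportions: relative pool sizes per sample (they need not sum to 1).
    Assumes random pairing, which is an idealisation.
    """
    total = sum(proportions)
    return sum((p / total) ** 2 for p in proportions)


# Six samples pooled in equal proportions, as an assumed example.
equal_pool = [1] * 6
intra = intra_sample_doublet_fraction(equal_pool)  # 1/6 of all doublets
inter = 1 - intra                                   # 5/6 of all doublets

total_doublet_rate = 0.08  # hypothetical overall doublet rate, not a real figure
detectable_by_hashing = total_doublet_rate * inter
invisible_to_hashing = total_doublet_rate * intra
```

Under these assumptions, roughly one in six doublets in a six-sample pool is intra-sample and cannot be seen by any hashtag-based classifier. Unequal pooling raises that share.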
Thank you for your help!