chris-mcginnis-ucsf / MULTI-seq

R implementation of MULTI-seq sample classification workflow

deMULTIplex for antibody-based cell hashing #26

Closed: acv21 closed this issue 1 year ago

acv21 commented 2 years ago

Hello,

I am analysing a 10x scRNA-seq dataset where 6 cell populations were multiplexed by antibody-based cell hashing (with TotalSeq-B antibodies). I have a few questions about demultiplexing:

  1. Is it possible to use the deMULTIplex package to demultiplex datasets with antibody-based cell hashing, as opposed to the lipid-based labelling described in the paper? I tried Seurat's MULTIseqDemux on the normalised Seurat matrices and compared it with HTODemux (described here): MULTIseqDemux returns a percentage of doublets much closer to what I would expect, whilst HTODemux classifies many more cells as doublets (almost three times as many as expected). However, I am not sure how to judge which algorithm gives the best results without relying on an expected multiplet rate (from the 10x table). Here's the code I used:

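    # Log-normalise the default (RNA) assay
    seurat.object.norm <- NormalizeData(seurat.object)
    # CLR-normalise the hashtag counts stored in the "Protein" assay
    seurat.object.norm <- NormalizeData(seurat.object.norm, assay="Protein", normalization.method="CLR")
    # Seurat's HTODemux classification (results are added to the object metadata)
    seurat.object.norm <- HTODemux(seurat.object.norm, assay="Protein", positive.quantile=0.99)
    # Seurat's MULTI-seq implementation, with automatic threshold (quantile) selection
    seurat.object.norm <- MULTIseqDemux(seurat.object.norm, assay="Protein", autoThresh=TRUE)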
  2. Does the algorithm identify doublets AND multiplets? Relatedly, does it identify doublets/multiplets based solely on detecting two or more HTOs for a single cell barcode (inter-sample doublets)? I would also like to exclude intra-sample doublets (i.e. two cells from the same sample sharing a cell barcode, so that only one HTO is detected); could this be achieved by manually filtering out cell barcodes with an abnormally high gene or UMI count (similarly to what is described here) before running MULTIseqDemux? Are there algorithms that incorporate this step, to avoid relying on an arbitrary threshold?
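For the manual filter described in question 2, one common data-driven alternative to a hand-picked cutoff is a median absolute deviation (MAD) rule on log-scaled UMI counts. The sketch below uses only base R's `median()` and `mad()`; `flag_high_umi()` is a hypothetical helper, and the 3-MAD default is a widely used convention, not a deMULTIplex or Seurat setting, so treat it as an assumption to tune.

```r
# Hypothetical helper: flag barcodes whose UMI count is unusually high,
# using an upper MAD bound on the log10 scale instead of a fixed cutoff.
# n.mads = 3 is an assumed convention, not a deMULTIplex/Seurat default.
flag_high_umi <- function(n.umi, n.mads = 3) {
  log.umi <- log10(n.umi)
  # mad() already includes the 1.4826 consistency constant
  upper <- median(log.umi) + n.mads * mad(log.umi)
  log.umi > upper
}

# Toy data: 21 ordinary barcodes plus one with a very high UMI count
n.umi <- c(seq(2000, 4000, by = 100), 30000)
flagged <- flag_high_umi(n.umi)
which(flagged)   # index of the flagged barcode
```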

Thank you for your help!

chris-mcginnis-ucsf commented 2 years ago

Hi @acv21,

Answers/Comments below:

  1. Yup, you can use this on antibody hashing data. You'll just want to extract the hashtag count matrix from your Seurat object and use it as your 'barTable' object, as described in the tutorial.
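A minimal base-R sketch of that extraction step, assuming the hashtag counts live in the "Protein" assay as in the code above and that barTable should hold raw counts with cells as rows and one column per hashtag (the layout of the tutorial's bar.table). Seurat stores features x cells, hence the transpose. `make_bar_table()` is a hypothetical helper, and a small made-up matrix stands in for the Seurat counts so the example runs on its own.

```r
# Hypothetical helper: turn a features x cells hashtag count matrix into a
# cells x hashtags data frame, the layout used for barTable in the tutorial.
# In a real analysis the input would come from Seurat, e.g.
#   hto.counts <- GetAssayData(seurat.object, assay = "Protein", slot = "counts")
# (Seurat v4 syntax; newer Seurat versions use `layer` instead of `slot`).
# `hashtags` lets you keep only hashtag rows if the assay holds other antibodies.
make_bar_table <- function(hto.counts, hashtags = rownames(hto.counts)) {
  as.data.frame(t(as.matrix(hto.counts[hashtags, , drop = FALSE])))
}

# Made-up raw counts: 3 hashtags (rows) x 4 cells (columns)
hto.counts <- matrix(
  c(120,  3,   5,    # cell1
      2, 95,   4,    # cell2
      1,  2, 150,    # cell3
     80, 70,   3),   # cell4
  nrow = 3,
  dimnames = list(paste0("HTO", 1:3), paste0("cell", 1:4))
)
bar.table <- make_bar_table(hto.counts)
dim(bar.table)   # 4 cells x 3 hashtags
# bar.table would then go to deMULTIplex, e.g. classifyCells(bar.table, q = ...)
```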

And as an aside -- elevated doublet calls from HTODemux have been observed previously (my own personal experience, and also: https://academic.oup.com/bioinformatics/article-abstract/38/10/2791/6565315), so I'd go with the MULTIseqDemux results (although I'm biased, obviously). If you use the deMULTIplex tool, you can also do semi-supervised negative cell reclassification to further boost your singlet calls (Seurat doesn't incorporate this feature).
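As a rough sanity check when comparing either method's doublet percentage against the 10x multiplet table, keep in mind that hashtags only reveal inter-sample doublets. This base-R sketch assumes cells pair at random, so the share of doublets invisible to hashing is the sum of squared sample proportions. The default `rate_per_1000 = 0.008` (about 0.8% per 1,000 recovered cells) is an assumed linear approximation of the 10x table; substitute the figure for your own chemistry and loading. `expected_doublet_rates()` is a hypothetical helper.

```r
# Hypothetical helper: total vs hashtag-detectable doublet rate.
# Assumptions: random pairing of cells, and a multiplet rate that scales
# linearly with recovered cells (rate_per_1000 is an assumed default).
expected_doublet_rates <- function(n.recovered, sample.props,
                                   rate_per_1000 = 0.008) {
  stopifnot(all(sample.props >= 0), abs(sum(sample.props) - 1) < 1e-8)
  total <- rate_per_1000 * n.recovered / 1000
  same.sample <- sum(sample.props^2)   # both cells carry the same hashtag
  c(total = total, detectable = total * (1 - same.sample))
}

# Example: 10,000 recovered cells from six equally pooled samples
res <- expected_doublet_rates(10000, rep(1 / 6, 6))
round(res, 4)   # total = 0.08, detectable = 0.0667
```

With six equally pooled samples only 5/6 of all doublets carry two different hashtags, so the detectable rate, not the total, is the number to compare against the MULTIseqDemux or HTODemux doublet percentage.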

  2. By doublets vs multiplets, do you mean droplets with 2 cells (doublet) vs >2 cells (multiplet)? If so, then no, deMULTIplex doesn't distinguish them, but its 'doublet' calls would include both varieties (although my sense is that the proportion of >2-cell multiplets in any dataset is very low, unless your sample was super sticky).
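If you do want to separate 2-cell doublets from >2-cell multiplets after the fact, one crude option is to count how many hashtags each cell is positive for. This base-R sketch is not deMULTIplex's classification: the per-hashtag `thresholds` are assumed inputs (deMULTIplex derives its own thresholds), and `count_positive_tags()` is a hypothetical helper.

```r
# Hypothetical helper: count positive hashtags per cell and label cells as
# Negative (0), Singlet (1), Doublet (2) or Multiplet (>2).
# `thresholds` is an assumed, user-supplied named vector (one per hashtag).
count_positive_tags <- function(bar.table, thresholds) {
  stopifnot(identical(names(thresholds), colnames(bar.table)))
  # Compare every column with its own threshold -> logical cells x hashtags
  positive <- sweep(as.matrix(bar.table), 2, thresholds, FUN = ">=")
  n.pos <- rowSums(positive)
  classification <- ifelse(n.pos == 0, "Negative",
                    ifelse(n.pos == 1, "Singlet",
                    ifelse(n.pos == 2, "Doublet", "Multiplet")))
  data.frame(n.pos = n.pos, classification = classification,
             row.names = rownames(bar.table), stringsAsFactors = FALSE)
}

bar.table <- data.frame(
  HTO1 = c(120,  2, 1, 80, 60),
  HTO2 = c(  3, 95, 2, 70, 55),
  HTO3 = c(  5,  4, 2,  3, 50),
  row.names = paste0("cell", 1:5)
)
thresholds <- c(HTO1 = 40, HTO2 = 40, HTO3 = 40)
count_positive_tags(bar.table, thresholds)
```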

And as for excluding intra-sample doublets, I have two suggestions.

First, you could just identify clusters in GEX space that are enriched for hashtag-defined doublets and exclude those clusters entirely from your downstream analyses. These clusters correspond to heterotypic doublets (doublets formed from transcriptionally distinct cell types) and can screw up your downstream interpretations BUT are easy to pick out using the hashtag calls (e.g., if a cluster is 80% hashtag doublets, it's pretty safe to assume the other cells are your intra-sample heterotypic doublets). What you'll be left with after that are intra-sample homotypic doublets (doublets formed from transcriptionally similar cells), which don't really mess with interpretation too much because they 'look' like singlets.
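The cluster-level filter in the first suggestion can be sketched in a few lines of base R. `flag_doublet_clusters()` is a hypothetical helper: it takes per-cell GEX cluster labels (e.g. Seurat cluster IDs) and per-cell hashtag calls, assumes doublets are labelled "Doublet", and uses the 80% figure from the text as an adjustable default rather than a recommended cutoff.

```r
# Hypothetical helper: flag clusters where hashtag doublets make up at least
# `max.doublet.frac` of cells, and mark every cell in them for removal.
flag_doublet_clusters <- function(clusters, hashtag.calls,
                                  max.doublet.frac = 0.8) {
  stopifnot(length(clusters) == length(hashtag.calls))
  is.doublet <- hashtag.calls == "Doublet"
  # Fraction of hashtag-defined doublets within each cluster
  frac <- tapply(is.doublet, clusters, mean)
  bad.clusters <- names(frac)[frac >= max.doublet.frac]
  list(doublet.fraction = frac,
       bad.clusters = bad.clusters,
       keep = !(as.character(clusters) %in% bad.clusters))
}

clusters <- c(rep("0", 4), rep("1", 6))
hashtag.calls <- c("Singlet", "Doublet", "Singlet", "Negative",
                   rep("Doublet", 5), "Singlet")
res <- flag_doublet_clusters(clusters, hashtag.calls)
res$bad.clusters   # cluster "1" (5 of 6 cells are hashtag doublets)
sum(res$keep)      # cells retained from cluster "0"
```

Dropping the whole cluster, rather than only its hashtag doublets, is what removes the intra-sample heterotypic doublets hiding among the cluster's apparent singlets.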

Second, you could use doublet prediction algorithms -- there are a ton now, but you could check out my package DoubletFinder -- which will identify heterotypic doublets regardless of their sample of origin. You'll still be left with the intra-sample homotypic doublets (assuming DoubletFinder works well -- hard to know without knowing the details of your dataset), so I would just suggest the 1st option, but you could do both to make sure you're dealing with high-quality data.
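To get a feel for how many doublets could survive both strategies, here is a back-of-the-envelope sketch. It assumes cells pair at random and that every sample has the same cell-type composition; under those assumptions the expected share of doublets that are both intra-sample and homotypic is the product of two sums of squared proportions. The cell-type term follows the same sum-of-squared-frequencies idea as DoubletFinder's `modelHomotypic()` helper, reimplemented in base R here; `residual_doublet_share()` is a hypothetical name.

```r
# Hypothetical helper: expected share of doublets that neither hashtags
# (same sample) nor heterotypic-doublet filtering (same cell type) can catch.
# Assumes random pairing and identical cell-type composition across samples.
residual_doublet_share <- function(sample.labels, celltype.labels) {
  sample.props   <- table(sample.labels) / length(sample.labels)
  celltype.props <- table(celltype.labels) / length(celltype.labels)
  same.sample <- sum(sample.props^2)    # both cells carry the same hashtag
  homotypic   <- sum(celltype.props^2)  # both cells from the same cell type
  same.sample * homotypic
}

samples   <- rep(paste0("S", 1:6), each = 100)               # six equal samples
celltypes <- rep(c("T", "B", "Mono"), times = c(360, 180, 60))
residual_doublet_share(samples, celltypes)
# (1/6) * (0.6^2 + 0.3^2 + 0.1^2) = 0.46 / 6, about 0.077
```

A dataset dominated by one cell type has a larger homotypic term, so more of its intra-sample doublets will 'look' like singlets.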

Chris

acv21 commented 2 years ago

Hello @chris-mcginnis-ucsf, thank you very much for your thorough explanation, it is much appreciated. I will go ahead with deMULTIplex only, and maybe try DoubletFinder down the line, once I have finished my preliminary analysis.