chris-mcginnis-ucsf / DoubletFinder

R package for detecting doublets in single-cell RNA sequencing data

Predicted doublet rate #76

Closed by Chenmengpin 4 years ago

Chenmengpin commented 4 years ago

Hi Chris,

Thank you for developing this very helpful tool. I have a question about doublet estimation and hope you can help.

After picking pK and defining the pANN threshold, we go on to make the final doublet/singlet predictions. If I understand correctly, we need to assume a doublet rate when estimating the expected number of doublets. The example code sets it to 0.075: nExp_poi <- round(0.075*length(seu_kidney@cell.names))
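For illustration only, here is a minimal R sketch of how the assumed rate feeds into the expected doublet count. The names n_cells and expected_doublets are hypothetical and the cell count is made up; in practice the count would come from your own Seurat object.

```r
# Hypothetical cell count for one sample (an assumption, not from any real dataset).
n_cells <- 5000

# Expected number of doublets for an assumed doublet rate,
# mirroring the nExp_poi line in the example code.
expected_doublets <- function(n_cells, doublet_rate) {
  round(doublet_rate * n_cells)
}

expected_doublets(n_cells, 0.075)  # 375
expected_doublets(n_cells, 0.06)   # 300
```

The expected count scales directly with the assumed rate, so the choice of rate mainly changes how many of the top-ranked cells are called doublets.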

But for our own data, we don't know the doublet rate for each sample in advance. I tried both 0.075 and 0.06 on one test sample, and the distributions of predicted doublets look very similar; only the counts differ. I found 412 doublets with 0.075 and 386 doublets with 0.06. BM1_0.075.pdf BM1_0.06.pdf

In this case, do you have any suggestions on how to pick a reasonable doublet rate?

Thank you!

Mengping

chris-mcginnis-ucsf commented 4 years ago

Hi Mengping ( @Chenmengpin ),

Ideally, you can estimate the proportion of doublets in your data according to the number of cells you loaded into the droplet microfluidics device during the experiment. If you don't have this information, you can get an estimate simply by cross-referencing the total number of cells in your dataset against the doublet formation rate provided in the 10x or Drop-Seq user guides. It is worth noting that this estimate will be off if (i) your data is of low quality (i.e., lots of cell death) or (ii) your cells were particularly prone to physical clumping.

If you do not feel comfortable using these estimates, you may consider trying Scrublet (https://github.com/AllonKleinLab/scrublet), which has a built-in feature for predicting the number of doublets in your scRNA-seq data. I tried implementing a similar feature while developing DoubletFinder but it didn't work well for a variety of reasons.

Chris

yichangyu commented 4 years ago

Hi Chris,

Maybe a stupid question, but how can I estimate the proportion of doublets if I already know the number of loaded cells? For example, if I loaded 5000 cells but Cell Ranger detected only 4000 cells (from the raw matrix of the Cell Ranger count output), would the doublet rate be (5000-4000)/5000 = 0.2?

Thanks! Changyu

chris-mcginnis-ucsf commented 4 years ago

Hi Changyu,

Here's the doublet rate estimation table from the 10x V3 user guide (numbers hold for V2, as well):

[Image: doublet rate estimation table from the 10x V3 user guide]

So if you loaded 5K cells, the doublet rate should be ~2.5%. If you don't know how many cells were loaded, but you recovered 4K total cells, you could also use the table to estimate that the doublet rate was 3.1%.
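As a rough sketch of that lookup in R: the code below assumes the doublet rate grows approximately linearly with the number of recovered cells, with the slope taken from the 4K cells, 3.1% figure above. The linear scaling and the names rate_per_1000 and estimate_doublet_rate are assumptions made for illustration; the table in the user guide is the authoritative source.

```r
# Assumption: the doublet rate scales linearly with recovered cells.
# Slope derived from the figure quoted above (3.1% at 4,000 recovered cells).
rate_per_1000 <- 0.031 / 4  # roughly 0.78% per 1,000 recovered cells

# Hypothetical helper: approximate doublet rate from the recovered cell count.
estimate_doublet_rate <- function(n_recovered, per_1000 = rate_per_1000) {
  per_1000 * n_recovered / 1000
}

n_recovered  <- 4000
doublet_rate <- estimate_doublet_rate(n_recovered)  # about 0.031

# Expected doublet count, computed the same way as nExp_poi in the example code.
nExp_poi <- round(doublet_rate * n_recovered)       # 124
```

The approximation is only as good as the linearity assumption, and it inherits the caveats about low-quality data and clumping-prone cells mentioned earlier in the thread.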

Chris

yichangyu commented 4 years ago

Hi Chris,

Thanks! This is very helpful.

Changyu

Chenmengpin commented 4 years ago

Hi, Chris,

Thanks a lot for your suggestion, really helpful.

Mengping

rxyMDA commented 1 week ago

Thank you very much for the information! I have a quick question. If I have thousands of cells per sample, should I estimate the doublet rate separately for each sample? Or should I add up the total number of cells across all samples (i.e. 20), estimate the doublet rate from that total, and then apply the same rate to each sample? Thanks!