broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data
MIT License

Questions on knee plot analysis and matrix construction #467

Open bulahwu opened 2 hours ago

bulahwu commented 2 hours ago

Hi, I’m just starting out in this field. We have an scRNA-seq dataset that includes 438 million paired-end reads (2x150bp) from approximately 8000 cells sourced from tissue samples. We processed the dataset following the Drop-seq Alignment Cookbook protocol. Here’s the knee plot derived from our dataset, based on the reads-per-cell-barcode table generated by BAMTagHistogram. The plots vary only in the x-axis limits (from left to right: all, 1M, 100k, and 10k cell barcodes).

[Figure: knee_plot_dropseq_original_02, four knee plot panels with x-axis limits of all, 1M, 100k, and 10k cell barcodes]
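For reference, the curve in a knee plot like this can be computed roughly as follows. This is only a minimal Python sketch, not the cookbook's own plotting code: it assumes the reads-per-cell-barcode table has one `count<TAB>barcode` row per line, and the function names `parse_histogram` and `cumulative_fraction` are made up for illustration.

```python
from itertools import accumulate


def parse_histogram(lines):
    """Parse rows assumed to look like 'count<TAB>barcode'.

    Blank lines and lines starting with '#' are skipped.
    """
    counts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        count, _barcode = line.split("\t")[:2]
        counts.append(int(count))
    return counts


def cumulative_fraction(counts):
    """Cumulative fraction of reads at each barcode rank.

    Barcodes are ranked by descending read count, so element i is the
    share of all reads carried by the top i + 1 barcodes.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return []
    return [running / total for running in accumulate(ordered)]


if __name__ == "__main__":
    # Toy rows with invented barcodes and counts, for illustration only.
    rows = ["# toy data", "50\tAAAA", "30\tCCCC", "20\tGGGG"]
    print(cumulative_fraction(parse_histogram(rows)))
```

Plotting this list against barcode rank, truncated at different maximum ranks, gives panels like the four above.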

I have two questions:

  1. It’s challenging to estimate the cell count from this plot. What factors might contribute to the shape of it?
  2. Can the barcodeRanks and emptyDrops functions from DropletUtils be used to identify the knee point? Since these tools require a raw matrix, do you have any suggestions for setting DigitalExpression parameters to construct such a matrix?

Thank you very much for your help!

jamesnemesh commented 1 hour ago

Hi! The knee plot is a pretty ancient way to evaluate how many cells you have in your data, and as you can see, it's pretty challenging to read.

For single cell data (but not nuclei data) I would absolutely try using DropletUtils. You want to generate a matrix that contains both your cells and some empty droplets. One way to capture both that works in most situations is to set MIN_NUM_TRANSCRIPTS_PER_CELL=20 when running DigitalExpression. This filters the output matrix to cell barcodes with at least 20 UMIs, which keeps the matrix from being huge while still capturing the "empty" droplets, which will probably have far fewer transcripts than your cells.
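To make the effect of the cutoff concrete, here is a small Python sketch of what a minimum-UMI filter does to a per-barcode table. It only illustrates the idea and is not how DigitalExpression is implemented. The barcodes, UMI counts, and the helper names `filter_barcodes` and `rank_table` are all hypothetical.

```python
def filter_barcodes(umi_counts, min_umis=20):
    """Keep only barcodes with at least `min_umis` UMIs."""
    return {bc: n for bc, n in umi_counts.items() if n >= min_umis}


def rank_table(umi_counts):
    """Return (rank, barcode, umis) tuples, highest UMI count first.

    Ties are broken alphabetically by barcode so the order is stable.
    """
    ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, bc, n) for rank, (bc, n) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Invented per-barcode UMI totals, for illustration only.
    toy = {"AAAA": 5200, "CCCC": 3100, "GGGG": 45, "TTTT": 22, "ACGT": 7}
    kept = filter_barcodes(toy, min_umis=20)
    for rank, barcode, umis in rank_table(kept):
        print(rank, barcode, umis)
```

In this toy table the cutoff drops only the 7-UMI barcode, so the retained set still spans both high-count and low-count barcodes, which is the kind of mixed input a tool that models empty droplets needs.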