broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

High percentage of MT genes #69

megan1111 closed this issue 3 years ago

megan1111 commented 4 years ago

Hi,

I am running CellBender on UMI-tagged scRNA-seq data. The problem I am facing is that cells are left with a high percentage of MT genes after running cellbender remove-background. After filtering the cells (post-CellBender) with the criteria nFeature_RNA > 200 & nFeature_RNA < 8000 & percent.mt < 20, I ended up with 25 cells out of my original cell count of 8034.
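For reference, the filter described above can be sketched on a toy table. This is only an illustration: the barcodes and values below are made up, and the column names simply mirror the nFeature_RNA and percent.mt names used in the post. It is not the actual pipeline from this thread.

```shell
# Toy per-cell QC table (hypothetical values, not data from this issue).
qc_table='barcode nFeature_RNA percent_mt
cellA 1500 5.2
cellB 150 3.0
cellC 2500 35.0
cellD 9000 4.0
cellE 3000 19.9'

# Keep cells with 200 < nFeature_RNA < 8000 and percent_mt < 20 (skip the header row).
kept=$(printf '%s\n' "$qc_table" | awk 'NR > 1 && $2 > 200 && $2 < 8000 && $3 < 20 { print $1 }')

# Count cells before and after filtering.
n_total=$(printf '%s\n' "$qc_table" | awk 'NR > 1 { n++ } END { print n + 0 }')
n_kept=$(printf '%s\n' "$kept" | awk 'NF { n++ } END { print n + 0 }')

echo "kept $n_kept of $n_total cells"
```

Comparing the kept count before and after remove-background, one threshold at a time, shows which criterion is discarding most cells.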

Here is what the violin plot looks like: [image]

Below are the commands I used:

cellbender remove-background \
    --input "${projDir[i]}/outs/raw_feature_bc_matrix" \
    --output "$plotDir/cellbender_feature_bc_matrix.h5" \
    --expected-cells ${expectedCellNum[i]} \
    --total-droplets-included ${totalDroplet[i]}

Expected cells: 5000
Total droplets included: 110000 (the number of barcodes derived from the 2nd plunge in UMI counts; see below)
No. of epochs: 150

[image]

The log file output looks a little concerning, as it reports 0 empty droplets on line 11 (see below): [image]

I am not sure whether this is because I fed too much background RNA into the algorithm or whether there is a limit on the number of droplets that can be included. But the sample is definitely not supposed to have this high a percentage of MT genes, as shown by the CellRanger analysis done previously.

sjfleming commented 4 years ago

This sounds like an interesting case. So do you mean that the number (and percentage) of mitochondrial reads per cell is going up after running remove-background v1?

I would definitely suggest changing to --total-droplets-included 15000. This parameter is meant to be the number of droplets that might potentially have a cell, i.e. everything after this number on the UMI curve is "surely empty". So in your case, I would say that all droplets after 15,000 are "surely empty". Those surely-empty droplets will still be used by the algorithm, but the computation will proceed much faster, and it may improve things, since that is what the algorithm expects.
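As a rough sketch, the revised call could be assembled like this. It is a dry run that only prints the command. The two paths are hypothetical placeholders, and the expected-cells value is the one reported earlier in the thread.

```shell
# Dry-run sketch: builds the command string and prints it; nothing is executed.
input_dir="outs/raw_feature_bc_matrix"          # hypothetical placeholder path
output_file="cellbender_feature_bc_matrix.h5"   # hypothetical placeholder path
expected_cells=5000       # from the original post
total_droplets=15000      # droplets past this rank are treated as "surely empty"

# Simple sanity check: the included droplets should extend past the expected cells.
if [ "$total_droplets" -gt "$expected_cells" ]; then
    range_ok=yes
else
    range_ok=no
fi

cmd="cellbender remove-background \
--input $input_dir \
--output $output_file \
--expected-cells $expected_cells \
--total-droplets-included $total_droplets"

echo "range check passed: $range_ok"
echo "$cmd"
```

Once the printed command looks right, it can be pasted into a terminal to run it.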

If you are still seeing an increase in mitochondrial genes after this change, then this is an interesting kind of case I have not seen before, and it is for precisely this kind of thing that we are developing remove-background v2. It may be the case that the imputation that v1 does is hurting here. It is possible this results in more mitochondrial genes... not what we want to happen. There are two options then: (1) you can run v1 with --z-dim 100 --z-layers 500, which will help to reduce the smoothing of the output data, and may alleviate the problem; or (2) you could try running v2 using the code on the sf_removebkg_v2.1 branch. v2 does not impute counts, and so the output count matrix is strictly less than or equal to the input for each entry in the count matrix. Let me know if you need help running this, or if you're a Terra user I can point you to a Terra workflow for v2. v2 is still under development, but we should be releasing it pretty soon!
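A minimal sketch of option (1), again as a dry run that only prints the command. The paths are hypothetical placeholders. The commented-out steps for option (2) are an assumption based on the repository name and branch mentioned here, not a tested recipe.

```shell
# Dry-run sketch for option (1): v1 with a larger latent dimension and hidden layer.
input_dir="outs/raw_feature_bc_matrix"          # hypothetical placeholder path
output_file="cellbender_feature_bc_matrix.h5"   # hypothetical placeholder path
expected_cells=5000       # from the original post
total_droplets=15000      # value suggested earlier in this thread
z_dim=100                 # suggested above to reduce smoothing
z_layers=500              # suggested above to reduce smoothing

cmd_v1="cellbender remove-background \
--input $input_dir \
--output $output_file \
--expected-cells $expected_cells \
--total-droplets-included $total_droplets \
--z-dim $z_dim \
--z-layers $z_layers"

echo "$cmd_v1"

# Option (2), not run here (assumed steps for installing from the development branch):
#   git clone https://github.com/broadinstitute/CellBender.git
#   git -C CellBender checkout sf_removebkg_v2.1
#   pip install -e CellBender
```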

megan1111 commented 4 years ago

Thanks for your reply! Really excited for the release of remove-background v2 as well.

A couple of questions regarding the new version:

  1. Will you be adding additional features to the v2 code on the sf_removebkg_v2.1 branch prior to the release of v2?
  2. What is the expected release date of v2 (if any)?

Thanks!

sjfleming commented 4 years ago

Here's the current plan:

The sf_removebkg_v2.1 branch is going to be released just as it is (with only cosmetic changes, no substantive changes at all) as soon as PR #71 gets merged. It will be released as version 0.2.0.

That should happen within a week or two.

I am working on some further updates to the implementation details that will speed things up a bit and improve a little on what's currently in sf_removebkg_v2.1. Those changes will be released as v0.2.1, and there will be a paper submission that accompanies that version. That timeline will be a bit longer, but I will make the changes available in a new branch once version 0.2.0 is out.