broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

No cells found when excluding the ADT #223

Open aodainic7 opened 1 year ago

aodainic7 commented 1 year ago

Hey everyone, I am testing your tool to check for contamination on my scRNAseq+CITEseq experiment. I have one issue and some questions:

  1. When running the pipeline on all the features, including the antibody features, everything works. Once I add --exclude-antibody-capture, I get the error: 'No cells found! Cannot compute expected FPR.' This is the console output:
    cellbender:remove-background: Command:
    cellbender remove-background --input CellRanger/C120_batch3_5/outs/multi/count/raw_feature_bc_matrix.h5 --output /CellBender/mytest/batch3_5_cellbender_out_v2_wo_adt.h5 --cuda --expected-cells 25000 --total-droplets-included 130000 --fpr 0.01 --exclude-antibody-capture --epochs 200
    cellbender:remove-background: 2023-06-01 15:56:47
    cellbender:remove-background: Running remove-background
    cellbender:remove-background: Loading data from file /CellRanger/C120_batch3_5/outs/multi/count/raw_feature_bc_matrix.h5
    cellbender:remove-background: CellRanger v3 format
    cellbender:remove-background: Trimming dataset for inference.
    cellbender:remove-background: Excluding 143 features that correspond to antibody capture.
    cellbender:remove-background: Including 28676 genes that have nonzero counts.
    cellbender:remove-background: Prior on counts in empty droplets is 89
    cellbender:remove-background: Prior on counts for cells is 3693
    cellbender:remove-background: Excluding barcodes with counts below 44
    cellbender:remove-background: Using 25000 probable cell barcodes, plus an additional 105000 barcodes, and 22494 empty droplets.
    cellbender:remove-background: Largest surely-empty droplet has 50 UMI counts.
    cellbender:remove-background: Running inference...
    cellbender:remove-background: Inference procedure terminated early due to a NaN value in: mu, lam
    The suggested fix is to reduce the learning rate.
    cellbender:remove-background: 2023-06-01 15:57:20
    cellbender:remove-background: Preparing to write outputs to file...
    Traceback (most recent call last):
    File "/.conda/envs/CellBender/bin/cellbender", line 33, in <module>
    sys.exit(load_entry_point('cellbender', 'console_scripts', 'cellbender')())
    File "/CellBender/cellbender/base_cli.py", line 101, in main
    cli_dict[args.tool].run(args)
    File "/CellBender/cellbender/remove_background/cli.py", line 109, in run
    main(args)
    File "/CellBender/cellbender/remove_background/cli.py", line 204, in main
    run_remove_background(args)
    File "/CellBender/cellbender/remove_background/cli.py", line 174, in run_remove_background
    save_plots=True)
    File "/CellBender/cellbender/remove_background/data/dataset.py", line 534, in save_to_output_file
    inferred_count_matrix = self.posterior.mean
    File "/CellBender/cellbender/remove_background/infer.py", line 58, in mean
    self._get_mean()
    File "/.conda/envs/CellBender/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
    File "/CellBender/cellbender/remove_background/infer.py", line 357, in _get_mean
    raise ValueError('No cells found!  Cannot compute expected FPR.')

Here is the output when I do not omit the ADT, and the pipeline works: batch3_5_cellbender_out_v2.pdf

  2. Could you please take a look at the outputs of batch3 and tell me whether the learning curves look optimal, given the drops in the middle? Should I increase the number of expected cells?

batch3_1_cellbender_out_v2.pdf batch3_2_cellbender_out_v2.pdf batch3_3_cellbender_out_v2.pdf batch3_4_cellbender_out_v2.pdf

  3. I have compared the results from cellranger and cellbender for batch3_1 to assess how much the reads get corrected, but for the gene that differs most between cellranger and cellbender, the difference seems to come from one cell only. Is this because contamination is low, or because the learning was not optimal? See the attached PDFs for IGKV1-33. The RAW counts are from cellranger; the RNA counts are from the cellbender output. bender_vs_ranger_counts_v2.pdf features_bender_vs_ranger_v2.pdf

Cheers!

sjfleming commented 1 year ago

Hi @aodainic7 , I think I have a suggestion that can help you!

So right now, cellbender is identifying what I believe to be cells AND empty droplets as "cells". I think those regions on the UMI curve with ~300 counts (like batch3_2 from droplet 30k to droplet 100k) are the empty droplets. So you have several hundred counts of ambient RNA in empty droplets, and cellbender can probably help out a lot!

But currently cellbender is not identifying the empty droplets correctly. This can probably be fixed by changing two things:

  1. make --total-droplets-included smaller. It should be pretty much the first droplet where you're 100% sure everything past that is empty.
  2. use the --low-count-threshold parameter. This will help cellbender more easily identify the empty droplets. In your case, I would set the parameter to 100, telling cellbender that any droplet with < 100 UMI counts is "past the empty droplet plateau" and should be ignored completely. Those droplets probably represent cell barcode sequencing errors, and they are not the "real" empty droplets.

So try this:

cellbender remove-background \
    --input CellRanger/C120_batch3_5/outs/multi/count/raw_feature_bc_matrix.h5 \
    --output /CellBender/mytest/batch3_5_cellbender_out_v2.h5 \
    --cuda \
    --expected-cells 20000 \
    --total-droplets-included 35000 \
    --fpr 0.01 \
    --low-count-threshold 100
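
To help pick values for the two parameters above, here is a minimal sketch that ranks barcodes by total UMI count and counts how many clear a candidate threshold. The function name and the toy numbers are made up for illustration and are not part of CellBender. It assumes you already have one total UMI count per barcode in an array.

```python
import numpy as np

def summarize_umi_curve(umi_per_barcode, low_count_threshold=100):
    """Return the rank-ordered UMI curve and the number of barcodes at or above the threshold."""
    counts = np.sort(np.asarray(umi_per_barcode))[::-1]  # descending: rank 0 is the largest barcode
    n_kept = int(np.sum(counts >= low_count_threshold))
    return counts, n_kept

# Toy data (invented): 3 cells, 5 empty droplets near 300 UMIs, 4 barcodes below 100.
toy = [5000, 4200, 3900, 310, 305, 300, 298, 290, 40, 12, 5, 3]
curve, n_kept = summarize_umi_curve(toy, low_count_threshold=100)
print(n_kept)  # 8
```

Barcodes ranked beyond n_kept fall below the threshold, so a --total-droplets-included value larger than n_kept would reach past the empty droplet plateau. Looking at where the sorted curve flattens out is still the main guide.
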
aodainic7 commented 1 year ago

Hey Stephen, thanks for the input. I have increased the threshold and I got some decent correction. The results look very promising. I subsetted the T cells and compared the expression of the most changed genes, and to my surprise I found the contamination genes:

image

The same goes for ADT: the B cell markers are reduced on T cells, but not the T cell markers (which is amazing):

image

I also see a reduction in HTO, and my question is should I exclude these from the correction? What is your experience?

image

Here is the mean change in counts per cell, computed as (1 - cellbender-filtered counts / cellranger counts) * 100, per assay:

image
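
For reference, the per-assay percentage described above could be computed roughly as follows. This is a sketch under assumptions: both matrices are dense arrays with cells as rows and the same feature order, and the function name is hypothetical. The feature type labels follow the 10x names mentioned in this thread.

```python
import numpy as np

def mean_percent_removed(raw, cleaned, feature_types):
    """Mean over cells of (1 - cleaned / raw) * 100, computed separately per feature type."""
    raw = np.asarray(raw, dtype=float)
    cleaned = np.asarray(cleaned, dtype=float)
    feature_types = np.asarray(feature_types)
    result = {}
    for ftype in np.unique(feature_types):
        cols = feature_types == ftype
        raw_tot = raw[:, cols].sum(axis=1)      # per-cell raw total for this assay
        clean_tot = cleaned[:, cols].sum(axis=1)
        keep = raw_tot > 0                      # skip cells with no raw counts in this assay
        if not keep.any():
            result[str(ftype)] = float("nan")
            continue
        pct = (1.0 - clean_tot[keep] / raw_tot[keep]) * 100.0
        result[str(ftype)] = float(pct.mean())
    return result

# Toy example (invented numbers): 2 cells, 2 genes, 1 antibody feature.
types = ["Gene Expression", "Gene Expression", "Antibody Capture"]
raw = [[10, 10, 4], [20, 0, 8]]
cleaned = [[9, 9, 2], [18, 0, 8]]
summary = mean_percent_removed(raw, cleaned, types)
print(summary)
```

In the toy example the gene expression value comes out near 10% and the antibody value near 25%. Real count matrices are usually sparse, so the totals would need to be computed with sparse operations instead.
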

Thanks in advance. Cheers, Alex

sjfleming commented 1 year ago

Hi @aodainic7 , are those HTOs that you mention "hashtag oligos" like this kind of thing?

If this is what you're talking about, I'd be interested to hear more about your thoughts on this. I have not used these myself, and unfortunately I don't have any experience. The idea is to be able to pool cells across donors by having an (antibody-labeled) oligo whose barcode encodes donor identity, right? And then you load cells from multiple donors into the same "sample", right?

If the HTOs are subject to the same sort of noise mechanisms as the antibody features (and I would expect this to be the case), then maybe running CellBender on those HTO features does make sense.

What I'd do if it were me would be to compare the raw HTO counts and the CellBender HTO counts. And specifically I'd be really interested to see if the conclusions you draw about demultiplexing cells back to their specific donors end up being the same or different when CellBender is used. For example, is it easier for the demultiplexing algorithm to do its job after CellBender cleanup? Does CellBender go too far? Not make a big difference?

I would think it might be kind of like the human and mouse cell benchmark we use: you might see that donor assignment for singlet cells becomes more obvious, but you'd hope to see that true doublets remain doublets in terms of HTO counts after cellbender.
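
A rough way to run that comparison is sketched below. The caller here is a deliberately naive stand-in (top HTO wins, with a ratio rule for doublets), not a real demultiplexing algorithm, and every name and number in it is hypothetical. The point is only to show how calls from raw and CellBender HTO counts could be lined up cell by cell.

```python
import numpy as np

def naive_hto_calls(hto_counts, doublet_ratio=0.5):
    """Toy caller: 'negative' if all zero, 'doublet' if runner-up >= doublet_ratio * top, else top HTO."""
    hto_counts = np.asarray(hto_counts, dtype=float)
    calls = []
    for row in hto_counts:
        order = np.argsort(row)[::-1]           # HTO indices from highest to lowest count
        top = row[order[0]]
        second = row[order[1]] if row.size > 1 else 0.0
        if top == 0:
            calls.append("negative")
        elif second >= doublet_ratio * top:
            calls.append("doublet")
        else:
            calls.append(f"HTO{order[0]}")
    return calls

def call_agreement(raw_hto, cleaned_hto, doublet_ratio=0.5):
    """Call both matrices with the same rule and return the fraction of cells whose call is unchanged."""
    raw_calls = naive_hto_calls(raw_hto, doublet_ratio)
    clean_calls = naive_hto_calls(cleaned_hto, doublet_ratio)
    same = sum(a == b for a, b in zip(raw_calls, clean_calls))
    return raw_calls, clean_calls, same / len(raw_calls)

# Toy data (invented): rows are cells, columns are three HTOs.
raw_hto = [[100, 60, 5], [10, 200, 8], [150, 140, 3]]
cleaned_hto = [[95, 10, 0], [0, 195, 0], [140, 135, 0]]
raw_calls, clean_calls, frac_same = call_agreement(raw_hto, cleaned_hto)
print(raw_calls)    # ['doublet', 'HTO1', 'doublet']
print(clean_calls)  # ['HTO0', 'HTO1', 'doublet']
```

In this toy case the first cell changes from an apparent doublet to a singlet after cleanup, while the cell with two strong HTOs stays a doublet, which is the pattern one would hope for. With real data, the calls should come from whatever demultiplexing tool is normally used.
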

Okay actually, I had another thought that complicates this, although I'll leave what I've written above:

aodainic7 commented 1 year ago

Hello Stephen, exactly the same as in the publication: hashtag oligos for multiplexing. I wanted to investigate the questions you asked. I did not see a very strong effect in smaller cell groups, only in larger ones. The counts get "decontaminated" for one specific HTO, while the rest remain basically unchanged:

image

Interestingly, the changes stay more or less consistent across cell types in the same sample (which is amazing!). Here is an example of one donor:

image

The results look promising. Do you have any other critical points I should check?

I have a suggestion: some users may want to exclude the HTOs from the background removal, so maybe introduce an option for that when running cellbender. Currently there is only --exclude-antibody-capture, so maybe add --exclude-hashtag-oligos. Cheers!

sjfleming commented 1 year ago

Hi @aodainic7 , nothing else comes to mind, I don't think. I do think that excluding the HTOs might make more sense in your case. In v0.3.0 I will be changing --exclude-antibody-capture to --exclude-feature-types where the user can specify any valid feature type. (Currently it has to be one of the types allowed by 10x, which is ['Gene Expression', 'Antibody Capture', 'CRISPR Guide Capture', 'Custom', 'Peaks'].) When you create this dataset, do you run it through 10x CellRanger to get a count matrix? Does the feature_type show up as Custom?

sjfleming commented 1 year ago

That --exclude-feature-types input argument is now part of the v0.3.0 release.
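
As an illustration, the command could be assembled from Python as below. This is a sketch, not an official recipe: the helper name and file names are hypothetical, and it assumes the flag accepts one or more feature type names after it. Passing the arguments as a list avoids shell quoting problems with names that contain a space, such as 'Antibody Capture'.

```python
import shlex

def build_remove_background_cmd(input_h5, output_h5, exclude_feature_types=()):
    """Assemble the argument list for cellbender remove-background (nothing is executed here)."""
    cmd = ["cellbender", "remove-background", "--input", input_h5, "--output", output_h5]
    if exclude_feature_types:
        # Assumption: the flag takes one or more feature type names as separate arguments.
        cmd.append("--exclude-feature-types")
        cmd.extend(exclude_feature_types)
    return cmd

# Hypothetical file names, for illustration only.
cmd = build_remove_background_cmd(
    "raw_feature_bc_matrix.h5", "cellbender_out.h5", ["Antibody Capture"]
)
print(shlex.join(cmd))  # shell-quoted form, for copying into a terminal
```

The list can be handed to subprocess.run(cmd, check=True) once the paths are real. Excluding by feature type applies to every feature of that type, so if HTOs and ADTs share a feature type they would be excluded together.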