Open racng opened 1 year ago
To me, sample number 3 looks like the kind of output I would expect.
Sample 8 is not doing a great job calling cells, I agree with you. Can you post the first part of the cellbender log file, before the training starts, where cellbender says how many cells and empties there are, and estimates UMI counts in each? That will give me a better idea about what cellbender thinks it's seeing.
Here is the beginning of cellbender log file:
cellbender:remove-background: Command: cellbender remove-background --input /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5 --output results/qc/cellbender/8/output.h5 --cuda --posterior-batch-size 256 --total-droplets-included 50000 --expected-cells 10000
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 525d18ef91)
cellbender:remove-background: 2023-08-22 23:58:26
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 192 Antibody Capture, 36608 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 26853 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 10819
cellbender:remove-background: Prior on counts for empty droplets is 1600
cellbender:remove-background: Excluding 9161 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 17692 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 800
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 40000 barcodes, and 31805 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1597 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /tmp/tmpn2iy66eh
cellbender:remove-background: Successfully unpacked tarball to /tmp/tmpn2iy66eh
Well things seem to look pretty much how I'd expect.
It's not clear to me why we're not getting something a bit more reasonable...
I am guessing a bit here, but could you try --expected-cells 20000 --total-droplets-included 60000?
I think those settings improved it a bit, but it is still estimating ~50k cells instead of 30k. Log:
cellbender:remove-background: Command: cellbender remove-background --input /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5 --output results/qc/cellbender/8/output.h5 --cuda --posterior-batch-size 256 --total-droplets-included 60000 --expected-cells 20000
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash ad6a62f361)
cellbender:remove-background: 2023-08-23 21:38:43
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 192 Antibody Capture, 36608 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 26853 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 7407
cellbender:remove-background: Prior on counts for empty droplets is 1489
cellbender:remove-background: Excluding 7747 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 19106 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 744
cellbender:remove-background: Using 20000 probable cell barcodes, plus an additional 40000 barcodes, and 22071 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1486 UMI counts.
output_report_8_new.html.zip
I have a log file from cellbender v0.2.2 that was able to estimate ~30k cells:
cellbender:remove-background: Command:
cellbender remove-background --input /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5 --output results/qc/cellbender/8/exp10000_total40000_thresh200_z50.h5 --cuda --epochs 150 --fpr 0.01 --learning-rate 5e-05 --expected-cells 10000 --total-droplets-included 40000 --low-count-threshold 200 --z-dim 50
cellbender:remove-background: 2023-07-12 00:19:02
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from file /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Including 26853 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 1479
cellbender:remove-background: Prior on counts for cells is 10067
cellbender:remove-background: Excluding barcodes with counts below 739
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 30000 barcodes, and 42094 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1737 UMI counts.
cellbender:remove-background: Running inference...
It shows similar priors for cells and empty droplets and a similar count threshold. For v0.3.0, could excluding the 6-7k features estimated to have <= 0.1 background counts in cells be reducing the model complexity too much? How do I adjust that with --projected-ambient-count-threshold?
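For anyone comparing runs the same way, here is a minimal Python sketch that pulls the priors and barcode counts out of log text so two runs can be viewed side by side. The regular expressions are copied from the log lines quoted in this thread; the function name `summarize_log` and the field names are made up for illustration, and other CellBender versions may word these messages differently.

```python
import re

# Patterns taken from the log lines quoted in this thread (v0.2.2 says
# "in empty droplets", v0.3.0 says "for empty droplets"). Field names are
# illustrative, not part of CellBender.
PATTERNS = {
    "prior_cell_counts": r"Prior on counts (?:for|in) cells is (\d+)",
    "prior_empty_counts": r"Prior on counts (?:for|in) empty droplets is (\d+)",
    "low_count_cutoff": r"Excluding barcodes with counts below (\d+)",
    "probable_cells": r"Using (\d+) probable cell barcodes",
    "additional_barcodes": r"plus an additional (\d+) barcodes",
    "empty_droplets": r"and (\d+) empty droplets",
    "largest_surely_empty": r"Largest surely-empty droplet has (\d+) UMI counts",
}


def summarize_log(text):
    """Return the integer fields that could be found in the log text."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            summary[name] = int(match.group(1))
    return summary


# Excerpts copied from the two logs posted above.
LOG_V030 = """\
cellbender:remove-background: Prior on counts for cells is 10819
cellbender:remove-background: Prior on counts for empty droplets is 1600
cellbender:remove-background: Excluding barcodes with counts below 800
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 40000 barcodes, and 31805 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1597 UMI counts.
"""

LOG_V022 = """\
cellbender:remove-background: Prior on counts in empty droplets is 1479
cellbender:remove-background: Prior on counts for cells is 10067
cellbender:remove-background: Excluding barcodes with counts below 739
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 30000 barcodes, and 42094 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1737 UMI counts.
"""

v030 = summarize_log(LOG_V030)
v022 = summarize_log(LOG_V022)

# Print the two runs side by side, one field per row.
for field in PATTERNS:
    print(f"{field:22s} v0.3.0={v030.get(field)}  v0.2.2={v022.get(field)}")
```

In practice the two strings would be read from the actual log files rather than pasted in.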
Update: using --projected-ambient-count-threshold 0 didn't help.
Hi @racng, this is an interesting example. It does seem like cell probability inference is not working as well on this sample in v0.3.0 as it did in v0.2.2.
(It is definitely the case that v0.3.0 does better than v0.2.2 on a lot of samples. But this seems to be an exception.)
You are right that --projected-ambient-count-threshold 0 is the way to include all the features expressed at a nonzero level. But that didn't seem to help...
Is there any chance I could get a copy of that h5 file to try to experiment a bit and see what is going on?
In the meantime, here are two other settings I'd try, in the hope that we can force the outcome we want:
--expected-cells 25000 --total-droplets-included 40000
--expected-cells 200 --total-droplets-included 40000
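A small Python sketch for queuing up both settings might look like the following. It only assembles the command lines using flags that already appear in this thread; the helper `build_command`, the placeholder input path, and the output file names are assumptions for illustration, so adjust them to your own layout before running anything.

```python
import shlex

# Placeholder path (assumption): point this at the real raw_feature_bc_matrix.h5.
INPUT_H5 = "raw_feature_bc_matrix.h5"

# (expected_cells, total_droplets_included) pairs suggested above.
SETTINGS = [(25000, 40000), (200, 40000)]


def build_command(input_h5, output_h5, expected_cells, total_droplets, cuda=True):
    """Assemble a remove-background command as an argument list."""
    cmd = [
        "cellbender", "remove-background",
        "--input", input_h5,
        "--output", output_h5,
    ]
    if cuda:
        cmd.append("--cuda")
    cmd += [
        "--expected-cells", str(expected_cells),
        "--total-droplets-included", str(total_droplets),
    ]
    return cmd


# Output names are hypothetical; they just encode the settings so runs don't
# overwrite each other.
commands = [
    build_command(INPUT_H5, f"output_exp{cells}_total{droplets}.h5", cells, droplets)
    for cells, droplets in SETTINGS
]

for cmd in commands:
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```

Writing each run to its own output file keeps the html reports from the two settings available for comparison afterwards.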
@sjfleming I have just sent you an email at your Broad Institute address.
I have a particular sample that struggles to work well with cellbender. I am currently using the latest cellbender, v0.3.0. The sample's UMI curve has a weak knee structure. By eye, I am guessing there are around 30k cells, but cellbender is overestimating that. I have tried running it with the default settings and also with an increased total-droplets-included=50000 and a low expected-cells=10000, but couldn't get the program to call cells at the expected levels. I have attached the html reports below for a sample that worked well (No. 3, default settings) vs. the sample having trouble (No. 8). Suggestions would be greatly appreciated! Thank you!
output_report_8.html.zip output_report_3.html.zip