Closed DanieleMuraro closed 4 years ago
Dear @DanieleMuraro,
Thank you for using CellBender!
It looks like an input data inconsistency. Can you check that the shape of the input count matrix is consistent with the barcodes and genes TSV files? Also, can you confirm that the count matrix contains no bad values (i.e., that all entries are non-negative integers)?
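Both checks can be scripted. The following is only a minimal sketch: the helper name `check_mtx_folder` is made up for illustration, and it assumes uncompressed `matrix.mtx`, `barcodes.tsv`, and `genes.tsv` files in MatrixMarket coordinate format, with features as rows and barcodes as columns.

```python
from pathlib import Path


def check_mtx_folder(folder, features_name="genes.tsv"):
    """Compare the matrix.mtx size line with the TSV line counts.

    Hypothetical helper: returns a list of problems (empty if none found).
    """
    folder = Path(folder)
    problems = []

    def count_lines(path):
        with open(path) as handle:
            return sum(1 for _ in handle)

    n_features = count_lines(folder / features_name)
    n_barcodes = count_lines(folder / "barcodes.tsv")

    with open(folder / "matrix.mtx") as handle:
        # Skip the banner and comment lines, which start with '%'.
        line = handle.readline()
        while line.startswith("%"):
            line = handle.readline()
        # The first non-comment line holds: rows, columns, stored entries.
        n_rows, n_cols, n_entries = (int(x) for x in line.split())

        if n_rows != n_features:
            problems.append(
                f"matrix has {n_rows} rows, {features_name} has {n_features} lines"
            )
        if n_cols != n_barcodes:
            problems.append(
                f"matrix has {n_cols} columns, barcodes.tsv has {n_barcodes} lines"
            )

        seen = 0
        for line in handle:
            if not line.strip():
                continue
            row, col, value = line.split()
            seen += 1
            try:
                count = int(value)
            except ValueError:
                count = -1  # not an integer at all (e.g. "NA" or "1.5")
            if count < 0:
                problems.append(f"bad count {value!r} at row {row}, column {col}")

        if seen != n_entries:
            problems.append(f"header declares {n_entries} entries, found {seen}")

    return problems
```

Calling `check_mtx_folder("filtered_feature_bc_matrix")` and getting back an empty list would mean the sizes agree and every stored count is a non-negative integer.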
Dear @mbabadi,
Thank you very much for your kind reply. The count matrix does not include negative or NA values. As regards the dimensions, they seem to be consistent:
(base) MacBook-Pro-4:filtered_feature_bc_matrix daniele$ wc -l barcodes.tsv
1784 barcodes.tsv
(base) MacBook-Pro-4:filtered_feature_bc_matrix daniele$ wc -l genes.tsv
33544 genes.tsv
(base) MacBook-Pro-4:filtered_feature_bc_matrix daniele$ head -n 5 matrix.mtx
%%MatrixMarket matrix coordinate integer general
%metadata_json: {"format_version": 2, "software_version": "3.1.0"}
33544 1784 1987712
33543 1 10
33509 1 13
Does the package work with CRISPR guides? Thanks again for your help,
Daniele
Hi @DanieleMuraro, I think I see what the issue is. Two things:
The tool makes use of the information in the empty droplets, so you will need to use the raw_feature_bc_matrix folder as the input rather than the filtered_feature_bc_matrix. But, even better, try using the raw_feature_bc_matrix.h5 file.
The documentation that you referenced did not keep up with the changes to the output filenames for CellRanger v3. While that link says it expects genes.tsv, that is only for CellRanger v2 outputs. If you have CellRanger v3 outputs, then the tool will accept the features.tsv.gz file without needing to rename it.
But, if you have access to the raw_feature_bc_matrix.h5 file, it might be easier to use that as the input. Let me know if that works!
(What I think is causing the error: since the tool finds the file called genes.tsv, it assumes it is dealing with CellRanger v2 outputs. But since the input is really CellRanger v3, it ends up looking for data in the wrong place.)
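To make that failure mode concrete, here is a rough illustration of filename-based detection. This is not CellBender's actual code; `guess_cellranger_layout` is a hypothetical helper that only mimics the behaviour described above.

```python
from pathlib import Path


def guess_cellranger_layout(folder):
    """Guess the CellRanger version from the feature file name.

    Hypothetical helper, written only to illustrate the explanation above.
    """
    names = {p.name for p in Path(folder).iterdir()}
    if "features.tsv.gz" in names or "features.tsv" in names:
        return "v3"
    if "genes.tsv" in names:
        # A v3 features.tsv that was renamed to genes.tsv also lands here,
        # so v3 data would then be read with v2 assumptions.
        return "v2"
    return "unknown"
```

With this kind of logic, renaming features.tsv to genes.tsv is enough to make a v3 folder look like a v2 folder.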
As for the CRISPR guides, that's a great question! In the current version 0.1 of CellBender remove-background, the tool only looks at the features that are denoted as "Gene Expression". However, the mathematical model is equally good for other types of data, including "Antibody Capture". Until you mentioned it, I did not realize that 10x had made "CRISPR Guide Capture" an option, but that is a really cool idea.
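If you want to see which feature types your own file contains, a quick tally is enough. This is a small sketch, assuming the CellRanger v3 layout in which the third tab-separated column of features.tsv (or features.tsv.gz) holds the feature type; `count_feature_types` is a name made up for this example.

```python
import csv
import gzip
from collections import Counter


def count_feature_types(path):
    """Tally the third column of a features.tsv or features.tsv.gz file.

    Assumes the third tab-separated column is the feature type.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return Counter(
            row[2] if len(row) > 2 else "(no type column)"
            for row in reader
            if row
        )
```

For a dataset like the one in this thread, the result would be expected to show separate counts for "Gene Expression" and "CRISPR Guide Capture".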
So I'll venture a few guesses: if the CRISPR guides are subject to the same types of noise (ambient and swapping) that we mention in our paper (https://www.biorxiv.org/content/10.1101/791699v1), then I would expect the model / tool to perform well on the CRISPR guide counts. Can the CRISPR guides become cell-free ambient? Is that a large source of background counts? I haven't had the chance to explore a dataset with CRISPR guides yet.
In the branch called sf_removebkg_v2.1, which is a semi-stable development version that we are working on, all of the "features" are kept, not just Gene Expression. So if you try out that branch of the code, it will run on your CRISPR guide data as well. We expect to develop this branch into the next official release, accompanied by a publication.
If you do try to run remove-background on your CRISPR guide data, I would love to see how it looks!
Hi @sjfleming,
Thank you very much for taking the time to get back to me. I managed to run cellbender using raw_feature_bc_matrix.h5; thank you so much! :-) I am sharing the output plots obtained by running cellbender on the data mentioned in my previous posts, once with the master version (which looks only at the features denoted as "Gene Expression") and once with the sf_removebkg_v2.1 branch (where all of the "features" are kept, not just Gene Expression). With the master version, the cell probability along the UMI curve resembles a step function; with sf_removebkg_v2.1, it shows a few peaks in the region where most barcodes are associated with background. I am not sure why this happens. Thanks again for your help!
erica_ipcs_cellbender_sf_removebkg_v2.1.pdf erica_ipcs_cellbender_master.pdf
Hi @DanieleMuraro,
Glad you got the code to run! And good to hear that all the features are included when using the v2.1 branch of the code.
You are right... there are a few droplets way out there (which are obviously empty) where for some reason v2.1 seems to think they have some probability of having a cell. Maybe 5 of those droplets look like they would pass the > 0.5 cell probability threshold that is used to generate the "_filtered.h5" output file.
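One way to list such droplets is to look for barcodes that pass the probability threshold despite having very few UMIs. The sketch below is only an illustration: `umi_counts` and `cell_probability` are hypothetical dictionaries keyed by barcode (how to read these values out of the output file is not shown here), and `min_umis` is a cutoff you would pick by eye from the UMI curve.

```python
def suspicious_calls(umi_counts, cell_probability, min_umis, threshold=0.5):
    """Return barcodes called as cells despite having fewer than min_umis UMIs.

    Both inputs are hypothetical dicts keyed by barcode; a barcode is
    "called" when its cell probability is strictly above the threshold.
    """
    return sorted(
        barcode
        for barcode, probability in cell_probability.items()
        if probability > threshold and umi_counts[barcode] < min_umis
    )
```

Any barcodes returned here are candidates for the "obviously empty" droplets that would still end up in the filtered output.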
To whom it may concern,
I ran cellbender on a scRNA-Seq+CRISPR dataset derived from iPSCs, using the cellranger output in the folder filtered_feature_bc_matrix as the input. The cellranger output includes both Gene Expression and CRISPR Guide Capture, so the features.tsv file looks as follows:
I renamed features.tsv to genes.tsv, to match the format reported in the documentation:
cellbender doc
I then ran the command:
This led to the output:
Could you please help me understand what the problem is?
Thank you for your attention.
With best wishes,
Daniele Muraro