Open AndiL04 opened 1 year ago
Hi @AndiL04 , thanks for your question.
We do think that cellbender can be used directly on multiome data containing both gene expression and ATAC data, exactly the way you've done it.
A few caveats and things to think about:
--exclude-feature-types Peaks input argument, still giving cellbender the entire raw data file.) See some discussion here (#167, #121), although some of the problems mentioned have already been solved in v0.3.0.
--projected-ambient-count-threshold input argument. The default value is 0.1, and larger values will lead to the inclusion of fewer features in the analysis. The meaning of the number is "estimate the noise counts summed over all cells, for each feature... then exclude features (leave them alone) with estimated noise counts less than this value". The idea is that really low-count features don't contribute much noise, and so they can be left alone, i.e. output = input.
Thank you so much for the insightful suggestions! Regarding points 3 and 4, I was able to run with the default settings without errors on a few samples. Now I am testing adding --exclude-feature-types Peaks and comparing the results.
Thanks again for the detailed explanations!
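For readers following along, the --projected-ambient-count-threshold rule quoted earlier in the thread can be illustrated with a small sketch. This is a toy under stated assumptions, not CellBender's implementation: the function name, the feature names, and the noise values are all made up, and the comparison uses `<=` only because that is how the log message in the original post words it.

```python
# Toy illustration only: names and numbers are hypothetical, and the way
# CellBender actually estimates per-feature noise is more involved.
def split_features_by_threshold(projected_noise_counts, threshold=0.1):
    """Return (included, excluded) lists of feature names.

    projected_noise_counts maps a feature name to its estimated noise
    counts summed over all cells. Features at or below the threshold are
    excluded from the analysis, i.e. left alone (output = input).
    """
    included, excluded = [], []
    for name, noise in projected_noise_counts.items():
        if noise <= threshold:
            excluded.append(name)
        else:
            included.append(name)
    return included, excluded


# Made-up estimates for two genes and two peaks.
toy_noise = {"GENE_A": 25.0, "GENE_B": 0.05, "peak_1": 0.4, "peak_2": 0.0}

included_default, excluded_default = split_features_by_threshold(toy_noise)
included_strict, excluded_strict = split_features_by_threshold(toy_noise, threshold=0.5)
print(included_default, excluded_default)
print(included_strict, excluded_strict)
```

With the larger threshold, fewer features are included in the analysis, which matches the behaviour described for the real argument.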
You're welcome! Yeah, if you use --exclude-feature-types Peaks then cellbender will not touch the ATAC data (output = input).
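When running several samples, it can be convenient to assemble the command in a script. The sketch below only builds and prints the command line; it uses the flags that appear in this thread (--cuda, --input, --output, --exclude-feature-types Peaks), while the helper name and its parameters are assumptions made for illustration.

```python
import shlex


def build_remove_background_command(input_h5, output_h5,
                                    exclude_peaks=True, use_cuda=True):
    """Build the cellbender command as an argument list (hypothetical helper)."""
    cmd = ["cellbender", "remove-background"]
    if use_cuda:
        cmd.append("--cuda")
    cmd += ["--input", input_h5, "--output", output_h5]
    if exclude_peaks:
        # Leave the ATAC features untouched (output = input for Peaks).
        cmd += ["--exclude-feature-types", "Peaks"]
    return cmd


cmd = build_remove_background_command("raw_feature_bc_matrix.h5", "output.h5")
# Print a shell-ready string so the command can be inspected before running.
print(shlex.join(cmd))
```

To actually launch the run, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where cellbender is installed.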
Hey @AndiL04, did you get a chance to compare the results? I am quite curious about this. Thanks.
Hi @YiweiNiu, for our data the results were similar. We chose to add the --exclude-feature-types Peaks argument, limiting CellBender to the RNA-seq component only. After QC, we observed an approximately 15% increase in the number of nuclei compared with the results obtained using the original CellRanger output.
Hope this info helps.
Hi,
I really like your tool and the tool is really easy to use!
I am currently working on 10X Multiome data, and I am wondering whether I need to split the h5 files generated by CellRanger-arc into two matrices as the input for CellBender. I saw it mentioned in the publication that the 10X Multiome assay can be used, but I didn't find specific instructions.
I tried it on one sample and the output.log shows:
cellbender:remove-background: Command: cellbender remove-background --cuda --input raw_feature_bc_matrix.h5 --output output.h5
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 1840fd242a)
cellbender:remove-background: 2023-09-15 22:31:33
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 36601 Gene Expression, 75643 Peaks
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 107369 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 8583
cellbender:remove-background: Prior on counts for empty droplets is 267
cellbender:remove-background: Excluding 12101 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 95268 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 133
cellbender:remove-background: Using 2237 probable cell barcodes, plus an additional 8971 barcodes, and 40753 empty droplets.
With the output (output_filtered.h5), I used scCustomize to read in the data and split the matrix into a GEX part and an ATAC part. I am not sure if that is the correct way to do it.
Really appreciate your help!
Best, Andi
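The split by feature type described in the post above can be sketched in plain Python. This is a minimal illustration, not the scCustomize workflow: the helper name and the toy matrix are hypothetical, and the only detail taken from the thread is that the file labels its features as "Gene Expression" and "Peaks".

```python
def split_columns_by_feature_type(feature_types):
    """Map each feature type to the column indices carrying that label."""
    groups = {}
    for index, feature_type in enumerate(feature_types):
        groups.setdefault(feature_type, []).append(index)
    return groups


# Toy data: 2 cells (rows) x 5 features (columns), values are made up.
feature_types = ["Gene Expression", "Gene Expression", "Peaks",
                 "Gene Expression", "Peaks"]
counts = [
    [3, 0, 1, 2, 0],
    [0, 5, 0, 1, 4],
]

groups = split_columns_by_feature_type(feature_types)
gex_idx = groups["Gene Expression"]
atac_idx = groups["Peaks"]

# Subset the columns of every row to get one matrix per modality.
gex_counts = [[row[i] for i in gex_idx] for row in counts]
atac_counts = [[row[i] for i in atac_idx] for row in counts]
print(gex_counts)
print(atac_counts)
```

With a real output file, the same idea would apply to whatever feature-type annotation the loading tool exposes alongside the count matrix.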