broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Compatibility with 10X Multiome data? #284

Open AndiL04 opened 1 year ago

AndiL04 commented 1 year ago

Hi,

I really like your tool, and it's really easy to use!

I am currently working on 10X Multiome data, and I am wondering whether I need to split the h5 files generated by CellRanger-arc into two matrices to use as input for CellBender. I saw that the publication mentions the 10X Multiome assay can be used, but I didn't find specific instructions.

I tried it on one sample and the output.log shows:

```
cellbender:remove-background: Command: cellbender remove-background --cuda --input raw_feature_bc_matrix.h5 --output output.h5
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 1840fd242a)
cellbender:remove-background: 2023-09-15 22:31:33
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 36601 Gene Expression, 75643 Peaks
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 107369 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 8583
cellbender:remove-background: Prior on counts for empty droplets is 267
cellbender:remove-background: Excluding 12101 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 95268 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 133
cellbender:remove-background: Using 2237 probable cell barcodes, plus an additional 8971 barcodes, and 40753 empty droplets.
```
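The log packs several counts into a few lines. As a reading aid, the sketch below (standard-library Python) pulls them out of the lines quoted above and checks the arithmetic: 36601 + 75643 = 112244 features in total, 107369 - 12101 = 95268 features kept for inference, and 2237 + 8971 + 40753 = 51961 barcodes listed on the "Using ..." line. The regex patterns are assumptions based on the wording of this one CellBender 0.3.0 log, not a documented output format, and `summarize` is a hypothetical helper.

```python
import re

# A few lines copied verbatim from the remove-background log in this thread
LOG = """\
cellbender:remove-background: Features in dataset: 36601 Gene Expression, 75643 Peaks
cellbender:remove-background: 107369 features have nonzero counts.
cellbender:remove-background: Excluding 12101 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 95268 features in the analysis.
cellbender:remove-background: Using 2237 probable cell barcodes, plus an additional 8971 barcodes, and 40753 empty droplets.
"""


def summarize(log: str) -> dict:
    """Hypothetical helper: extract feature/barcode counts from the log text.

    The patterns match the wording of the log above; other CellBender
    versions may phrase these lines differently.
    """
    gex, peaks = map(int, re.search(
        r"Features in dataset: (\d+) Gene Expression, (\d+) Peaks", log).groups())
    nonzero = int(re.search(r"(\d+) features have nonzero counts", log).group(1))
    excluded = int(re.search(r"Excluding (\d+) features", log).group(1))
    included = int(re.search(r"Including (\d+) features", log).group(1))
    cells, extra, empty = map(int, re.search(
        r"Using (\d+) probable cell barcodes, plus an additional (\d+) barcodes, "
        r"and (\d+) empty droplets", log).groups())
    return {
        "total_features": gex + peaks,
        "nonzero_features": nonzero,
        # Features used for inference should equal nonzero minus excluded
        "trimmed_ok": nonzero - excluded == included,
        "barcodes_listed": cells + extra + empty,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```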

With the output (output_filtered.h5), I used scCustomize to read in the data and split the matrix into GEX and ATAC parts. I am not sure if that is the correct approach.
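scCustomize is an R package; for readers working in Python, the same split can be sketched as below. It assumes the filtered matrix has already been loaded into a dense NumPy array (barcodes x features) together with one feature-type label per column, such as the "Gene Expression" and "Peaks" labels in the log. Loading `output_filtered.h5` itself is outside this sketch, `split_by_feature_type` is a hypothetical helper (not part of CellBender or scCustomize), and the toy numbers are made up.

```python
import numpy as np


def split_by_feature_type(counts: np.ndarray,
                          feature_types: list[str]) -> dict[str, np.ndarray]:
    """Hypothetical helper: split a barcodes x features matrix by feature type.

    Assumes the columns of `counts` are in the same order as `feature_types`.
    """
    types = np.asarray(feature_types)
    if counts.shape[1] != types.size:
        raise ValueError("feature_types must have one entry per column of counts")
    # One boolean column mask per distinct type; column order is preserved
    return {t: counts[:, types == t] for t in dict.fromkeys(feature_types)}


if __name__ == "__main__":
    # Toy matrix: 2 barcodes, 3 genes followed by 2 peaks
    toy = np.array([[5, 0, 2, 1, 0],
                    [0, 3, 1, 0, 4]])
    labels = ["Gene Expression"] * 3 + ["Peaks"] * 2
    parts = split_by_feature_type(toy, labels)
    print({k: v.shape for k, v in parts.items()})
```

Splitting by the feature-type label rather than by column position avoids assuming that genes always come before peaks in the file.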

Really appreciate your help!

Best, Andi

sjfleming commented 1 year ago

Hi @AndiL04 , thanks for your question.

We do think that cellbender can be used directly on multiome data containing both gene expression and ATAC features. You've done it exactly the way you're supposed to.

A few caveats and things to think about:

AndiL04 commented 1 year ago

Thank you so much for the insightful suggestions! Regarding points 3 and 4, I was able to run with the default settings on a few samples without errors. Now I am testing adding --exclude-feature-types Peaks and comparing the results.
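The two runs being compared differ only by the `--exclude-feature-types Peaks` flag. A minimal Python sketch for building both command lines is below; it uses only the flags that appear in this thread (`--cuda`, `--input`, `--output`, `--exclude-feature-types`), while `remove_background_cmd` and the output file names are hypothetical placeholders.

```python
import shlex


def remove_background_cmd(raw_h5: str, out_h5: str, *, cuda: bool = True,
                          exclude_peaks: bool = False) -> list[str]:
    """Hypothetical helper: build the argv for one remove-background run."""
    cmd = ["cellbender", "remove-background"]
    if cuda:
        cmd.append("--cuda")
    cmd += ["--input", raw_h5, "--output", out_h5]
    if exclude_peaks:
        # Leave the "Peaks" (ATAC) features as-is; only the RNA part is denoised
        cmd += ["--exclude-feature-types", "Peaks"]
    return cmd


if __name__ == "__main__":
    # Output names are placeholders for the two runs being compared
    for out, skip_atac in [("output_default.h5", False),
                           ("output_rna_only.h5", True)]:
        argv = remove_background_cmd("raw_feature_bc_matrix.h5", out,
                                     exclude_peaks=skip_atac)
        print(shlex.join(argv))
        # To actually launch it: subprocess.run(argv, check=True)
```

Building the command as a list rather than a single string keeps file names with spaces intact if it is later passed to `subprocess.run`.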

Thanks again for the detailed explanations!

sjfleming commented 1 year ago

You're welcome! Yeah, if you use --exclude-feature-types Peaks, then cellbender will not touch the ATAC data (output = input).
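The "output = input" behavior for the ATAC features can be checked on your own files by comparing the Peaks columns before and after the run. The sketch below assumes both matrices are already loaded as dense NumPy arrays with matching barcode rows and feature columns (for the filtered output, that means restricting the raw matrix to the retained barcodes first). `peaks_unchanged` is a hypothetical helper, and the toy numbers are made up.

```python
import numpy as np


def peaks_unchanged(raw: np.ndarray, denoised: np.ndarray,
                    feature_types: list[str]) -> bool:
    """Hypothetical helper: True if every "Peaks" column is identical.

    Assumes `raw` and `denoised` share the same barcode (row) order and the
    same feature (column) order as `feature_types`.
    """
    is_peak = np.asarray(feature_types) == "Peaks"
    return bool(np.array_equal(raw[:, is_peak], denoised[:, is_peak]))


if __name__ == "__main__":
    labels = ["Gene Expression"] * 3 + ["Peaks"] * 2
    raw = np.array([[5, 0, 2, 1, 0],
                    [0, 3, 1, 0, 4]])
    # Made-up "denoised" matrix: RNA counts reduced, peak counts copied over
    denoised = raw.copy()
    denoised[:, :3] = np.maximum(raw[:, :3] - 1, 0)
    print(peaks_unchanged(raw, denoised, labels))
```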

YiweiNiu commented 11 months ago

Hey @AndiL04, did you get a chance to compare the results? I am quite curious about this. Thanks.

AndiL04 commented 11 months ago

Hi @YiweiNiu, for our data the results were similar. We chose to add the --exclude-feature-types Peaks argument, limiting CellBender to the RNA-seq component only. After QC, we observed roughly a 15% increase in the number of nuclei compared with the results from the original CellRanger output.

Hope this info helps.