Running on 10X multiome?

cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data



Running on 10X multiome? #6

Closed: ktpolanski closed this issue 1 year ago

ktpolanski commented 1 year ago

Hello,

We've got a bunch of data that we'd like to run SComatic on. Some of the samples are 10X multiome, i.e. both GEX and ATAC for the same samples. The data were processed with cellranger-arc, the standard pipeline for 10X multiome, which yields separate BAMs for GEX and ATAC.

Do you have any suggestions for how to proceed? The two BAM files carry different tags: the GEX BAM has CB, NH and nM tags, while the ATAC BAM has CB and NM (I'm not sure whether NM is the same as nM in the GEX). Would it make sense to merge the two BAMs given this disparity?
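A quick way to confirm which tags each BAM actually carries is to count the optional fields over a sample of reads. The sketch below reads SAM text (for example the output of `samtools view sample.bam | head -n 100000`) and uses only the standard library. The function name `count_optional_tags` is hypothetical. Tag names are case-sensitive, so nM and NM are counted separately.

```python
from collections import Counter
from typing import Iterable


def count_optional_tags(sam_lines: Iterable[str]) -> Counter:
    """Count how many alignment records carry each optional tag (e.g. CB, NH, nM, NM)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields start at column 12.
        tags = {field.split(":", 1)[0] for field in fields[11:]}
        counts.update(tags)
    return counts
```

Feeding it `sys.stdin` and piping in `samtools view` output for the GEX and ATAC BAMs in turn shows the disparity described above directly.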

Francesc-Muyas commented 1 year ago

Dear user, I strongly suggest not merging the two BAM files, as one comes from an RNA-based approach and the other from a DNA-based one. The BAM files are processed differently, and each approach has its own biases.

Regarding how to process the bam files, I would follow the toy example as a template for the scRNA-seq data.

For ATAC, I would follow a similar approach but with minor changes:

Set the --min_MQ 30 parameter in the SplitBamCellTypes.py script
Do not use the RNA-editing sites in Step 4.2
Use the scATAC-seq PoN in Step 4.2
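One plausible reason the MQ threshold needs changing for ATAC is that aligners use different MAPQ scales. STAR, which cellranger-arc uses for GEX, assigns MAPQ 255 to uniquely mapped reads. The ATAC aligner (BWA-style) reports much lower values, typically at most 60. If the RNA-oriented default threshold is 255 (an assumption here), it would discard essentially every ATAC read. The sketch below uses made-up MAPQ values and a hypothetical helper, and assumes an inclusive (>=) comparison:

```python
from typing import Iterable


def fraction_passing_mapq(mapqs: Iterable[int], min_mq: int) -> float:
    """Fraction of reads with MAPQ >= min_mq (a stand-in for a --min_MQ style filter)."""
    values = list(mapqs)
    if not values:
        return 0.0
    return sum(q >= min_mq for q in values) / len(values)


# Hypothetical values: STAR gives 255 to unique mappers and 3/1/0 to multimappers;
# a BWA-style ATAC aligner reports MAPQ in roughly the 0-60 range.
gex_mapqs = [255, 255, 255, 3, 1]
atac_mapqs = [60, 60, 42, 30, 0]

print(fraction_passing_mapq(atac_mapqs, 255))  # 0.0
print(fraction_passing_mapq(atac_mapqs, 30))   # 0.8
print(fraction_passing_mapq(gex_mapqs, 255))   # 0.6
```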

I hope it helps, Fran

ktpolanski commented 1 year ago

Thank you for your response. I've been mucking around with the GEX portion to get a feel for the processing. Step 2's parallelisation is very nice; is there some way to apply similar principles to steps 1 and 4.1?

Given the absence of nM and NH tags from the ATAC, I presume I just ignore those in step 1. Or would the ATAC's NM work as an adequate replacement for nM?
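The two tags are not interchangeable. The SAM specification defines NM as the per-read edit distance, so it counts mismatches plus inserted and deleted bases. The STAR manual describes nM as the number of mismatches per (paired) alignment. The sketch below computes both from CIGAR and MD strings to show how they diverge when indels are present. It assumes the MD tag is available, and the function names are hypothetical.

```python
import re
from typing import Sequence

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matching bases, a deletion (^ + deleted reference bases), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def md_mismatches(md: str) -> int:
    """Count mismatched bases in an MD string (bases after '^' are deletions, not mismatches)."""
    return sum(1 for _, _, mismatch in MD_TOKEN.findall(md) if mismatch)


def nm_from_cigar_md(cigar: str, md: str) -> int:
    """Edit distance as the SAM spec defines NM: mismatches + inserted bases + deleted bases."""
    indel_bases = sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "ID")
    return md_mismatches(md) + indel_bases


def star_like_nm(md_per_mate: Sequence[str]) -> int:
    """Mismatches only, summed over both mates: an approximation of STAR's nM."""
    return sum(md_mismatches(md) for md in md_per_mate)


mate1 = nm_from_cigar_md("5M2I10M3D8M", "7A7^GCT8")  # 1 mismatch + 2 inserted + 3 deleted = 6
mate2 = nm_from_cigar_md("20M", "3C16")              # 1 mismatch = 1
print(mate1 + mate2, star_like_nm(["7A7^GCT8", "3C16"]))  # 7 2
```

A threshold such as --max_nM 5 would therefore behave differently if it were applied to NM values.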

Francesc-Muyas commented 1 year ago

Dear user, thanks for your suggestion. We have been considering adding parallelisation to other steps of the tool, and this is something that will be addressed in the future.

Regarding your second question: in the current version of SComatic, if your BAM files do not contain the exact nM or NH tags, you should not use these filters. However, we will work on supporting either NM or nM, depending on the input BAM file, and have added this suggestion to our TODO list.
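Here is a concrete version of the "only use the filters whose tags exist" advice: a per-read filter in which the nM and NH checks are optional. The function `passes_step1_filters` and its parameters are hypothetical and not SComatic's code. In this sketch a read that lacks a tag fails that check, so enabling nM/NH on the ATAC BAM would discard every read. The thread does not say how SComatic itself treats a missing tag.

```python
from typing import Dict, Optional


def passes_step1_filters(
    tags: Dict[str, int],
    mapq: int,
    min_mq: int,
    max_nm: Optional[int] = None,
    max_nh: Optional[int] = None,
) -> bool:
    """Hypothetical per-read filter: MAPQ always, nM/NH only when a threshold is given."""
    if mapq < min_mq:
        return False
    # A missing tag is treated as failing the check (an assumption of this sketch).
    if max_nm is not None and tags.get("nM", max_nm + 1) > max_nm:
        return False
    if max_nh is not None and tags.get("NH", max_nh + 1) > max_nh:
        return False
    return True


gex_read = {"NH": 1, "nM": 0}
atac_read = {"NM": 1}  # no nM/NH tags, as in the cellranger-arc ATAC BAM
print(passes_step1_filters(gex_read, 255, 255, max_nm=5, max_nh=1))  # True
print(passes_step1_filters(atac_read, 60, 30, max_nm=5, max_nh=1))   # False
print(passes_step1_filters(atac_read, 60, 30))                       # True
```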

Thanks for your feedback, Fran

ktpolanski commented 1 year ago

I've had a chat with @apredeus, who has more genomics experience than I do. He thought the mapping quality (MQ) filter alone should be sufficient for the ATAC, so I proceeded that way. The coworker who received the ATAC results was happy with them.

Here's a master list of the tweaks that were made relative to the GEX demo, following Fran's suggestions a few comments back:

Step 1: --max_nM 5 and --max_NH 1 removed, --min_MQ 30 added
Step 2: --min_mq 30 added
Step 4.2: --editing removed, and --pon altered to point at the ATAC one
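The tweak list above amounts to editing the GEX argument lists. The following sketch derives the ATAC arguments from placeholder GEX ones using two hypothetical helpers. Apart from --bam, only the flags named in this thread are used; all file paths are made up, and the other arguments each script needs are omitted.

```python
from typing import List, Sequence


def drop_flag(args: Sequence[str], flag: str) -> List[str]:
    """Return a copy of `args` without `flag` and the single value that follows it."""
    out: List[str] = []
    skip_value = False
    for arg in args:
        if skip_value:
            skip_value = False  # this is the dropped flag's value
            continue
        if arg == flag:
            skip_value = True
            continue
        out.append(arg)
    return out


def set_flag(args: Sequence[str], flag: str, value: str) -> List[str]:
    """Return a copy of `args` with `flag` set to `value` (any earlier setting is replaced)."""
    return drop_flag(args, flag) + [flag, value]


# Placeholder GEX argument lists (paths are hypothetical).
gex_step1 = ["--bam", "sample.bam", "--max_nM", "5", "--max_NH", "1"]
gex_step2 = ["--bam", "sample.T_cell.bam"]
gex_step4_2 = ["--editing", "editing_sites.txt", "--pon", "pon_rna.tsv"]

atac_step1 = set_flag(drop_flag(drop_flag(gex_step1, "--max_nM"), "--max_NH"), "--min_MQ", "30")
atac_step2 = set_flag(gex_step2, "--min_mq", "30")
atac_step4_2 = set_flag(drop_flag(gex_step4_2, "--editing"), "--pon", "pon_atac.tsv")

print(atac_step1)    # ['--bam', 'sample.bam', '--min_MQ', '30']
print(atac_step4_2)  # ['--pon', 'pon_atac.tsv']
```

Note the differing case of the MQ flag between Step 1 (--min_MQ) and Step 2 (--min_mq), as listed above.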

Given that we're interested in finding mutations shared across several cell populations, I also drastically increased --max_cell_types in both the GEX and ATAC processing, but that's specific to our question.
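As we understand it, --max_cell_types caps how many cell types may carry a variant for it to still be reported in Step 4.1. Raising it therefore lets shared mutations through. Here is a toy model of that cap, with a hypothetical function and made-up site labels; it is not SComatic's implementation. A higher cap presumably also admits more sites that appear in many cell types for other reasons, which is why the default is kept low.

```python
from typing import Dict, List


def filter_by_cell_type_count(
    candidates: Dict[str, List[str]], max_cell_types: int
) -> Dict[str, List[str]]:
    """Keep candidate sites whose alternative allele is seen in at most `max_cell_types` cell types.

    `candidates` maps a site label (e.g. "chr1:100:C>T") to the cell types carrying the allele.
    """
    return {
        site: cell_types
        for site, cell_types in candidates.items()
        if len(set(cell_types)) <= max_cell_types
    }


candidates = {
    "chr1:100:C>T": ["T_cell"],
    "chr2:200:G>A": ["T_cell", "B_cell", "Myeloid"],
}
print(sorted(filter_by_cell_type_count(candidates, 1)))  # ['chr1:100:C>T']
print(sorted(filter_by_cell_type_count(candidates, 3)))  # ['chr1:100:C>T', 'chr2:200:G>A']
```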

Thanks a lot for this tool! Since we have multiple 10X samples per donor, I was able to speed up step 1 locally by running it in parallel on each sample's BAM separately and then merging the outputs per cell type.
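The per-sample parallelisation of step 1 described above could be orchestrated as in the sketch below. The per-sample job is passed in as a callable; in practice it would presumably invoke SplitBamCellTypes.py through `subprocess`. Several details are assumptions: the `<sample>.<cell_type>.bam` output naming, the merged file names, and all helper names. The merge step only builds `samtools merge` command lines. It is also worth checking that cell barcodes stay unique across samples once their BAMs are merged.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def run_per_sample(
    samples: Iterable[str], run_one: Callable[[str], List[Path]], workers: int = 4
) -> List[Path]:
    """Run a per-sample step 1 job concurrently and collect every per-cell-type BAM it reports."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [path for paths in pool.map(run_one, samples) for path in paths]


def group_by_cell_type(bams: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group BAMs named '<sample>.<cell_type>.bam' (assumed naming) by cell type."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for bam in bams:
        _, _, cell_type = bam.stem.partition(".")
        groups[cell_type].append(bam)
    return dict(groups)


def merge_commands(groups: Dict[str, List[Path]], donor: str) -> List[List[str]]:
    """Build one `samtools merge` command per cell type (output names are placeholders)."""
    return [
        ["samtools", "merge", "-f", f"{donor}.{cell_type}.bam", *map(str, sorted(bams))]
        for cell_type, bams in sorted(groups.items())
    ]


def fake_step1(sample: str) -> List[Path]:
    """Stand-in for a real step 1 run: just reports the files it would have written."""
    return [Path(f"{sample}.T_cell.bam"), Path(f"{sample}.B_cell.bam")]


bams = run_per_sample(["S1", "S2"], fake_step1, workers=2)
for cmd in merge_commands(group_by_cell_type(bams), "donor1"):
    print(" ".join(cmd))
```

Each printed command could then be run with `subprocess.run(cmd, check=True)`.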

Francesc-Muyas commented 1 year ago

Sounds perfect! In the meantime, I will work on implementing an NM tag filter for scATAC-seq.

Let me know if you have further questions.

Thanks, Fran