Closed ktpolanski closed 1 year ago
Dear user, I strongly suggest not merging the two bam files, as one comes from an RNA-based approach and the other from a DNA-based one. The bam files are processed differently, and the biases differ between the RNA and DNA approaches.
Regarding how to process the bam files, I would follow the toy example as a template for the scRNA-seq data.
For ATAC, I would follow a similar approach but with minor changes:
I hope it helps, Fran
Thank you for your response. I've been mucking around with the GEX portion to get a feel for the processing. Step two's parallelisation is very nice; is there some way to apply similar principles to steps 1 and 4.1?
Given the absence of nM and NH tags from the ATAC, I presume I just ignore those in step 1. Or would the ATAC's NM work as an adequate replacement for nM?
Dear user, Thanks for your suggestion. We have been thinking about the implementation of extra parallelisation in other steps of the tool, but this is something that will be addressed in the future.
Regarding your second question: in the current version of SComatic, if you do not have the exact nM or NH tags in the bam files, you should not use these filters. However, we will work to implement the possibility of using NM or nM depending on the input bam file. We will put this suggestion in our TODO list.
Thanks for your feedback, Fran
I've had a chat with @apredeus, who has more genomic experience than me. He opined that the MQ filter should be sufficient for the ATAC, and I proceeded as such. The coworker that got the ATAC results was happy with them.
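For anyone wanting to see what "MQ filter only" means at the read level, here is a minimal sketch. It is not SComatic's code: the function names are made up for illustration, and the comparison directions (keep reads with MAPQ at or above the minimum, and tag values at or below the maximum) are assumptions based on the option names. The nM and NH checks are switched off by default, which is the ATAC situation described above.

```python
def parse_tags(fields):
    """Map optional SAM fields (TAG:TYPE:VALUE) to {TAG: VALUE}."""
    return {f.split(":", 2)[0]: f.split(":", 2)[2] for f in fields[11:]}


def keep_read(sam_line, min_mq=30, max_nM=None, max_NH=None):
    """Decide whether one SAM record passes the filters (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    if int(fields[4]) < min_mq:          # column 5 of a SAM record is MAPQ
        return False
    tags = parse_tags(fields)
    for tag, limit in (("nM", max_nM), ("NH", max_NH)):
        if limit is None:
            continue                     # filter switched off, as for ATAC
        if tag not in tags:
            raise ValueError(f"{tag} filter requested but the tag is absent")
        if int(tags[tag]) > limit:
            return False
    return True


# Made-up records for illustration only.
gex = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tnM:i:0\tCB:Z:AAAC-1"
atac = "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\tCB:Z:AAAC-1"
atac_low = "r3\t0\tchr1\t300\t10\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\tCB:Z:AAAC-1"

print(keep_read(gex, min_mq=30, max_nM=5, max_NH=1))
print(keep_read(atac), keep_read(atac_low))
```

Asking for the nM or NH filter on a record that lacks the tag raises an error in this sketch, so the mismatch is visible rather than silently dropping every read.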
Here's a master list of the tweaks that were made relative to the GEX demo, following Fran's suggestions a few comments back:
- --max_nM 5 and --max_NH 1 removed, --min_MQ 30 added
- --min_mq 30 added
- --editing removed, and --pon altered to point at the ATAC one

Given the fact we're interested in finding mutations shared across some cell populations, I also drastically increased --max_cell_types in both GEX and ATAC processing, but that's specific to our question.
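In case it helps to keep the GEX and ATAC runs in sync, the tweaks above can be written down as data and applied to the GEX options. This is only a sketch: the flags are the ones listed above, but `apply_tweak`, `to_argv`, the grouping into three tweak sets and every file path are hypothetical placeholders, not part of SComatic.

```python
# One entry per bullet in the list above; paths are placeholders.
ATAC_TWEAKS = [
    {"remove": ["--max_nM", "--max_NH"], "set": {"--min_MQ": "30"}},
    {"remove": [], "set": {"--min_mq": "30"}},
    {"remove": ["--editing"], "set": {"--pon": "atac_pon.tsv"}},
]


def apply_tweak(options, tweak):
    """Return a new options dict with the tweak's removals and overrides applied."""
    out = {k: v for k, v in options.items() if k not in tweak["remove"]}
    out.update(tweak["set"])
    return out


def to_argv(options):
    """Flatten {flag: value} into a command-line argument list."""
    return [item for flag, value in options.items() for item in (flag, value)]


# GEX-style options for the first and third tweak sets (values illustrative).
gex_first = {"--max_nM": "5", "--max_NH": "1"}
gex_third = {"--editing": "editing_sites.txt", "--pon": "gex_pon.tsv"}

print(to_argv(apply_tweak(gex_first, ATAC_TWEAKS[0])))
print(to_argv(apply_tweak(gex_third, ATAC_TWEAKS[2])))
```

A question-specific change such as a higher --max_cell_types would sit in the shared options rather than in the ATAC tweaks, since it was applied to both modalities.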
Thanks a lot for this tool! Seeing how we've got multiple 10X samples for each donor, I was able to speed up step 1 locally by running it in parallel on each sample's BAM separately and then merging per cell type.
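A rough sketch of that per-sample approach, for anyone wanting to do the same. Everything here is an assumption for illustration: `step1` stands in for whatever launches step 1 on one BAM and returns the per-cell-type BAMs it wrote, the `<sample>.<cell_type>.bam` naming is a guess, and the merge commands are only built, not executed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_per_sample(step1, sample_bams, max_workers=4):
    """Call step1(bam) for every sample concurrently and collect all outputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(step1, sample_bams)
        return [path for outputs in results for path in outputs]


def group_by_cell_type(paths):
    """Group '<sample>.<cell_type>.bam' paths by cell type (assumed naming)."""
    groups = defaultdict(list)
    for p in paths:
        cell_type = Path(p).name.rsplit(".", 2)[1]
        groups[cell_type].append(p)
    return dict(groups)


def merge_commands(groups, out_dir="merged"):
    """Build one 'samtools merge <out> <in...>' command per cell type."""
    return {
        ct: ["samtools", "merge", f"{out_dir}/{ct}.bam", *sorted(inputs)]
        for ct, inputs in groups.items()
    }


def fake_step1(bam):
    """Stub standing in for the real step 1 call on a single sample BAM."""
    return [f"{Path(bam).stem}.{ct}.bam" for ct in ("Tcell", "Bcell")]


outputs = run_per_sample(fake_step1, ["donor1_s1.bam", "donor1_s2.bam"])
for cell_type, cmd in merge_commands(group_by_cell_type(outputs)).items():
    print(cell_type, " ".join(cmd))
```

Threads are enough here because the real work would happen in a subprocess per sample. The merged BAMs would presumably need indexing before the next step.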
Sounds perfect! Meanwhile, I will work on the implementation of the NM tag filter for scATAC-seq.
Let me know if you have further questions.
Thanks, Fran
Hello,
We've got a bunch of data that we'd like to run SComatic on. Some of the samples are 10X multiome, i.e. both GEX and ATAC for the same samples. The data was processed via cellranger-arc, the standard practice for 10X multiome, yielding separate BAMs for GEX and ATAC.
Do you have any suggestions for how to proceed? The BAM files for the two are heterogeneous. The GEX has CB, NH and nM tags, while the ATAC has CB and an NM (not sure if this is the same as nM from the GEX). Would it make sense to merge the two BAMs together given this disparity?
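To check which tags each BAM actually carries, something like the following could be run over a sample of alignments. It is a minimal sketch that reads SAM-formatted text lines (for example the output of `samtools view`), and the two records below are made up for illustration; `count_tags` is a hypothetical helper name.

```python
from collections import Counter


def count_tags(sam_lines):
    """Count how often each optional tag appears across SAM alignment lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):   # skip blanks and headers
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1-11 are mandatory; optional fields follow as TAG:TYPE:VALUE.
        for field in fields[11:]:
            counts[field.split(":", 1)[0]] += 1
    return counts


# Made-up GEX-like and ATAC-like records (illustrative only).
gex_read = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tnM:i:0\tCB:Z:AAAC-1"
atac_read = "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\tCB:Z:AAAC-1"

print(sorted(count_tags([gex_read])))
print(sorted(count_tags([atac_read])))
```

Comparing the counts against the number of reads inspected also shows whether a tag is present on every read or only on some of them.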