Running scRNAseq pipeline with fastq files

direnardak commented 2 months ago

Hi CTAT-Mutations Team,

Thanks for the awesome package! I've had great success using CTAT-Mutations with my bulk RNAseq data, and now I'm looking to give it a try with my single-cell RNAseq data.

I'm running into a bit of a problem with the fastq file format. The tutorial mentions that the read names need to be in this format: cellbarcode^UMI^read_name. There's a script provided for converting from ubam files, but like many others, I get my fastq files straight from the sequencing core. I tried editing the files myself, but I couldn't get the format right. Do you have a script for fastq files? Or can you provide more details on how read1 and read2 should look?

Thanks a lot for your help!

Best, Arda

brianjohnhaas commented 2 months ago

Hi Arda,

The read name formatting cellbarcode^UMI^read_name is needed to get the single cell reporting. If you're starting from fastqs, hopefully there's info somewhere from the data source that indicates where the UMI and cell barcode are stored. Once you have that, you could use something like pysam or a simple python script to format the fastqs. I don't have a general script for this other than the one for typical 10x ubams, as provided in the documentation.
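For fastqs that come straight from a sequencing core, the renaming described above could be sketched roughly as follows. This is not a script from CTAT-Mutations. It is a minimal sketch that assumes a 10x-style layout in which read 1 carries the cell barcode followed by the UMI and read 2 carries the cDNA. The function name `tag_reads` and the default lengths (16 bp barcode, 12 bp UMI) are assumptions and should be checked against the library chemistry actually used. It takes the raw barcode bases from read 1, so it does no barcode correction or whitelist filtering.

```python
import gzip
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def fastq_records(handle):
    """Yield (read_name, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines)
        # Drop the leading '@' and any comment after the first whitespace.
        name = header[1:].split()[0]
        # Drop an old-style mate suffix so read 1 and read 2 names compare equal.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, seq, qual


def tag_reads(r1_path, r2_path, out_path, barcode_len=16, umi_len=12):
    """Write read 2 renamed to cellbarcode^UMI^read_name; return the read count.

    ASSUMPTION: read 1 starts with the cell barcode, immediately followed by
    the UMI. The default lengths are illustrative, not taken from the thread.
    """
    written = 0
    with open_text(r1_path) as r1, open_text(r2_path) as r2, \
            open_text(out_path, "wt") as out:
        for (name1, seq1, _), (name2, seq2, qual2) in zip(
                fastq_records(r1), fastq_records(r2)):
            if name1 != name2:
                raise ValueError(f"mate names differ: {name1} vs {name2}")
            barcode = seq1[:barcode_len]
            umi = seq1[barcode_len:barcode_len + umi_len]
            out.write(f"@{barcode}^{umi}^{name2}\n{seq2}\n+\n{qual2}\n")
            written += 1
    return written
```

A call such as `tag_reads("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.tagged.fastq")` (file names are placeholders) would produce a single fastq whose read names carry the barcode and UMI.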

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

direnardak commented 1 month ago

I have a question regarding the BAM files you refer to as “typical 10x uBAM.” The output from Cell Ranger does not include uBAM files. Should I use another aligner to generate uBAM files, or can I use Picard to convert BAM files to uBAM format?

Sorry for the confusion...

brianjohnhaas commented 1 month ago

Oh, in that case, definitely revert the BAM to a uBAM, and yes, Picard is good for this.

direnardak commented 1 month ago

I converted the BAM file from the Cell Ranger pipeline to a uBAM file using Picard RevertSam, then used the provided 10x_ubam_to_fastq.py script to generate fastq files. This leaves me with just one fastq file for my paired-end scRNAseq experiment. Should I just run the ctat_mutations pipeline with this?

brianjohnhaas commented 1 month ago

oh - the only 10x scRNA-seq data sets I've worked with were single-end reads, and the script was configured for that. I'm thinking it might be fine though - go ahead and give it a try. It should be dealing with cell barcodes and UMIs as encoded into the read name and so I don't think there should be trouble - but let's see how it goes, and be sure to examine the results in IGV to make sure they meet expectations.
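Before running the pipeline on a single fastq like this, it may be worth confirming that the read names really follow the cellbarcode^UMI^read_name convention. The sketch below is only an illustration: `parse_read_name` and `reads_per_cell` are hypothetical helper names, not part of CTAT-Mutations, and the check assumes a standard four-line-per-record fastq.

```python
from collections import Counter


def parse_read_name(name):
    """Split 'cellbarcode^UMI^read_name' into (barcode, umi, read_name)."""
    parts = name.lstrip("@").split("^", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"read name not in cellbarcode^UMI^read_name form: {name}")
    return tuple(parts)


def reads_per_cell(fastq_lines):
    """Count reads per cell barcode from an iterable of FASTQ text lines."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        # Header lines are every fourth line, starting with the first.
        if i % 4 == 0 and line.strip():
            barcode, _umi, _read = parse_read_name(line.strip().split()[0])
            counts[barcode] += 1
    return counts
```

Passing an open file handle, for example `reads_per_cell(open("sample.tagged.fastq"))` with a placeholder file name, gives a quick tally. A malformed header raises a `ValueError` instead of being silently skipped.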

direnardak commented 1 month ago

Okay, I already started running the pipeline. I'll update here after checking the results. I hope it works!

direnardak commented 1 month ago

Hello @brianjohnhaas, I finished running the pipeline. I also have targeted DNA sequencing panel results for the same cohort, and I can't really say that the CTAT-Mutations calls and the targeted DNA panel results agree. In some patients, the pipeline missed mutations with VAF values higher than 80%. Can I share some of my IGV files with you if you have time to check them? I'm not sure if they look good.
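One way to make a comparison like this concrete is to list the panel variants that are absent from the RNA-based calls. The sketch below is a hedged illustration, not part of CTAT-Mutations. It assumes the RNA calls are available as VCF text and that the panel results have been loaded into a dictionary mapping (chrom, pos, ref, alt) to VAF. The function names and that dictionary layout are hypothetical. A variant can only be called from RNA-seq where reads actually cover the position, so a miss may reflect low coverage rather than a pipeline error.

```python
def variant_keys(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            keys.add((chrom, int(pos), ref, alt))
    return keys


def missed_panel_variants(panel_vafs, rna_keys, min_vaf=0.8):
    """List panel variants at or above min_vaf that are absent from the RNA calls.

    ASSUMPTION: both inputs use the same chromosome naming (e.g. 'chr1' vs '1')
    and the same reference build; otherwise every variant will look missed.
    """
    return sorted(
        key for key, vaf in panel_vafs.items()
        if vaf >= min_vaf and key not in rna_keys
    )
```

Each variant returned by `missed_panel_variants` is a candidate to inspect in IGV for read coverage at that position.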

brianjohnhaas commented 1 month ago

Hi Diren,

Sure, happy to have a look. You can send data files privately to bhaas at broadinstitute dot org

best,

Brian

TrinityCTAT / ctat-mutations

Running scRNAseq pipeline with fastq files #136
