epigen / atacseq_pipeline

Ultimate ATAC-seq Data Processing & Quantification Workflow. A Snakemake implementation of the BSF's ATAC-seq Data Processing Pipeline extended by downstream quantification and annotation steps using bash and Python.
https://epigen.github.io/atacseq_pipeline/
MIT License

Input files #42

Closed BiotechPedro closed 3 months ago

BiotechPedro commented 3 months ago

Hi Bock team!

I've started using your pipeline and I really like it! I'm wondering why the input files are BAM instead of compressed FASTQ. I've been converting my files with the following command, but I'm curious how you do it, since that step isn't part of the pipeline.

samtools import -1 R1.fq.gz -2 R2.fq.gz -o unaligned.bam

thank you!

pedro

sreichl commented 3 months ago

Hi Pedro, thanks for your kind words. Please consider starring the repo if it is interesting or useful to you. This helps others find and benefit from the work, and helps me prioritize my efforts!

Back to your question: this pipeline was developed based on a pipeline from our sequencing facility, which always provides raw/unaligned BAM files after demultiplexing. That is probably because unaligned BAM is more storage-efficient and ready for downstream processing/analysis. Your command looks fine to me, thanks for sharing!
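For more than a handful of samples, the conversion command from the question could be scripted. The following is a minimal sketch, not part of the pipeline: it only reuses the `-1`, `-2` and `-o` options shown above, and the function names, sample names and output directory are hypothetical. It assumes a samtools version that provides `samtools import` for FASTQ input. By default it only builds the commands (dry run) so they can be inspected before anything is executed.

```python
import shutil
import subprocess
from pathlib import Path


def build_import_cmd(r1, r2, out_bam):
    """Return the `samtools import` argument list for one paired-end sample."""
    return ["samtools", "import", "-1", str(r1), "-2", str(r2), "-o", str(out_bam)]


def convert_samples(samples, out_dir, dry_run=True):
    """Build (and optionally run) one conversion per sample.

    samples: dict mapping a sample name to its (R1, R2) FASTQ paths.
    Returns the list of commands, sorted by sample name.
    """
    out_dir = Path(out_dir)
    commands = []
    for name, (r1, r2) in sorted(samples.items()):
        cmd = build_import_cmd(r1, r2, out_dir / f"{name}.bam")
        commands.append(cmd)
        if not dry_run:
            if shutil.which("samtools") is None:
                raise RuntimeError("samtools not found on PATH")
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if samtools exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical sample sheet; replace with your own file names.
    samples = {"sampleA": ("sampleA_R1.fq.gz", "sampleA_R2.fq.gz")}
    for cmd in convert_samples(samples, "unaligned_bams"):
        print(" ".join(cmd))
```

Calling `convert_samples(samples, "unaligned_bams", dry_run=False)` would run the conversions one after another; the resulting unaligned BAM files can then be listed as pipeline inputs.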

If this answers your question, feel free to close the issue.

Thanks & Cheers, Stephan

BiotechPedro commented 3 months ago

Thanks for the quick answer, Stephan. For sure I'm starring it. So far, it's been really helpful :)

Cheers,

pedro

sreichl commented 3 months ago

No worries and thanks! If you like this pipeline (module), you can check out more modules for downstream analysis of your processed ATAC-seq data or for biomedical data science in general.