dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License

Creating a mappability map #336

Open Immortal2333 opened 1 year ago

Immortal2333 commented 1 year ago

#335

I found the code you provided:

dicey chop sacCer3.fa
bwa index sacCer3.fa
bwa mem sacCer3.fa read1.fq.gz read2.fq.gz | samtools sort -@ 8 -o srt.bam
samtools index srt.bam
dicey mappability2 srt.bam
gunzip map.fa.gz && bgzip map.fa && samtools faidx map.fa.gz

What are read1.fq.gz and read2.fq.gz?

delly cnv -o c1.bcf -g hg19.fa -m hg19.map -l delly.sv.bcf input.bam -- Germline CNV calling

What is the difference between input.bam (is it made from read1.fq.gz and read2.fq.gz?) and srt.bam?

Can you please explain in more detail? Thank you!

tobiasrausch commented 1 year ago

read1.fq.gz and read2.fq.gz are the output files of dicey chop. You only need the srt.bam file for creating the mappability map, map.fa.gz.
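
To make the file relationships explicit, here is a minimal shell sketch of the workflow. The tool invocations are copied from the commands quoted earlier in this thread; the `step` and `build_map` helpers and the `DRY_RUN` switch are hypothetical additions for illustration only. By default the script just prints the commands so the order can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch of the mappability-map workflow quoted in this thread.
# Only the tool command lines come from the thread; the wrapper is hypothetical.

REF="${REF:-sacCer3.fa}"   # reference FASTA, as in the quoted example
DRY_RUN="${DRY_RUN:-1}"    # hypothetical switch: 1 = print only, 0 = also execute

# Print a command line, and execute it only when DRY_RUN=0.
step() {
  echo "$1"
  if [ "$DRY_RUN" = "0" ]; then
    sh -c "$1"
  fi
}

build_map() {
  step "dicey chop $REF"    # produces read1.fq.gz and read2.fq.gz
  step "bwa index $REF"
  step "bwa mem $REF read1.fq.gz read2.fq.gz | samtools sort -@ 8 -o srt.bam"
  step "samtools index srt.bam"
  step "dicey mappability2 srt.bam"    # srt.bam is only needed for this step
  step "gunzip map.fa.gz && bgzip map.fa && samtools faidx map.fa.gz"
}

build_map
```

Note that input.bam does not appear anywhere in these steps. In the delly cnv command it is presumably the alignment of the sample you want to call CNVs on, while map.fa.gz from the last step is what gets passed via -m.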

afazhra commented 5 days ago

Hi @tobiasrausch,

I wanted to double-check something to ensure everything is working as expected. I'm generating a mappability map for the hg38 reference genome (compressed size ~3.1 GB) using dicey chop. After running for 2 hours, the output files, read1.fq.gz and read2.fq.gz, have each reached 10 GB and are still growing.

Is this expected behavior for this process?

For reference, my system has 62 GB of memory, 32 cores, and runs on Ubuntu 20.04.6 LTS.

Thank you very much for your help!
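
As a rough sanity check on the file sizes, the arithmetic can be sketched in shell. This rests on two assumptions that are not confirmed anywhere in this thread: that dicey chop emits one read pair per reference position, and that reads are 101 bp long. The `estimate_chop` function is a hypothetical helper, and the result is an order-of-magnitude guess, not a measurement.

```shell
#!/usr/bin/env bash
# Back-of-envelope size estimate for chopped reads.
# ASSUMPTION (not confirmed in this thread): one read pair per reference position.

# estimate_chop GENOME_BP READ_LEN
# prints: "<pairs> <bases per file> <approx. uncompressed bytes per file>"
estimate_chop() {
  local genome_bp=$1
  local read_len=$2
  local pairs=$genome_bp                 # assumed: one pair per position
  local bases=$(( pairs * read_len ))    # bases in each of read1 / read2
  local bytes=$(( 2 * bases ))           # sequence + quality lines, headers ignored
  echo "$pairs $bases $bytes"
}

# hg38 is roughly 3.1 Gbp; 101 bp is an assumed read length.
result=$(estimate_chop 3100000000 101)
bytes=${result##* }                      # last field of the result
echo "uncompressed GB per file (approx.): $(( bytes / 1000000000 ))"
```

Under these assumptions the script prints about 626 GB of uncompressed FASTQ per file. gzip reduces that by a data-dependent factor, so 10 GB files that are still growing after 2 hours would not be surprising if the assumptions hold.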

tobiasrausch commented 5 days ago

For commonly used reference genomes, we provide pre-computed mappability maps here

afazhra commented 1 day ago

Hi @tobiasrausch, thank you so much!

Would it be possible for you to provide a sample cnv.bcf file so I can review what the output looks like? Also, could you guide me on how to generate the out.cov.gz file?

delly cnv -g example/ref.fa -m example/map.fa.gz -c out.cov.gz -o cnv.bcf example/sr.bam

Thank you.
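
For what it is worth, the command quoted above already names both files: as far as I understand the delly cnv options, -o is the BCF output and -c is the gzipped coverage file, so out.cov.gz should be written by that same run. The sketch below shows one way to peek at both outputs afterwards. The `peek_outputs` helper is hypothetical; `bcftools view` and `gzip -dc` are standard tools, and the file names are the ones from the command.

```shell
#!/usr/bin/env bash
# Sketch: look at the first lines of the two outputs named in the delly cnv command.
# File names come from the thread; the helper itself is hypothetical.

peek_outputs() {
  bcf=${1:-cnv.bcf}
  cov=${2:-out.cov.gz}

  if [ -f "$bcf" ] && command -v bcftools >/dev/null 2>&1; then
    # Drop the '##' meta lines, keep the column header and the first records.
    bcftools view "$bcf" | grep -v '^##' | head -n 5
  else
    echo "skip: $bcf or bcftools not available"
  fi

  if [ -f "$cov" ]; then
    # Decompress to stdout and show the first lines of the coverage table.
    gzip -dc "$cov" | head -n 5
  else
    echo "skip: $cov not found"
  fi
}

peek_outputs cnv.bcf out.cov.gz
```

If either file is missing, the helper only reports that it was skipped, so it is safe to run before the delly cnv job has finished.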