brentp / smoove

structural variant calling and genotyping with existing tools, but, smoothly.
Apache License 2.0

using multiple bams from same sample - does not finish running & trims the sequencing run number from the sample name #224

Closed heidihyang closed 10 months ago

heidihyang commented 11 months ago

Hi,

I am running an array job on my cluster to process my samples in parallel. Some samples were sequenced in multiple runs, so they have multiple BAMs. Only some of these multi-BAM samples run to completion; the others fail with a few different kinds of errors:

[E::bgzf_read_block] Invalid BGZF header at offset 39243705
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
samtools index: failed to create index for "/u/scratch/h/hyangg/smoove_output/Qlob.PEN.5.00F.disc.bam"
panic: exit status 1

[smoove]:2023/07/24 15:19:40 finished process: lumpy-filter (set -eu; lumpy_filter -f /u/home/h/hyangg/project-vlsork/ref/Qlob.ref.v3.2.fasta Qlob.CTW.6.00F.w1.m) in user-time:58.249006s system-time:4.801623s
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_uncompress] Inflate operation failed: 1
[E::bgzf_read] Read block operation failed with error 1 after 0 of 4 bytes
samtools index: failed to create index for "/u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.disc.bam"
panic: exit status 1

[smoove]:([E]lumpy-filter) 2023/07/24 15:19:41 [lumpy_filter] extracted splits and discordants from 60296764 total aligned reads
[smoove]:([E]lumpy-filter) 2023/07/24 15:19:41 mv: cannot stat '/u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.split.bam.tmp.bam': No such file or directory
[smoove]:2023/07/24 15:19:41 finished process: lumpy-filter (set -eu; lumpy_filter -f /u/home/h/hyangg/project-vlsork/ref/Qlob.ref.v3.2.fasta Qlob.CTW.6.00F.w6.m) in user-time:57.198962s system-time:4.708654s
[smoove]:2023/07/24 15:19:41 error running command: set -eu; lumpy_filter -f /u/home/h/hyangg/project-vlsork/ref/Qlob.ref.v3.2.fasta Qlob.CTW.6.00F.w6.markdup.bam /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.split.bam.tmp.bam /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.disc.bam.tmp.bam 2 && mv /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.split.bam.tmp.bam /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.split.bam && mv /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.disc.bam.tmp.bam /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.disc.bam && cp /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.split.bam /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.split.bam.orig.bam && cp /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.disc.bam /u/scratch/h/hyangg/smoove_output/Qlob.CTW.6.00F.disc.bam.orig.bam -> exit status 1
panic: exit status 1

These errors recur for the samples that failed in the array job. However, when I run those samples in isolation, they do not throw these errors and they produce the final VCF. Do you have thoughts on what the underlying issue might be, and on how I should treat multiple BAMs from the same sample? They have different IDs (such as Qlob.CTW.6.00F.w1 and Qlob.CTW.6.00F.w4) but the same SM (Qlob.CTW.6.00F) in the BAMs; I don't know if this is an issue. Thanks!

Heidi

brentp commented 11 months ago

Hi Heidi, you would have to first merge the sample BAMs into a single file if they are, e.g., one sample sequenced in multiple lanes. Or, if you want to keep them separate, you would have to change the sample names before running smoove. -Brent
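
A note on the merge option: in the log above, the input is named Qlob.CTW.6.00F.w6.markdup.bam while the intermediate files are named Qlob.CTW.6.00F.split.bam and Qlob.CTW.6.00F.disc.bam, so array tasks for different runs of one sample appear to write to the same files. That would fit the truncated-BAM errors, though the thread does not confirm it. Below is a minimal Python sketch of the merge route, not a tested pipeline. It assumes every input follows the `<sample>.w<run>.markdup.bam` naming seen in the log; the helper names, the `merged` output directory, and the example file names are hypothetical. It only builds `samtools merge` commands and does not run them.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the log: <sample>.w<run>.markdup.bam
RUN_SUFFIX = re.compile(r"\.w\d+\.markdup\.bam$")


def sample_of(bam_path):
    """Return the sample name by stripping the run suffix from the file name."""
    name = bam_path.rsplit("/", 1)[-1]
    if not RUN_SUFFIX.search(name):
        raise ValueError(f"unexpected BAM name: {bam_path}")
    return RUN_SUFFIX.sub("", name)


def merge_commands(bam_paths, out_dir="merged"):
    """Group BAMs by sample and build one `samtools merge` command per
    sample that has more than one BAM. Single-BAM samples are skipped."""
    groups = defaultdict(list)
    for path in sorted(bam_paths):
        groups[sample_of(path)].append(path)

    commands = {}
    for sample, paths in groups.items():
        if len(paths) > 1:
            # Classic form: samtools merge <out.bam> <in1.bam> <in2.bam> ...
            out_bam = f"{out_dir}/{sample}.merged.bam"
            commands[sample] = ["samtools", "merge", out_bam, *paths]
    return commands


# Hypothetical file names that follow the pattern above.
bams = [
    "Qlob.CTW.6.00F.w1.markdup.bam",
    "Qlob.CTW.6.00F.w4.markdup.bam",
    "Qlob.PEN.5.00F.w2.markdup.bam",
]
for sample, cmd in merge_commands(bams).items():
    print(" ".join(cmd))
```

Each merged BAM would then need an index (`samtools index`) before being passed to smoove, with one array task per sample rather than one per sequencing run.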