dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License

segmentation fault at split reads assembly point #297

Status: Closed (skose82 closed 2 years ago)

skose82 commented 2 years ago

Hi there,

I have been getting the following error message on whole genome samples in germline SV calling mode:

Command (using the Singularity image delly_v1.0.3.sif):

delly call -g Homo_sapiens_assembly38.fasta -o sample1.bcf -x hg38.excl.tsv sample1.bam

delly version:

Delly version: v1.0.3 using Boost: v1.58.0 using HTSlib: v1.13

Error:

[2022-Aug-19 18:51:07] Paired-end and split-read scanning
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[2022-Aug-19 18:52:46] Split-read clustering
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[2022-Aug-19 18:52:46] Paired-end clustering
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

[2022-Aug-19 18:52:46] Split-read assembly
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*[E::fai_retrieve] Failed to retrieve block: unexpected end of file

I've run delly on other samples without problems. This is a new batch of samples I am running to reach the 20 unrelated samples needed for the germline SV filter after merging all the genotyped samples. So far I have checked the following:

- The index of the reference FASTA: the same reference ran fine for previous samples.
- Memory: I ran this on an HPC cluster with 24 CPUs and 8 GB per CPU.
- Duplicates: the BAMs were duplicate-marked with Picard.
- Chromosome names: they match the files that ran fine, e.g. they carry the 'Chr' prefix.

Other than checking for the above, I am not sure what the issue might be.

Any assistance would be greatly appreciated.

tobiasrausch commented 2 years ago

That's an error from the HTSlib library that delly includes. It looks to me like your FASTA reference or BAM is truncated, or the index files are outdated. Can you first run

samtools faidx Homo_sapiens_assembly38.fasta
samtools index sample1.bam

and then run delly again?
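As a rough way to test the two hypotheses above (outdated index, truncated FASTA) without re-running delly, here is a minimal Python sketch. The function names `index_is_stale` and `fasta_shorter_than_fai` are made up for illustration and are not part of delly or HTSlib. It assumes an uncompressed FASTA and the standard five-column `.fai` layout (name, length, offset, line bases, line width). It only catches a FASTA that is shorter than its index expects, so a clean result does not prove the file is intact.

```python
import os


def index_is_stale(data_path, index_path):
    """Return True if the index is missing or older than the file it indexes."""
    if not os.path.exists(index_path):
        return True
    return os.path.getmtime(index_path) < os.path.getmtime(data_path)


def fasta_shorter_than_fai(fasta_path, fai_path=None):
    """Return True if the FASTA is too short to hold the last base of any
    sequence listed in the .fai (assumed: uncompressed FASTA, 5-column .fai)."""
    fai_path = fai_path or fasta_path + ".fai"
    size = os.path.getsize(fasta_path)
    with open(fai_path) as fai:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed or empty lines
            length, offset, linebases, linewidth = map(int, fields[1:5])
            if length == 0:
                continue
            last = length - 1  # 0-based index of the final base
            # Byte position of that base inside the FASTA file.
            last_byte = offset + (last // linebases) * linewidth + last % linebases
            if last_byte >= size:
                return True
    return False
```

For the files in this thread, that would be `index_is_stale("sample1.bam", "sample1.bam.bai")` and `fasta_shorter_than_fai("Homo_sapiens_assembly38.fasta")`, assuming the index files sit next to the data files under those names.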

skose82 commented 2 years ago

Hi Tobias,

Thank you kindly for your response. When I re-indexed sample1.bam, the following warning was printed:

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated

So it is truncated. Is there a way to get around this at all?

tobiasrausch commented 2 years ago

The alignment didn't finish properly. You need to remap the data.
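To screen the rest of a batch for the same problem before running delly, the warning above can be checked for directly: a complete BAM ends with a fixed 28-byte empty BGZF block, as defined in the SAM/BAM specification. Below is a minimal Python sketch of that check. The function name `has_bgzf_eof` is made up for illustration. `samtools quickcheck` performs a similar test and is probably the better choice where samtools is available. Note that a present marker only rules out this particular kind of truncation.

```python
# The 28-byte empty BGZF block that terminates a complete BAM file
# (value taken from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to learn the file size
        if handle.tell() < len(BGZF_EOF):
            return False
        handle.seek(-len(BGZF_EOF), 2)  # position at the last 28 bytes
        return handle.read() == BGZF_EOF
```

Any BAM for which `has_bgzf_eof` returns False was most likely cut off while being written and would need to be regenerated, as described above.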