biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

Memory issue with Sambamba markdup #439

Closed: psur9757 closed this issue 3 years ago

psur9757 commented 4 years ago

I am trying to prepare my data for freebayes and markdup was the final step before merging. However, that failed with a segmentation fault.

Error

...
[bam_sort_core] merging from 248 files and 8 in-memory blocks...
finding positions of the duplicate reads in the file...
  sorted 244119030 end pairs
     and 947324 single ends (among them 0 unmatched pairs)
  collecting indices of duplicate reads...   done in 37330 ms
  found 94344096 duplicates
collected list of positions in 9 min 16 sec
marking duplicates...
/var/spool/PBS/mom_priv/jobs/4252591.pbsserver.SC: line 27: 190448 Segmentation fault      (core dumped) sambamba markdup -t 8 ch3hs1/aln.sorted.bam ch3hs1/aln.md.bam

I discussed this with the HPC team; here is their reply:

The markdup docs state that you need a lot of memory. Even using the latest version (0.7.1) and running on the highmem nodes with between 2 TB and 4 TB of memory, I variously get:
- a segmentation fault
- This message: sambamba-markdup: Memory allocation failed
- This message: sambamba-markdup: Read reference ID is out of range (4 TB) - but sometimes the Memory allocation failed message as well.

The software clearly has a "bug" whereby it can't cope with failed memory allocations - perhaps thread related so it doesn't "print" or exit with the message before the seg fault.
You would have to ask the authors.
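
To tell a crash apart from a clean failure in the job log, it can help to record the exit status of the markdup step. The following is a minimal sketch, assuming bash; `describe_exit` is a hypothetical helper and not part of sambamba or PBS. It relies on the bash convention that a command killed by signal N usually reports status 128 + N, so a segmentation fault (SIGSEGV, signal 11) appears as 139.

```shell
#!/bin/bash
# describe_exit is a hypothetical helper, not part of sambamba or PBS.
# In bash, a command killed by signal N reports exit status 128 + N,
# so a segmentation fault (SIGSEGV, signal 11) shows up as 139.
describe_exit() {
  local status=$1
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}

# Usage (commented out so the sketch runs without sambamba installed):
# sambamba markdup -t 8 ch3hs1/aln.sorted.bam ch3hs1/aln.md.bam
# describe_exit $?
```

A status above 128 is only a strong hint of a signal, since a program can also choose such a value itself.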

PBS Script

#! /bin/bash
#PBS -P BLRseq
#PBS -N ch3hs1
#PBS -l select=1:ncpus=8:mem=120G
#PBS -l walltime=165:00:00
#PBS -q defaultQ

#Software
module load bwa/0.7.17
module load samtools/1.10
module load bamaddrg/x
module load sambamba/0.6.4

#1.alignment
bwa mem -t 8 ref/hv.fa data/ph5_L1_1.fq.gz data/ph5_L1_2.fq.gz > ch3hs1/aln.sam
samtools sort -o ch3hs1/aln.bam -O bam -@ 8 ch3hs1/aln.sam

#2.add readgroup
bamaddrg -b ch3hs1/aln.bam -r ch3hs1 > ch3hs1/aln.rg.bam

#3.mark duplicates
sambamba sort -o ch3hs1/aln.sorted.bam -t 8 ch3hs1/aln.rg.bam
sambamba markdup -t 8 ch3hs1/aln.sorted.bam ch3hs1/aln.md.bam
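
One variation that might be worth trying is to run the markdup step with its tuning options set explicitly. This is only a sketch: it assumes the installed sambamba version accepts `--tmpdir`, `--hash-table-size` and `--overflow-list-size`, and the numeric values are placeholders rather than recommendations. Nothing in this report confirms that these options avoid the segmentation fault. `markdup_cmd` is a hypothetical helper that only prints the command, so it can be inspected before it is run.

```shell
#!/bin/bash
# markdup_cmd is a hypothetical helper: it prints (does not run) a
# sambamba markdup command with explicit tuning options.
# The sizes below are placeholder values, not tested recommendations.
markdup_cmd() {
  local in_bam=$1
  local out_bam=$2
  local tmpdir=${3:-/tmp}   # directory for markdup's temporary files
  echo sambamba markdup -t 8 \
    --tmpdir="$tmpdir" \
    --hash-table-size=4194304 \
    --overflow-list-size=1000000 \
    "$in_bam" "$out_bam"
}

# Usage: print the command for the files from the script above.
# markdup_cmd ch3hs1/aln.sorted.bam ch3hs1/aln.md.bam
```

Because markdup writes temporary files while it works, it may also be worth checking the open-file limit on the node with `ulimit -n` before the run.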