biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

Sambamba slice - 94 Segmentation fault (core dumped) #515

Open alexvasilikop opened 5 months ago

alexvasilikop commented 5 months ago

Hello, I am running sambamba v1.0.1 from within a Docker image on Google Cloud, inside a Nextflow pipeline. The purpose is to slice a BAM file (human genome) by chromosome, so I provide a BED file with the coordinates of a single chromosome at a time (as a channel) to the following Nextflow process:

```nextflow
process sambamba_slice_bam {
    container 'gcr.io/diagnostics-uz/sambamba_v1.0.1@sha256:f6947d458d2a225580976b1ce8e238a07098073307700fd41bb0cda910956b28'
    label 'lotsOfWork'
    machineType 'e2-highmem-16'
    memory '16 GB'
    maxForks 8
    disk { 20.GB + ( 3.B * bam.size() ) }

    input:
    tuple val(sample_id), path(bam), path(bai)
    path chromosome_bed
    val num_threads

    output:
    tuple val(sample_id), path("results/*.bam"), path("results/*.bai"), emit: indexed_sliced_bam

    shell:
    '''
    mkdir -p results

    # get list of chromosomes to slice
    CHROMOSOMES_TO_SLICE=$(cat !{chromosome_bed} | while read chr start end; do echo "$chr"; done | sort | uniq | xargs)

    # path to the sambamba executable inside the container
    SAMBAMBA_EXEC=/work/apps/sambamba/sambamba

    for chrom in ${CHROMOSOMES_TO_SLICE}; do
      echo -e "Working on chromosome ${chrom} ...\n"
      single_chrom_bed="!{sample_id}.${chrom}.sliced.bed"
      echo -e "Constructing ${single_chrom_bed} to slice bam for ${chrom}...\n"
      OUTBAM=$(basename ${single_chrom_bed} .bed).bam
      grep -P "^${chrom}\s" "!{chromosome_bed}" > "${single_chrom_bed}"

      # perform slicing
      $SAMBAMBA_EXEC slice -o "results/${OUTBAM}" -L "${single_chrom_bed}" "!{bam}"
      # index sliced BAM
      $SAMBAMBA_EXEC index --nthreads="!{num_threads}" "results/${OUTBAM}"
    done
    echo -e "ALL DONE\n"
    '''
}
```

I am getting the following error:

```
sambamba 1.0.1
by Artem Tarasov and Pjotr Prins (C) 2012-2023
    LDC 1.32.0 / DMD v2.102.2 / LLVM 14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)

/mnt/disks/gcap-nf-scratch/f1/c1747bb64e922dbfeabe384eee928d/.command.sh: line 9:    94 Segmentation fault      (core dumped) ${SAMBAMBA_EXEC} slice -o "results/${OUTBAM}" -L "${single_chrom_bed}" "277469.recalibrated.sorted.bam"
```
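To rule out the Nextflow wrapper, the same command can also be run directly in the container; a minimal sketch, with placeholder local paths under `/data`:

```bash
# Sketch: reproduce the slice command directly in the container,
# outside Nextflow. The /data paths below are placeholders.
docker run --rm -v "$PWD:/data" \
  'gcr.io/diagnostics-uz/sambamba_v1.0.1@sha256:f6947d458d2a225580976b1ce8e238a07098073307700fd41bb0cda910956b28' \
  /work/apps/sambamba/sambamba slice \
    -o /data/chr1.sliced.bam \
    -L /data/chr1.bed \
    /data/277469.recalibrated.sorted.bam
```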

Any idea what the problem is?

AgedMordorBlue commented 3 months ago

I've had a similar issue on my institution's cluster; there it was because the D runtime underlying sambamba cannot handle some modern hardware. Something in D uses a `ubyte` when estimating the CPU cache size, which breaks on either the number of CPUs or the CPU model and causes a division by zero down the line.

What solved it for us was to request older CPUs for the job.
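A minimal sketch of that workaround for the process above, assuming the Google Cloud executor honours the `machineType` directive and that the N1 family lands on older CPU platforms than E2 (worth verifying for your zone):

```nextflow
// Sketch: pin an older machine family instead of 'e2-highmem-16'.
// Assumption: n1-* instances run on older CPU platforms; check what your
// zone offers with `gcloud compute machine-types list --zones=<zone>`.
process sambamba_slice_bam {
    machineType 'n1-highmem-16'
    // ... rest of the process unchanged
}
```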

alexvasilikop commented 3 months ago

I ended up using `sambamba view` instead.
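For anyone hitting the same crash, a sketch of the replacement call, using the same variables as the shell block in the original process (`-L` keeps reads overlapping the BED regions, `-f bam` keeps BAM output):

```bash
# Workaround sketch: filter by BED regions with `view` instead of `slice`.
$SAMBAMBA_EXEC view -f bam \
    -t "!{num_threads}" \
    -L "${single_chrom_bed}" \
    -o "results/${OUTBAM}" \
    "!{bam}"
```

Note that `view` decompresses and re-compresses the reads rather than copying indexed chunks the way `slice` does, so it is slower, but it sidesteps the code path that segfaults here.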