biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

sambamba (v. 0.8.0) sort: exit code 271 #517

Open jeffreycyyu opened 4 months ago

jeffreycyyu commented 4 months ago

Hello,

I'm having an issue with the sambamba sort function in a step of the whole-genome DNA sequencing pipeline I am running that uses sambamba v.0.8.0.

The pipeline is GenPipes (https://doi.org/10.1093/gigascience/giz037), and the error occurs in step 3 of its DNA-Seq pipeline, the sambamba sort step (https://genpipes.readthedocs.io/en/latest/user_guide/pipelines/gp_dnaseq.html#bwa-sambamba-sort-sam).

The exit code given by the error is "Exit Code 271" which, based on GenPipes documentation, could be referencing an out-of-memory kill of the process.
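For reference, exit codes like these can often be read as "base + signal number". The following is a minimal sketch, not taken from the sambamba or GenPipes documentation. It assumes the usual shell convention of 128 + N for a process killed by signal N, and it assumes that Torque/PBS reports 256 + N at the scheduler level. `decode_exit` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# decode_exit: hypothetical helper that splits an exit status into a
# convention and a signal number.
# Assumption 1: shells report 128 + N when a child is killed by signal N.
# Assumption 2: Torque/PBS reports 256 + N for a job killed by signal N.
decode_exit() {
    local code=$1 sig name
    if [ "$code" -ge 256 ]; then
        sig=$((code - 256))
        name=$(kill -l "$sig" 2>/dev/null) || name=unknown
        echo "scheduler-style: signal $sig ($name)"
    elif [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        name=$(kill -l "$sig" 2>/dev/null) || name=unknown
        echo "shell-style: signal $sig ($name)"
    else
        echo "plain exit status $code"
    fi
}

decode_exit 137   # the MUGQICexitStatus value in the failing job's output
decode_exit 271   # the exit code reported for the job
```

Under these assumptions, 137 corresponds to signal 9 and 271 to signal 15, which would be consistent with the process being killed externally rather than sambamba returning a code of its own.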

We have tried increasing the memory up to 240G, but the error remains.

We are running in batches of 10 and 50 to test, and the "exit code 271" error occurs in approximately 20-40% of the alignment .bam files.

Another thing to note is that the error/crash appears to be stochastic. When we rerun the same samples with errors, the issue will resolve itself for certain samples that were previously crashing.

The sequencing files we have are whole genome sequencing of humans sequenced using MGI/BGI Genomics' DNBSEQ-T7 sequencing set (https://en.mgi-tech.com/products/instruments_info/5/).

We are running the pipeline on McGill University's Genome Center cluster "ABACUS".

No modifications of steps prior to the sambamba sort step were made to the pipeline.

This issue does not appear to be mentioned elsewhere.

We would appreciate any information or recommendations you could provide regarding this error. Was it perhaps a special error code used during development?

======

Here is an example .sh script for the command called during this substep of the pipeline:

COMMAND:

module purge && \
module load mugqic/sambamba/0.8.0 && \
mkdir -p alignment/K0026_2-3186833/readset018 && \
touch alignment/K0026_2-3186833/readset018 && \
rm -r -f alignment/K0026_2-3186833/readset018/readset018.sorted.bam.bai && \
sambamba sort -m 240G \
  alignment/K0026_2-3186833/readset018/readset018.bam \
  --tmpdir ${TMPDIR:=/tmp} \
  --out alignment/K0026_2-3186833/readset018/readset018.sorted.bam && \
chmod 664 alignment/K0026_2-3186833/readset018/readset018.sorted.bam.bai

The respective output for that command looks like this:

OUTPUT:

`

Begin PBS Prologue Thu May 9 11:47:35 EDT 2024 1715269655
Job ID: 15500707.scheduler.ferrier.genome.mcgill.ca
Username: jyu
Group: sladek
Nodes: f3u31c01
End PBS Prologue Thu May 9 11:47:36 EDT 2024 1715269656

sambamba 0.8.0 by Artem Tarasov and Pjotr Prins (C) 2012-2020 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

/lb/scratch/jyu/kuwait_whole_genome/batch_1_no_realign_no_spark_samtools_sort/job_output/sambamba_sort/sambamba_sort.readset018_2024-05-09T11.46.23.sh: line 10: 749993 Killed sambamba sort -m 240G alignment/K0026_2-3186833/readset018/readset018.bam --tmpdir ${TMPDIR:=/tmp} --out alignment/K0026_2-3186833/readset018/readset018.sorted.bam
MUGQICexitStatus:137

Begin PBS Epilogue Thu May 9 11:58:16 EDT 2024 1715270296
Job ID: 15500707.scheduler.ferrier.genome.mcgill.ca
Username: jyu
Group: sladek
Job Name: sambamba_sort.readset018
Session: 749837
Limits: mem=240gb,neednodes=1:ppn=16,nodes=1:ppn=16,walltime=12:00:00
Resources: cput=01:34:10,energy_used=0,mem=190528332kb,vmem=246532692kb,walltime=00:10:36
Queue: sw
Account:
Nodes: f3u31c01
Killing leftovers... epilogue debug by IT: jyu

End PBS Epilogue Thu May 9 11:58:17 EDT 2024 1715270297

`

For comparison, here is an example command and its output from a successful run of the same step:

COMMAND:

module purge && \
module load mugqic/sambamba/0.8.0 && \
mkdir -p alignment/K0037_2-3186839/readset019 && \
touch alignment/K0037_2-3186839/readset019 && \
rm -r -f alignment/K0037_2-3186839/readset019/readset019.sorted.bam.bai && \
sambamba sort -m 240G \
  alignment/K0037_2-3186839/readset019/readset019.bam \
  --tmpdir ${TMPDIR:=/tmp} \
  --out alignment/K0037_2-3186839/readset019/readset019.sorted.bam && \
chmod 664 alignment/K0037_2-3186839/readset019/readset019.sorted.bam.bai

OUTPUT:

`

Begin PBS Prologue Thu May 9 11:47:35 EDT 2024 1715269655
Job ID: 15500708.scheduler.ferrier.genome.mcgill.ca
Username: jyu
Group: sladek
Nodes: f3u01c07
End PBS Prologue Thu May 9 11:48:09 EDT 2024 1715269689

sambamba 0.8.0 by Artem Tarasov and Pjotr Prins (C) 2012-2020 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

MUGQICexitStatus:0

Begin PBS Epilogue Thu May 9 12:59:07 EDT 2024 1715273947
Job ID: 15500708.scheduler.ferrier.genome.mcgill.ca
Username: jyu
Group: sladek
Job Name: sambamba_sort.readset019
Session: 3165220
Limits: mem=240gb,neednodes=1:ppn=16,nodes=1:ppn=16,walltime=12:00:00
Resources: cput=15:39:27,energy_used=0,mem=233038028kb,vmem=257419324kb,walltime=01:10:48
Queue: sw
Account:
Nodes: f3u01c07
Killing leftovers... epilogue debug by IT: jyu

End PBS Epilogue Thu May 9 12:59:41 EDT 2024 1715273981

`
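To compare the two epilogues, the `Resources:` values can be converted to the same unit as the `Limits:` line. This is a rough sketch that assumes PBS reports `kb` as 1024 bytes and `gb` as 1024^3 bytes. `resource_kb` and `kb_to_gib` are hypothetical helper names, and the two input strings are copied from the failed and successful job outputs above.

```shell
#!/usr/bin/env bash
# resource_kb KEY LINE: print the numeric kb value of KEY (e.g. mem, vmem)
# from a comma-separated PBS "Resources:" value. Hypothetical helper.
resource_kb() {
    printf '%s\n' "$2" | tr ',' '\n' |
        awk -F= -v key="$1" '$1 == key { sub(/kb$/, "", $2); print $2 }'
}

# kb_to_gib KB: convert kb to GiB, assuming 1 kb = 1024 bytes.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / (1024 * 1024) }'
}

failed='cput=01:34:10,energy_used=0,mem=190528332kb,vmem=246532692kb,walltime=00:10:36'
succeeded='cput=15:39:27,energy_used=0,mem=233038028kb,vmem=257419324kb,walltime=01:10:48'

for job in failed succeeded; do
    line=${!job}   # bash indirect expansion: value of $failed or $succeeded
    for key in mem vmem; do
        echo "$job $key: $(kb_to_gib "$(resource_kb "$key" "$line")") GiB (limit 240)"
    done
done
```

Under that unit assumption, the failed job reports roughly 181.7 GiB of mem and 235.1 GiB of vmem, and the successful job roughly 222.2 GiB and 245.5 GiB. Both are close to the 240gb job limit, which is also the value passed to `sambamba sort -m`.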

Regards, Jeff