ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

OS return value: -9 #1356

Status: Open. Wanli-HE opened this issue 3 weeks ago.

Wanli-HE commented 3 weeks ago

Description of bug

Hi!

I am assembling soil samples, and some of them fail with this error. I tried different parameters, for example different memory limits and thread counts, but that did not solve it.

Best,
Wanli

spades.log

Command line: /home/projects/ku_00041/apps/wanli/v1.4.1-plaspline/Plaspline/conda_envs/f86ddc1d/bin/spades.py --meta -1 /home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz -2 /home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz -t 10 -m 170 -k 21,33,55,77 -o /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res

System information:
  SPAdes version: 3.15.3
  Python version: 3.9.0
  OS: Linux-3.10.0-1062.4.1.el7.x86_64-x86_64-with-glibc2.17

Output dir: /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Metagenomic mode
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz']
      right reads: ['/home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:
  k: [21, 33, 55, 77]
  Repeat resolution is enabled
  Mismatch careful mode is turned OFF
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/tmp
  Threads: 10
  Memory limit (in Gb): 170

======= SPAdes pipeline started. Log can be found here: /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/spades.log

/home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz: max reads length: 151
/home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz: max reads length: 151

Reads length: 151

===== Before start started.

===== Read error correction started.

===== Read error correction started.

== Running: /home/projects/ku_00041/apps/wanli/v1.4.1-plaspline/Plaspline/conda_envs/f86ddc1d/bin/spades-hammer /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/corrected/configs/config.info

0:00:00.000 1M / 18M INFO General (main.cpp : 75) Starting BayesHammer, built from N/A, git revision N/A
0:00:00.000 1M / 18M INFO General (main.cpp : 76) Loading config from /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/corrected/configs/config.info
0:00:00.001 1M / 18M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 10
0:00:00.001 1M / 18M INFO General (memory_limit.cpp : 48) Memory limit set to 170 Gb
0:00:00.002 1M / 18M INFO General (main.cpp : 86) Trying to determine PHRED offset
0:00:00.002 1M / 18M INFO General (main.cpp : 92) Determined value is 33
0:00:00.002 1M / 18M INFO General (hammer_tools.cpp : 38) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.002 1M / 18M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.002 1M / 18M INFO K-mer Counting (kmer_data.cpp : 283) Estimating k-mer count
0:00:00.045 161M / 185M INFO K-mer Counting (kmer_data.cpp : 288) Processing /home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz
0:01:44.662 161M / 186M INFO K-mer Counting (kmer_data.cpp : 297) Processed 40344553 reads
0:01:44.663 161M / 186M INFO K-mer Counting (kmer_data.cpp : 288) Processing /home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz
0:03:29.889 161M / 188M INFO K-mer Counting (kmer_data.cpp : 297) Processed 80689106 reads
0:03:29.890 161M / 188M INFO K-mer Counting (kmer_data.cpp : 302) Total 80689106 reads processed
0:03:30.261 161M / 188M INFO K-mer Counting (kmer_data.cpp : 305) Estimated 13873436049 distinct kmers
0:03:30.262 1M / 188M INFO K-mer Counting (kmer_data.cpp : 309) Filtering singleton k-mers
0:03:30.265 41G / 41G INFO K-mer Counting (kmer_data.cpp : 315) Processing /home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz
0:43:09.028 41G / 41G INFO K-mer Counting (kmer_data.cpp : 324) Processed 40344553 reads
0:43:09.028 41G / 41G INFO K-mer Counting (kmer_data.cpp : 315) Processing /home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz
1:18:15.630 41G / 41G INFO K-mer Counting (kmer_data.cpp : 324) Processed 80689106 reads
1:18:15.631 41G / 41G INFO K-mer Counting (kmer_data.cpp : 329) Total 80689106 reads processed
1:18:15.715 41G / 41G INFO General (kmer_index_builder.hpp : 243) Splitting kmer instances into 16 files using 10 threads. This might take a while.
1:18:15.766 41G / 41G INFO General (file_limit.hpp : 32) Open file limit set to 32768
1:18:15.767 41G / 41G INFO General (kmer_splitter.hpp : 93) Memory available for splitting buffers: 4.31535 Gb
1:18:15.767 41G / 41G INFO General (kmer_splitter.hpp : 101) Using cell size of 4194304
1:18:15.768 47G / 47G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz
1:18:49.739 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 3798656 reads
1:19:23.586 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 7545884 reads
1:19:57.643 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 11286621 reads
1:20:31.129 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 15042003 reads
1:21:05.370 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 18834601 reads
1:21:39.848 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 22655390 reads
1:22:14.416 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 26449038 reads
1:22:49.498 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 30229299 reads
1:23:23.614 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 34006617 reads
1:23:58.235 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 37802134 reads
1:24:22.135 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 40344553 reads
1:24:22.135 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz
1:28:54.828 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 70439748 reads
1:30:27.541 47G / 48G INFO K-mer Splitting (kmer_data.cpp : 112) Total 80689106 reads processed
1:30:27.543 41G / 48G INFO General (kmer_index_builder.hpp : 249) Starting k-mer counting.
1:32:47.731 41G / 81G INFO General (kmer_index_builder.hpp : 260) K-mer counting done. There are 4383899052 kmers in total.
1:32:51.781 37M / 81G INFO K-mer Index Building (kmer_index_builder.hpp : 395) Building perfect hash indices
1:37:59.161 3102M / 81G INFO K-mer Index Building (kmer_index_builder.hpp : 431) Index built. Total 4383899052 kmers, 3166345000 bytes occupied (5.77813 bits per kmer).
1:37:59.162 3102M / 81G INFO K-mer Counting (kmer_data.cpp : 354) Arranging kmers in hash map order
1:40:48.877 69G / 81G INFO General (main.cpp : 148) Clustering Hamming graph.
2:48:00.426 69G / 81G INFO General (main.cpp : 155) Extracting clusters:
2:48:00.426 69G / 81G INFO General (concurrent_dsu.cpp : 18) Connecting to root
2:48:17.586 69G / 81G INFO General (concurrent_dsu.cpp : 34) Calculating counts

== Error == system call for: "['/home/projects/ku_00041/apps/wanli/v1.4.1-plaspline/Plaspline/conda_envs/f86ddc1d/bin/spades-hammer', '/home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/corrected/configs/config.info']" finished abnormally, OS return value: -9 None

In case you have troubles running SPAdes, you can write to spades.support@cab.spbu.ru or report an issue on our GitHub repository github.com/ablab/spades Please provide us with params.txt and spades.log files from the output directory.

SPAdes log can be found here: /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/spades.log

Thank you for using SPAdes!
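Each BayesHammer entry in the log above starts with a timestamp followed by two memory figures such as `69G / 81G`; in this log the second figure never decreases, so it appears to be the peak so far. A minimal Python sketch, assuming that `H:MM:SS.mmm CUR / PEAK` layout and that `M`/`G` mean MiB/GiB, pulls out the last figures reported before the crash. `last_memory_report` and `to_gib` are hypothetical helpers, not part of SPAdes.

```python
import re

# Leading "H:MM:SS.mmm CUR / PEAK" fields of a SPAdes log entry,
# e.g. "2:48:17.586 69G / 81G INFO General ...". Only M and G units appear here.
ENTRY = re.compile(r"(\d+:\d{2}:\d{2}\.\d{3})\s+(\d+)([MG])\s*/\s*(\d+)([MG])")


def to_gib(value: str, unit: str) -> float:
    """Convert a SPAdes size token to GiB (assumes M = MiB and G = GiB)."""
    return int(value) / 1024 if unit == "M" else float(value)


def last_memory_report(log_text: str):
    """Return (timestamp, current_gib, peak_gib) for the last entry, or None."""
    matches = ENTRY.findall(log_text)
    if not matches:
        return None
    ts, cur, cur_unit, peak, peak_unit = matches[-1]
    return ts, to_gib(cur, cur_unit), to_gib(peak, peak_unit)


sample = (
    "1:37:59.161 3102M / 81G INFO K-mer Index Building Index built.\n"
    "2:48:17.586 69G / 81G INFO General (concurrent_dsu.cpp : 34) Calculating counts\n"
)
print(last_memory_report(sample))  # ('2:48:17.586', 69.0, 81.0)
```

For this run the last reported figures (69G in use, 81G peak) are well below the 170 Gb passed with `-m`; anything allocated after the last log line is not captured.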

params.txt

Command line: /home/projects/ku_00041/apps/wanli/v1.4.1-plaspline/Plaspline/conda_envs/f86ddc1d/bin/spades.py --meta -1 /home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz -2 /home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz -t 10 -m 170 -k 21,33,55,77 -o /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res

System information:
  SPAdes version: 3.15.3
  Python version: 3.9.0
  OS: Linux-3.10.0-1062.4.1.el7.x86_64-x86_64-with-glibc2.17

Output dir: /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Metagenomic mode
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/home/projects/ku_00041/archive/EBI_soil/ERR9924925_1.fastq.gz']
      right reads: ['/home/projects/ku_00041/archive/EBI_soil/ERR9924925_2.fastq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:
  k: [21, 33, 55, 77]
  Repeat resolution is enabled
  Mismatch careful mode is turned OFF
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /home/projects/ku_00041/archive/EBI_soil/assembly/ERR9924925_assembly_res/tmp
  Threads: 10
  Memory limit (in Gb): 170

SPAdes version

3.15.3

Operating System

linux

Python Version

3.9.0

Method of SPAdes installation

conda

No errors reported in spades.log

asl commented 3 weeks ago

Your SPAdes job was killed. You might want to talk to your system administrator about the issue.

PS: You might also want to give the latest SPAdes 4.0 a try.
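A negative value like the `-9` in the error line is consistent with Python's `subprocess` convention on POSIX, where a return code of -N means the child was terminated by signal N; signal 9 is SIGKILL, which a process cannot catch, so it leaves nothing in its own log. A minimal sketch, assuming the SPAdes wrapper passes the return code through in that convention; `describe_return_value` is a hypothetical helper, and the pattern covers the two wordings that appear in this thread.

```python
import re
import signal


def describe_return_value(error_line: str) -> str:
    """Explain the status number in a SPAdes '== Error ==' line (hypothetical helper)."""
    # Accept both wordings seen in this thread: "OS return value: -9" (3.15.3, 4.0.0)
    # and "err code: -9" (3.13.0).
    match = re.search(r"(?:OS return value|err code):\s*(-?\d+)", error_line)
    if match is None:
        return "no return value found"
    code = int(match.group(1))
    if code < 0:
        # Python's subprocess convention on POSIX: -N means "terminated by signal N".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-code} ({name})"
    return f"exited with status {code}"


line = "... finished abnormally, OS return value: -9 None"
print(describe_return_value(line))  # terminated by signal 9 (SIGKILL)
```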

felipehcoutinho commented 2 weeks ago

Hi there! I have been getting the same error while running SPAdes within a SLURM script on two different servers. When checking the logs, I found no indication of SLURM killing any of the jobs.

Also, running seff confirmed the job did not run out of memory:

Nodes: 1
Cores per node: 24
CPU Utilized: 36-00:03:36
CPU Efficiency: 75.00% of 48-00:06:24 core-walltime
Job Wall-clock time: 2-00:00:16
Memory Utilized: 699.87 GB
Memory Efficiency: 93.32% of 750.00 GB
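The seff memory figures can be checked by hand: 699.87 GB / 750.00 GB ≈ 93.32%, which leaves about 50 GB of headroom at the recorded peak. A minimal sketch that parses the `Memory Utilized` and `Memory Efficiency` lines in the layout shown above; `memory_headroom` is a hypothetical helper and assumes both figures are in GB. Depending on how the cluster gathers accounting data, the recorded peak may come from periodic sampling, so a short spike between samples might not show up here (an assumption worth checking against the site's SLURM configuration).

```python
import re


def memory_headroom(seff_text: str) -> tuple[float, float, float]:
    """Return (used_gb, allocated_gb, headroom_gb) parsed from seff output.

    Assumes seff's 'Memory Utilized: X GB' and 'Memory Efficiency: P% of Y GB'
    wording, with both values reported in GB as in the report above.
    """
    used = float(re.search(r"Memory Utilized:\s*([\d.]+)\s*GB", seff_text).group(1))
    allocated = float(
        re.search(r"Memory Efficiency:\s*[\d.]+% of\s*([\d.]+)\s*GB", seff_text).group(1)
    )
    return used, allocated, allocated - used


report = "Memory Utilized: 699.87 GB\nMemory Efficiency: 93.32% of 750.00 GB"
used, allocated, headroom = memory_headroom(report)
print(f"{used / allocated:.2%} used, {headroom:.2f} GB headroom")
# 93.32% used, 50.13 GB headroom
```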

In addition, the SLURM job log contains no errors and no mention of a SIGKILL.

I am running a batch of samples one after the other, and I noticed the error always shows up during the sequence_mapper_notifier.h stage for every sample:

7:16:06.035 682G / 682G INFO General (sequence_mapper_notifier.h: 80) Processed 16780715 reads

== Error == system call for: "['/opt/ohpc/pub/apps/spades/3.13.0/bin/spades-core', '/mnt/smart/users/fcoutinho/OctoMicro/WGS_Assemblies/Assembly_A10b/K55/configs/config.info', '/mnt/smart/users/fcoutinho/OctoMicro/WGS_Assemblies/Assembly_A10b/K55/configs/mda_mode.info', '/mnt/smart/users/fcoutinho/OctoMicro/WGS_Assemblies/Assembly_A10b/K55/configs/meta_mode.info']" finished abnormally, err code: -9

Could it be that some subprocess is killed in the background because it uses more than the allocated memory, and that SLURM kills it but lets the main process keep running? If so, is there a way to find out which one? I tried allocating as much as 1.5 TB of memory but still got the same error. I appreciate the help!

params:

Command line: /opt/ohpc/pub/apps/spades/3.13.0/bin/spades.py -1 /mnt/smart/users/fcoutinho/OctoMicro/WGS_Clean_Reads/Clean_Unknown_BL327-004R0004_1.fq.gz -2 /mnt/smart/users/fcoutinho/OctoMicro/WGS_Clean_Reads/Clean_Unknown_BL327-004R0004_2.fq.gz -o /mnt/smart/users/fcoutinho/OctoMicro/WGS_Assemblies/Assembly_A020 --threads 47 --memory 730 --meta

System information:
  SPAdes version: 3.13.0
  Python version: 3.8.5
  OS: Linux-3.10.0-862.3.2.el7.x86_64-x86_64-with-glibc2.17

Output dir: /mnt/smart/users/fcoutinho/OctoMicro/WGS_Assemblies/Assembly_A020
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Metagenomic mode
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/mnt/smart/users/fcoutinho/OctoMicro/WGS_Clean_Reads/Clean_Unknown_BL327-004R0004_1.fq.gz']
      right reads: ['/mnt/smart/users/fcoutinho/OctoMicro/WGS_Clean_Reads/Clean_Unknown_BL327-004R0004_2.fq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:
  k: [21, 33, 55]
  Repeat resolution is enabled
  Mismatch careful mode is turned OFF
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /mnt/smart/users/fcoutinho/OctoMicro/WGS_Assemblies/Assembly_A020/tmp
  Threads: 47
  Memory limit (in Gb): 730
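One way to find out which process was killed, assuming you can read the node's kernel log (for example `dmesg` or `journalctl -k` output, which on many clusters requires the system administrator) and the job's SLURM output file, is to search both for OOM-killer messages. A minimal sketch; the kernel patterns are assumptions covering the `Kill process PID (name)` and `Killed process PID (name)` forms, whose exact wording varies between kernel versions, and the SLURM pattern matches the `slurmstepd` wording that appears in this thread. `find_oom_kills` is a hypothetical helper, and the sample lines below use a made-up PID.

```python
import re

# Kernel OOM-killer wording differs between kernel versions, e.g.
#   "Out of memory: Kill process 1234 (name) score ... or sacrifice child"   (older)
#   "Killed process 1234 (name) total-vm:..."                                (older and newer)
KERNEL_OOM = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")
# SLURM step notice in the wording seen in this thread; other SLURM versions differ.
SLURM_OOM = re.compile(r"Detected (\d+) oom_kill events? in StepId=(\S+?)\.?(?:\s|$)")


def find_oom_kills(text: str) -> dict:
    """Return killed (pid, name) pairs and (step, event count) pairs found in log text."""
    # dict.fromkeys drops duplicates (one kill can be logged on two lines) but keeps order.
    kernel = list(dict.fromkeys((int(pid), name) for pid, name in KERNEL_OOM.findall(text)))
    slurm = [(step, int(count)) for count, step in SLURM_OOM.findall(text)]
    return {"kernel": kernel, "slurm": slurm}


# Hypothetical sample: the PID and memory figures are made up for illustration.
sample = """\
Memory cgroup out of memory: Kill process 4242 (spades-core) score 998 or sacrifice child
Killed process 4242 (spades-core) total-vm:900000000kB, anon-rss:800000000kB
slurmstepd: error: Detected 1 oom_kill event in StepId=18172727.batch. Some of the step tasks have been OOM Killed.
"""
print(find_oom_kills(sample))
# {'kernel': [(4242, 'spades-core')], 'slurm': [('18172727.batch', 1)]}
```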

asl commented 2 weeks ago

> Hi there! I have been getting the same error while running SPAdes within a SLURM script

Please do not hijack other issues; your problem might be entirely different.

> Could it be that some subprocess is killed in the background because it uses more than the allocated memory, and that SLURM kills it but lets the main process keep running? If so, is there a way to find out which one? I tried allocating as much as 1.5 TB of memory but still got the same error.

SPAdes does not spawn any subprocess at this moment. So it is SPAdes process itself that got killed from outside.

Note that SPAdes 3.13 is ancient. You might want to give SPAdes 4.0 a try.
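To the parent process, a child killed from outside looks the same no matter who sent the signal: all it sees is the return code. A minimal POSIX-only sketch in which a parent starts a stand-in child (`sleep`, standing in for a SPAdes binary) and kills it with SIGKILL, the signal the kernel's OOM killer sends; the parent then reads -9, the same value SPAdes reported.

```python
import signal
import subprocess

# Start a long-running stand-in for a SPAdes binary (POSIX only: relies on `sleep`).
child = subprocess.Popen(["sleep", "60"])
# Send SIGKILL from "outside" the child, as the kernel's OOM killer would.
child.send_signal(signal.SIGKILL)
return_code = child.wait()
print(return_code)  # -9
```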

daidiy1109 commented 1 week ago

So how do I solve this problem? The error is:

== Error == system call for: "['/public5/home/t6s000894/spades-spades_4.0.0/install/bin/spades-hammer', '/public5/home/t6s000894/test/spades_output4/corrected/configs/config.info']" finished abnormally, OS return value: -9 None

At the end of the SLURM .out file: slurmstepd: error: Detected 1 oom_kill event in StepId=18172727.batch. Some of the step tasks have been OOM Killed.

asl commented 1 week ago

> So how do I solve this problem? The error is:
>
> == Error == system call for: "['/public5/home/t6s000894/spades-spades_4.0.0/install/bin/spades-hammer', '/public5/home/t6s000894/test/spades_output4/corrected/configs/config.info']" finished abnormally, OS return value: -9 None
>
> At the end of the SLURM .out file: slurmstepd: error: Detected 1 oom_kill event in StepId=18172727.batch. Some of the step tasks have been OOM Killed.

Your SPAdes job was killed because it ran out of memory, as clearly reported in the message you posted.
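A common way to avoid this is to request enough memory from SLURM and to keep the SPAdes `-m` value (memory limit in Gb, which SPAdes also uses to size its buffers) below that allocation, so that SPAdes stops on its own limit rather than being OOM-killed. A minimal sketch, assuming the job was submitted with `--mem` so that SLURM exports `SLURM_MEM_PER_NODE` as a number of megabytes; the helper name `spades_memory_limit` and the 10% safety margin are arbitrary choices for illustration. If the data set simply needs more memory than the node has, a lower `-m` will not make the assembly fit.

```python
import os
from typing import Mapping, Optional


def spades_memory_limit(env: Mapping[str, str] = os.environ,
                        margin_percent: int = 10) -> Optional[int]:
    """Suggest a SPAdes -m value (Gb) that stays below the SLURM memory allocation.

    Hypothetical helper. Assumes the job was submitted with --mem, so SLURM exports
    SLURM_MEM_PER_NODE as a plain number of megabytes; returns None otherwise.
    """
    mem_mb = env.get("SLURM_MEM_PER_NODE")
    if mem_mb is None:
        return None
    # Integer arithmetic: keep (100 - margin_percent)% of the allocation, convert MB to GB.
    limit_gb = int(mem_mb) * (100 - margin_percent) // (100 * 1024)
    return max(1, limit_gb)


print(spades_memory_limit({"SLURM_MEM_PER_NODE": "768000"}))  # 675
```

For a 750 GB (768000 MB) allocation this suggests passing `-m 675` to spades.py.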