ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Issue assembling contigs; spades-hammer error #1113

Open Deco313 opened 1 year ago

Deco313 commented 1 year ago

Description of bug

Hi, I am trying to assemble genome skim data with SPAdes (I have tried both installing through conda and downloading the source code and compiling it), and every time I run it, I get an error:

== Error == system call for: "['/ichec/work/nglif049b/miniconda3/envs/spades/share/spades-3.13.0-0/bin/spades-hammer', '/ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info']" finished abnormally, err code: -7

I have seen this issue reported here before, and people have solved it by running the command as a job, by setting the phred offset, or by adjusting the memory. I have tried all of these options and still have had no luck.
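For reference (this is general Python/Linux behaviour, not something stated in the log): spades.py reports the child's exit status the way Python's subprocess module does, so a negative err code is the number of the signal that killed spades-hammer. A quick way to translate it on a Linux shell:

```shell
# spades.py's "err code: -7" means spades-hammer was killed by signal 7.
# Ask the shell which signal that is:
kill -l 7   # prints BUS (SIGBUS)
# For comparison, the out-of-memory killer sends signal 9:
kill -l 9   # prints KILL
```

Both SIGBUS during a large temp-file/mmap phase and SIGKILL from the OOM killer are common ways for a memory-hungry job to die, which is consistent with the memory-related suggestions above.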

#!/bin/sh
#SBATCH --time=144:00:00
#SBATCH --nodes=1
#SBATCH -A nglif049b

eval "$(conda shell.bash hook)"
conda activate spades

cat bamboo_list.txt | while read line; do
    cd $line
    spades.py -o /ichec/work/nglif049b/Mitogenomes/cleaned/$line -1 /ichec/work/nglif049b/Mitogenomes/raw/$line/$line-READ1.fq.gz -2 /ichec/work/nglif049b/Mitogenomes/raw/$line/$line-READ2.fq.gz --memory 50 --cov-cutoff 2 --careful --threads 15 --phred-offset 33
done
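ICHEC's clusters use Slurm, and the header above requests time, one node, and an account, but no explicit memory or CPU allocation, so the job may be scheduled with a small default memory limit. A sketch of a fuller header (the `--cpus-per-task` and `--mem` flags are standard Slurm; the 64G value is an assumption, not taken from this thread — the point is that the Slurm request should sit above spades.py's `--memory 50` limit):

```shell
#!/bin/sh
#SBATCH --time=144:00:00
#SBATCH --nodes=1
#SBATCH -A nglif049b
#SBATCH --cpus-per-task=15   # match spades.py --threads
#SBATCH --mem=64G            # headroom above spades.py --memory 50
```

If the scheduler enforces its default limit with a signal rather than a clean error, the job can die mid-run exactly as seen here.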

bamboo_list.txt is a list of specimen names, and SPAdes has no problem finding the reads.

I am running this on an HPC system; information about the system is available at https://www.ichec.ie/ if you need it.

spades.log

Command line: /ichec/work/nglif049b/miniconda3/envs/spades/bin/spades.py -o /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001 -1 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz -2 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz --memory 50 --cov-cutoff 2 --careful --threads 15 --phred-offset 33

System information:
  SPAdes version: 3.13.0
  Python version: 3.9.16
  OS: Linux-3.10.0-957.27.2.el7.764g0000.x86_64-x86_64-with-glibc2.17

Output dir: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology or --meta flag if processing metagenomic dataset)
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz']
      right reads: ['/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset: 33
  Corrected reads will be compressed
Assembly parameters:
  k: automatic selection based on read length
  Repeat resolution is enabled
  Mismatch careful mode is turned ON
  MismatchCorrector will be used
  Coverage cutoff is turned ON and threshold is 2.0
Other parameters:
  Dir for temp files: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/tmp
  Threads: 15
  Memory limit (in Gb): 50

======= SPAdes pipeline started. Log can be found here: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/spades.log

===== Read error correction started.

== Running read error correction tool: /ichec/work/nglif049b/miniconda3/envs/spades/share/spades-3.13.0-0/bin/spades-hammer /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info

0:00:00.012 4M / 4M INFO General (main.cpp : 75) Starting BayesHammer, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
0:00:00.046 4M / 4M INFO General (main.cpp : 76) Loading config from /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info
0:00:00.091 4M / 4M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 15
0:00:00.095 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 50 Gb
0:00:00.119 4M / 4M INFO General (hammer_tools.cpp : 36) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.119 4M / 4M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.140 4M / 4M INFO K-mer Index Building (kmer_index_builder.hpp : 301) Building kmer index
0:00:00.143 4M / 4M INFO General (kmer_index_builder.hpp : 117) Splitting kmer instances into 240 files using 15 threads. This might take a while.
0:00:00.153 4M / 4M INFO General (file_limit.hpp : 32) Open file limit set to 10240
0:00:00.153 4M / 4M INFO General (kmer_splitters.hpp : 89) Memory available for splitting buffers: 1.11102 Gb
0:00:00.153 4M / 4M INFO General (kmer_splitters.hpp : 97) Using cell size of 279620
0:00:01.687 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz
0:00:31.115 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 3233987 reads
0:00:37.689 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 4023981 reads
0:00:37.689 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz
0:01:05.280 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 7238644 reads
0:01:12.356 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 8047962 reads
0:01:12.369 8G / 8G INFO K-mer Splitting (kmer_data.cpp : 112) Total 8047962 reads processed
0:01:13.294 60M / 8G INFO General (kmer_index_builder.hpp : 120) Starting k-mer counting.
0:01:19.064 60M / 8G INFO General (kmer_index_builder.hpp : 127) K-mer counting done. There are 602467078 kmers in total.
0:01:19.064 60M / 8G INFO General (kmer_index_builder.hpp : 133) Merging temporary buckets.
0:01:39.593 60M / 8G INFO K-mer Index Building (kmer_index_builder.hpp : 314) Building perfect hash indices
0:02:05.932 384M / 8G INFO General (kmer_index_builder.hpp : 150) Merging final buckets.
0:02:14.091 384M / 8G INFO K-mer Index Building (kmer_index_builder.hpp : 336) Index built. Total 279377302 bytes occupied (3.70978 bits per kmer).
0:02:14.110 384M / 8G INFO K-mer Counting (kmer_data.cpp : 356) Arranging kmers in hash map order
0:02:39.963 9G / 9G INFO General (main.cpp : 148) Clustering Hamming graph.
0:12:19.006 9G / 9G INFO General (main.cpp : 155) Extracting clusters

== Error == system call for: "['/ichec/work/nglif049b/miniconda3/envs/spades/share/spades-3.13.0-0/bin/spades-hammer', '/ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info']" finished abnormally, err code: -7

In case you have troubles running SPAdes, you can write to spades.support@cab.spbu.ru or report an issue on our GitHub repository github.com/ablab/spades Please provide us with params.txt and spades.log files from the output directory.

params.txt

Command line: /ichec/work/nglif049b/miniconda3/envs/spades/bin/spades.py -o /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001 -1 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz -2 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz --memory 50 --cov-cutoff 2 --careful --threads 15 --phred-offset 33

System information:
  SPAdes version: 3.13.0
  Python version: 3.9.16
  OS: Linux-3.10.0-957.27.2.el7.764g0000.x86_64-x86_64-with-glibc2.17

Output dir: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology or --meta flag if processing metagenomic dataset)
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz']
      right reads: ['/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset: 33
  Corrected reads will be compressed
Assembly parameters:
  k: automatic selection based on read length
  Repeat resolution is enabled
  Mismatch careful mode is turned ON
  MismatchCorrector will be used
  Coverage cutoff is turned ON and threshold is 2.0
Other parameters:
  Dir for temp files: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/tmp
  Threads: 15
  Memory limit (in Gb): 50

SPAdes version

3.13.0

Operating System

Linux-3.10.0-957.27.2.el7.764g0000.x86_64-x86_64-with-glibc2.17

Python Version

3.9.16

Method of SPAdes installation

conda

No errors reported in spades.log

asl commented 1 year ago

Hello

SPAdes 3.13.0 is ancient. Will you please give the latest SPAdes 3.15.4 a try?

Deco313 commented 1 year ago

Hi, I tried again using SPAdes 3.15.5 and got the same error.

Command line: /ichec/work/nglif049b/SPAdes-3.15.5-Linux/bin/spades.py -o /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001 -1 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz -2 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz --memory 50 --cov-cutoff 2 --careful --threads 15 --phred-offset 33

System information:
  SPAdes version: 3.15.5
  Python version: 3.9.16
  OS: Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17

Output dir: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz']
      right reads: ['/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset: 33
  Corrected reads will be compressed
Assembly parameters:
  k: automatic selection based on read length
  Repeat resolution is enabled
  Mismatch careful mode is turned ON
  MismatchCorrector will be used
  Coverage cutoff is turned ON and threshold is 2.000000
Other parameters:
  Dir for temp files: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/tmp
  Threads: 15
  Memory limit (in Gb): 50

======= SPAdes pipeline started. Log can be found here: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/spades.log

/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz: max reads length: 150
/ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz: max reads length: 150

Reads length: 150

Default k-mer sizes were set to [21, 33, 55, 77] because estimated read length (150) is equal to or greater than 150

===== Before start started.

===== Read error correction started.

== Running: /ichec/work/nglif049b/SPAdes-3.15.5-Linux/bin/spades-hammer /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info

0:00:00.005 1M / 29M INFO General (main.cpp : 75) Starting BayesHammer, built from refs/heads/spades_3.15.5, git revision e757b8216f9a038fb616e9551a2d4891b2d19ad7
0:00:00.022 1M / 29M INFO General (main.cpp : 76) Loading config from /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info
0:00:00.080 1M / 29M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 15
0:00:00.082 1M / 29M INFO General (memory_limit.cpp : 54) Memory limit set to 50 Gb
0:00:00.094 1M / 29M INFO General (hammer_tools.cpp : 38) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.100 1M / 29M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.142 1M / 29M INFO General (kmer_index_builder.hpp : 243) Splitting kmer instances into 16 files using 15 threads. This might take a while.
0:00:00.164 1M / 29M INFO General (file_limit.hpp : 42) Open file limit set to 10240
0:00:00.164 1M / 29M INFO General (kmer_splitter.hpp : 93) Memory available for splitting buffers: 1.11111 Gb
0:00:00.164 1M / 29M INFO General (kmer_splitter.hpp : 101) Using cell size of 4194304
0:00:00.286 8641M / 8641M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz
0:00:38.712 8641M / 13G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 3419223 reads
0:00:46.578 8641M / 13G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 4023981 reads
0:00:46.582 8641M / 13G INFO K-mer Splitting (kmer_data.cpp : 97) Processing /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz
0:01:23.002 8641M / 13G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 7383265 reads
0:01:32.009 8641M / 13G INFO K-mer Splitting (kmer_data.cpp : 107) Processed 8047962 reads
0:01:32.028 8641M / 13G INFO K-mer Splitting (kmer_data.cpp : 112) Total 8047962 reads processed
0:01:32.029 1M / 13G INFO General (kmer_index_builder.hpp : 249) Starting k-mer counting.
0:01:44.299 1M / 13G INFO General (kmer_index_builder.hpp : 260) K-mer counting done. There are 602467078 kmers in total.
0:01:44.315 1M / 13G INFO K-mer Index Building (kmer_index_builder.hpp : 395) Building perfect hash indices
0:02:02.599 420M / 13G INFO K-mer Index Building (kmer_index_builder.hpp : 431) Index built. Total 602467078 kmers, 435151784 bytes occupied (5.77826 bits per kmer).
0:02:02.603 420M / 13G INFO K-mer Counting (kmer_data.cpp : 354) Arranging kmers in hash map order
0:02:29.139 9620M / 13G INFO General (main.cpp : 148) Clustering Hamming graph.
0:12:26.264 9620M / 13G INFO General (main.cpp : 155) Extracting clusters:
0:12:26.264 9620M / 13G INFO General (concurrent_dsu.cpp : 18) Connecting to root
0:12:27.767 9620M / 13G INFO General (concurrent_dsu.cpp : 34) Calculating counts
0:16:57.208 18G / 19G INFO General (concurrent_dsu.cpp : 63) Writing down entries

== Error == system call for: "['/ichec/work/nglif049b/SPAdes-3.15.5-Linux/bin/spades-hammer', '/ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/corrected/configs/config.info']" finished abnormally, OS return value: -7 None

In case you have troubles running SPAdes, you can write to spades.support@cab.spbu.ru or report an issue on our GitHub repository github.com/ablab/spades Please provide us with params.txt and spades.log files from the output directory.

SPAdes log can be found here: /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001/spades.log

Thank you for using SPAdes!

asl commented 1 year ago

Well, but in a different place. I'd suspect you're running out of RAM.

Deco313 commented 1 year ago

Hi, so it is an issue with RAM; see below for the response from my system administrator.

I had specified 20 threads, so would those be the 20 instances of spades-hammer? The dataset I am running is only 15 samples, each with less than 8M reads.

Is there something I am doing wrong?

Hi Declan,

OK that is good. I am not concerned about the fact that CPU is 100%, that is what you want to achieve. I am more concerned that you are only using 1 core out of 40. More concerning is that there are 20 instances of spades-hammer each using 21.7G of memory. That means that together they would be using 434GB of memory. Given that a single node on the thin partition has ~180GB, I can see why this might not work, although that should be fine for the high-memory partition. Is there some way to control how many instances of spades are running?
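One way to check whether those 20 entries are 20 separate processes or 20 threads of a single process (the ps options below are standard procps; spades-hammer is the binary name from the log): per-thread listings report the whole process's RSS on every row, so the 20 × 21.7G figure should not simply be summed.

```shell
# NLWP = number of threads in the process. A single PID with NLWP=20 is one
# spades-hammer using its RSS once, not 20 copies each using that much RAM.
ps -o pid,nlwp,rss,comm -C spades-hammer

# Per-thread view: one row per thread, with the same PID repeated.
ps -eLf | grep '[s]pades-hammer'
```

If the first command really shows 20 distinct PIDs, then something outside SPAdes is launching multiple copies.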

asl commented 1 year ago

I had specified 20 threads, so would that be the 20 instances of spades-hammer. This dataset I am running is only 15 samples each with less than 8M reads each.

So, why are you running 20 instances? Maybe you need only one?

Deco313 commented 1 year ago

My question exactly. This was just the specified thread count. Should I lower the threads to just one?


asl commented 1 year ago

I have no idea what "thread" means in the case of your cluster and job scheduler. You'd likely need to discuss this with your system administrator.

Deco313 commented 1 year ago

And I do appreciate your help. My system administrator wanted to know why my code would run 20 instances of spades-hammer. By "threads" I mean the --threads option in SPAdes, which was set to 20 at the time. The command executed first (this one was not run with --threads 20) was:

/ichec/work/nglif049b/SPAdes-3.15.5-Linux/bin/spades.py -o /ichec/work/nglif049b/Mitogenomes/cleaned/BAM001 -1 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ1.fq.gz -2 /ichec/work/nglif049b/Mitogenomes/raw/BAM001/BAM001-READ2.fq.gz --memory 50 --cov-cutoff 2 --careful --threads 15 --phred-offset 33

If you think this is unrelated, we can just close this.

Thank you again.

asl commented 1 year ago

Well, you're clearly using --threads 15: it is in the command line, and SPAdes was using 15 OpenMP threads.

Deco313 commented 1 year ago

Yes, I have tried increasing and decreasing the number of threads, but it still gets caught. For the run I tried on a high-memory node with 1.5 TB RAM, I used --threads 20 and it still failed. I am just wondering whether the number of threads is correlated with the number of spades-hammer instances running in that job.

asl commented 1 year ago

SPAdes always runs as a single instance (though it can certainly utilize multiple threads on a single node). So if you're seeing 20 instances, then something else is launching those 20-30-40 instances for you.