ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

SPAdes does not perform correctly: issue with spades-hammer #1268

Closed. manotush closed this issue 3 months ago.

manotush commented 3 months ago

Description of bug

Every time I run SPAdes to perform a genome assembly, it fails at the spades-hammer step. Please help me troubleshoot this issue.

spades.log

Command line: /home/manotush/anaconda3/bin/spades.py -o /home/manotush/raw_sequence/spades_assembly -1 /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz -2 /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz

System information:
  SPAdes version: 3.15.5
  Python version: 3.11.5
  OS: Linux-6.5.0-25-generic-x86_64-with-glibc2.35

Output dir: /home/manotush/raw_sequence/spades_assembly
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz']
      right reads: ['/home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:
  k: automatic selection based on read length
  Repeat resolution is enabled
  Mismatch careful mode is turned OFF
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /home/manotush/raw_sequence/spades_assembly/tmp
  Threads: 16
  Memory limit (in Gb): 7

======= SPAdes pipeline started. Log can be found here: /home/manotush/raw_sequence/spades_assembly/spades.log

/home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz: max reads length: 231
/home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz: max reads length: 231

Reads length: 231

Default k-mer sizes were set to [21, 33, 55, 77] because estimated read length (231) is equal to or greater than 150

===== Before start started.

===== Read error correction started.

===== Read error correction started.

== Running: /home/manotush/anaconda3/bin/spades-hammer /home/manotush/raw_sequence/spades_assembly/corrected/configs/config.info

0:00:00.000 1M / 26M INFO General (main.cpp : 75) Starting BayesHammer, built from N/A, git revision N/A
0:00:00.003 1M / 26M INFO General (main.cpp : 76) Loading config from /home/manotush/raw_sequence/spades_assembly/corrected/configs/config.info
0:00:00.009 1M / 26M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 8
0:00:00.010 1M / 26M INFO General (memory_limit.cpp : 54) Memory limit set to 7 Gb
0:00:00.010 1M / 26M INFO General (main.cpp : 86) Trying to determine PHRED offset
0:00:00.010 1M / 26M INFO General (main.cpp : 92) Determined value is 33
0:00:00.011 1M / 26M INFO General (hammer_tools.cpp : 38) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.011 1M / 26M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.011 1M / 26M INFO General (kmer_index_builder.hpp : 243) Splitting kmer instances into 16 files using 8 threads. This might take a while.
0:00:00.012 1M / 26M INFO General (file_limit.hpp : 42) Open file limit set to 1024
0:00:00.012 1M / 26M INFO General (kmer_splitter.hpp : 93) Memory available for splitting buffers: 0.291663 Gb
0:00:00.012 1M / 26M INFO General (kmer_splitter.hpp : 101) Using cell size of 2446644
0:00:00.013 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz
0:00:09.921 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 526559 reads
0:00:09.922 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz
0:00:19.722 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 1050593 reads
0:00:19.722 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 112) Total 1050593 reads processed
0:00:19.722 1M / 2380M INFO General (kmer_index_builder.hpp : 249) Starting k-mer counting.
0:00:19.952 1M / 2380M INFO General (kmer_index_builder.hpp : 260) K-mer counting done. There are 16965854 kmers in total.
0:00:19.952 1M / 2380M INFO K-mer Index Building (kmer_index_builder.hpp : 395) Building perfect hash indices
0:00:20.538 15M / 2380M INFO K-mer Index Building (kmer_index_builder.hpp : 431) Index built. Total 16965854 kmers, 12265608 bytes occupied (5.78367 bits per kmer).
0:00:20.539 15M / 2380M INFO K-mer Counting (kmer_data.cpp : 354) Arranging kmers in hash map order
0:00:21.235 279M / 2380M INFO General (main.cpp : 148) Clustering Hamming graph.
0:00:42.646 279M / 2380M INFO General (main.cpp : 155) Extracting clusters:
0:00:42.647 279M / 2380M INFO General (concurrent_dsu.cpp : 18) Connecting to root
0:00:42.759 279M / 2380M INFO General (concurrent_dsu.cpp : 34) Calculating counts
0:00:46.780 552M / 2380M INFO General (concurrent_dsu.cpp : 63) Writing down entries
0:00:52.199 279M / 2380M INFO General (main.cpp : 167) Clustering done. Total clusters: 9758206
0:00:52.216 147M / 2380M INFO K-mer Counting (kmer_data.cpp : 371) Collecting K-mer information, this takes a while.
0:00:52.480 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 377) Processing /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz
0:01:13.399 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 377) Processing /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz
0:01:34.300 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 384) Collection done, postprocessing.
0:01:34.360 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 397) There are 16965854 kmers in total. Among them 8733374 (51.4762%) are singletons.
0:01:34.360 539M / 2380M INFO General (main.cpp : 173) Subclustering Hamming graph
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 650) Subclustering done. Total 9 non-read kmers were generated.
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 651) Subclustering statistics:
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 652) Total singleton hamming clusters: 5343380. Among them 3321826 (62.1671%) are good
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 653) Total singleton subclusters: 33485. Among them 33312 (99.4834%) are good
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 654) Total non-singleton subcluster centers: 4437459. Among them 2727498 (61.4653%) are good
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 655) Average size of non-trivial subcluster: 2.61918 kmers
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 656) Average number of sub-clusters per non-singleton cluster: 1.01271
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 657) Total solid k-mers: 6082636
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 658) Substitution probabilities: 4,4
0:01:40.134 539M / 2380M INFO General (main.cpp : 178) Finished clustering.
0:01:40.135 539M / 2380M INFO General (main.cpp : 197) Starting solid k-mers expansion in 8 threads.
0:01:52.215 539M / 2380M INFO General (main.cpp : 218) Solid k-mers iteration 0 produced 17700 new k-mers.
0:02:04.309 539M / 2380M INFO General (main.cpp : 218) Solid k-mers iteration 1 produced 449 new k-mers.
0:02:16.488 539M / 2380M INFO General (main.cpp : 218) Solid k-mers iteration 2 produced 0 new k-mers.
0:02:16.488 539M / 2380M INFO General (main.cpp : 222) Solid k-mers finalized
0:02:16.488 539M / 2380M INFO General (hammer_tools.cpp : 222) Starting read correction in 8 threads.
0:02:16.488 539M / 2380M INFO General (hammer_tools.cpp : 235) Correcting pair of reads: /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz and /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz
0:02:20.597 1260M / 2380M INFO General (hammer_tools.cpp : 170) Prepared batch 0 of 524034 reads.
0:02:33.719 1308M / 2380M INFO General (hammer_tools.cpp : 177) Processed batch 0
0:02:34.871 1308M / 2380M INFO General (hammer_tools.cpp : 187) Written batch 0
0:02:34.871 1308M / 2380M ERROR General (hammer_tools.cpp : 191) Pair of read files /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz and /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz contain unequal amount of reads

== Error == system call for: "['/home/manotush/anaconda3/bin/spades-hammer', '/home/manotush/raw_sequence/spades_assembly/corrected/configs/config.info']" finished abnormally, OS return value: 21
None

In case you have troubles running SPAdes, you can write to spades.support@cab.spbu.ru
or report an issue on our GitHub repository github.com/ablab/spades
Please provide us with params.txt and spades.log files from the output directory.

SPAdes log can be found here: /home/manotush/raw_sequence/spades_assembly/spades.log

Thank you for using SPAdes!
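The log reports `max reads length: 231` for both files and then sets k to [21, 33, 55, 77] "because estimated read length (231) is equal to or greater than 150". A minimal Python sketch of those two steps, assuming plain 4-line FASTQ records: scan for the longest read, then map that length to a default k-mer list. Only the >= 150 rule comes from this log; the other two branches are recalled SPAdes 3.15 defaults and should be treated as assumptions. All function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing it first if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    Assumes the simple 4-line layout (no wrapped sequence lines).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip stray blank lines, e.g. at the end of the file
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line is ignored
        qual = next(it, "").rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def max_read_length(records: Iterable[Tuple[str, str, str]]) -> int:
    """Length of the longest sequence, or 0 for an empty input."""
    return max((len(seq) for _, seq, _ in records), default=0)


def default_kmers(read_length: int) -> List[int]:
    """Pick a default k-mer list from the (maximum) read length."""
    if read_length >= 250:
        # ASSUMPTION: recalled SPAdes 3.15 default for long reads, not shown in this log.
        return [21, 33, 55, 77, 99, 127]
    if read_length >= 150:
        # Rule stated in the log: read length >= 150 -> [21, 33, 55, 77].
        return [21, 33, 55, 77]
    # ASSUMPTION: recalled SPAdes 3.15 default for shorter reads, not shown in this log.
    return [21, 33, 55]
```

For the files in this issue, `max_read_length(fastq_records(fh))` inside a `with open_fastq("74A_S28_L001_R1_001_trim_final.fastq.gz") as fh:` block would be expected to agree with the 231 that SPAdes reports, and `default_kmers(231)` gives the same list the log shows.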

params.txt

(base) manotush@manotush-pc:~/raw_sequence$ spades.py -o spades_assembly -1 74A_S28_L001_R1_001_trim_final.fastq.gz -2 74A_S28_L001_R2_001_trim_final.fastq.gz

== Warning == No assembly mode was specified! If you intend to assemble high-coverage multi-cell/isolate data, use '--isolate' option.

Command line: /home/manotush/anaconda3/bin/spades.py -o /home/manotush/raw_sequence/spades_assembly -1 /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz -2 /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz

System information:
  SPAdes version: 3.15.5
  Python version: 3.11.5
  OS: Linux-6.5.0-25-generic-x86_64-with-glibc2.35

Output dir: /home/manotush/raw_sequence/spades_assembly
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz']
      right reads: ['/home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:
  k: automatic selection based on read length
  Repeat resolution is enabled
  Mismatch careful mode is turned OFF
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /home/manotush/raw_sequence/spades_assembly/tmp
  Threads: 16
  Memory limit (in Gb): 7
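The parameters state "PHRED offset will be auto-detected", and the BayesHammer log shows the result ("Trying to determine PHRED offset" followed by "Determined value is 33"). The sketch below illustrates one common way such detection can work. It is a generic heuristic written for illustration, not SPAdes' actual code, and `guess_phred_offset` is a hypothetical name.

```python
from typing import Iterable, Optional


def guess_phred_offset(qualities: Iterable[str]) -> Optional[int]:
    """Guess whether FASTQ quality strings use offset 33 or 64.

    Heuristic (an assumption, not SPAdes' own implementation):
    * offset-64 encodings never use characters below ';' (ASCII 59), so any
      such character means offset 33;
    * otherwise, characters above 'J' (ASCII 74) suggest offset 64;
    * if neither is seen, the data fit both encodings and None is returned.
    """
    lo: Optional[int] = None
    hi: Optional[int] = None
    for qual in qualities:
        qual = qual.rstrip("\n")  # a newline would look like a very low quality
        if not qual:
            continue
        q_lo, q_hi = ord(min(qual)), ord(max(qual))
        lo = q_lo if lo is None else min(lo, q_lo)
        hi = q_hi if hi is None else max(hi, q_hi)
    if lo is None or hi is None:
        return None  # no quality characters seen at all
    if lo < 59:
        return 33
    if hi > 74:
        return 64
    return None
```

Feeding it the quality lines of the first few thousand records is usually enough. Note that offset-33 data with very high scores (characters above 'J') and no low-quality bases would be misread as offset 64 by this simple rule.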

SPAdes version

SPAdes v3.15.5

Operating System

Linux-6.5.0-25-generic-x86_64-with-glibc2.35

Python Version

Python 3.11.5

Method of SPAdes installation

conda

No errors reported in spades.log

asl commented 3 months ago

The log clearly reads:

0:02:34.871 1308M / 2380M ERROR General (hammer_tools.cpp : 191) Pair of read files /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz and /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz contain unequal amount of reads

So your input files are corrupted: they do not contain a proper set of paired-end reads.
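The k-mer splitter counters in the log point the same way: it reports 526559 reads after R1 and 1050593 after R2, which suggests R2 holds 1050593 - 526559 = 524034 reads, the same size as the only correction batch ("Prepared batch 0 of 524034 reads") before the error. A minimal Python sketch to confirm this before re-running SPAdes: count the records in each file and report the first position where the mate IDs disagree. It assumes 4-line FASTQ records and Illumina-style headers (mate suffix `/1`/`/2`, or a space-separated comment); `read_ids`, `check_pairs` and the script name `check_pairs.py` are hypothetical.

```python
import gzip
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield one read ID per 4-line FASTQ record.

    The ID is the header's first whitespace-separated token without the
    leading '@' and without a trailing '/1' or '/2' mate suffix, so that
    mates from R1 and R2 compare equal.
    """
    for line_no, line in enumerate(lines):
        if line_no % 4 != 0 or not line.strip():
            continue  # only header lines (every 4th line) carry the ID
        parts = line[1:].split()
        rid = parts[0] if parts else ""
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        yield rid


def check_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Tuple[int, int, Optional[int]]:
    """Return (reads in R1, reads in R2, index of the first mismatched pair or None)."""
    n1 = n2 = 0
    first_bad: Optional[int] = None
    for i, (a, b) in enumerate(zip_longest(read_ids(r1_lines), read_ids(r2_lines))):
        if a is not None:
            n1 += 1
        if b is not None:
            n2 += 1
        if first_bad is None and a != b:
            first_bad = i  # IDs differ, or one file has already run out
    return n1, n2, first_bad


if __name__ == "__main__":
    # Usage: python check_pairs.py R1.fastq.gz R2.fastq.gz
    if len(sys.argv) == 3:
        with gzip.open(sys.argv[1], "rt") as f1, gzip.open(sys.argv[2], "rt") as f2:
            n1, n2, bad = check_pairs(f1, f2)
        print(f"R1: {n1} reads, R2: {n2} reads, first mismatched pair: {bad}")
```

If the counts differ, one possible cause (an assumption; the thread does not say how the `_trim_final` files were produced) is that R1 and R2 were trimmed separately, so reads were dropped from one file without dropping their mates. Re-trimming both files together in a paired-end mode keeps the two files in sync.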