ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Warnings using metaspades #558

Open mariadelmarq opened 4 years ago

mariadelmarq commented 4 years ago

I'm trying to assemble some publicly available metagenomic data using metaspades. The data is here: https://www.omicsdi.org/dataset/omics_ena_project/PRJNA379494. I'm testing the first set of paired-end reads: SRR5351712_1 and SRR5351712_2.

I was able to assemble them using other assemblers with no issues, but when I run metaspades.py on them (metaspades.py -1 SRR5351712_1.fastq.gz -2 SRR5351712_2.fastq.gz), I get a series of warnings that suggest the paired-end reads are corrupted:

======= SPAdes pipeline finished WITH WARNINGS!

=== Error correction and assembling warnings:
 * 0:00:18.120   247M / 919M  WARN    General                 (pair_info_count.cpp       : 341)   Unable to estimate insert size for paired library #0
 * 0:00:18.121   247M / 919M  WARN    General                 (pair_info_count.cpp       : 347)   None of paired reads aligned properly. Please, check orientation of your read pairs.
 * 0:00:18.122   247M / 919M  WARN    General                 (repeat_resolving.cpp      :  63)   Insert size was not estimated for any of the paired libraries, repeat resolution module will not run.
 * 0:00:27.275   235M / 919M  WARN    General                 (pair_info_count.cpp       : 175)   Single reads are not used in metagenomic mode
=======

I then tried regular spades (spades.py -1 SRR5351712_1.fastq.gz -2 SRR5351712_2.fastq.gz) and got a different warning:

=== Error correction and assembling warnings:
 * 0:00:03.999   297M / 505M  WARN    General                 (kmer_coverage_model.cpp   : 218)   Too many erroneous kmers, the estimates might be unreliable
=======

Are the files corrupted in a way that megahit, for example, does not detect, or is there a compatibility issue between these files and spades? I've tried both the raw files and files trimmed with trimmomatic, and I get the same warnings in both cases.
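One way to rule out the simplest kind of corruption is to check that the two FASTQ files are still in sync, i.e. that record N in the first file and record N in the second file carry the same read name. The following is only a rough sketch under some assumptions: every record spans exactly four lines, and mates share a name once an optional /1 or /2 suffix is dropped. The helper names (open_fastq, normalize, read_names, first_mismatch) are made up for illustration and are not part of SPAdes. It does not test read orientation or insert size, only mate order.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def normalize(header):
    """Reduce a FASTQ header line to a read name comparable across mates."""
    fields = header.strip()[1:].split()  # drop the leading '@' and any comment
    name = fields[0] if fields else ""
    # Older Illumina-style headers mark mates with a /1 or /2 suffix.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_names(handle):
    """Yield the normalized name of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield normalize(line)


def first_mismatch(fwd_handle, rev_handle):
    """Return (record_index, fwd_name, rev_name) for the first pair that is
    out of sync, or None if both files list the same names in the same order.
    A missing mate (one file shorter than the other) shows up as None."""
    pairs = zip_longest(read_names(fwd_handle), read_names(rev_handle))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            return index, fwd, rev
    return None


# Example use with the files from this issue:
# with open_fastq("SRR5351712_1.fastq.gz") as f, open_fastq("SRR5351712_2.fastq.gz") as r:
#     print(first_mismatch(f, r))
```

A result of None would mean the mates are at least listed in matching order, so the cause of the warnings would have to be something else.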

Here are the log and param files for the metaspades assembly. Let me know if you'd like me to send through the spades ones as well.

Attachments: params.txt, spades.log

Thanks!

asl commented 4 years ago

Hello

It does not look like a metagenomic dataset: it is very small (both in number of reads and in genome size), yet the average coverage is very high. So I would suspect there is something wrong with this dataset.
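For anyone who wants to check the size and coverage of a dataset themselves, a rough sketch could look like the one below. It assumes uncompressed-text handles with exactly four lines per FASTQ record. The function names and the genome_size argument are illustrative only; the genome size has to come from your own estimate (for example, the total length of an assembly), since nothing in the FASTQ file provides it.

```python
def fastq_stats(handle):
    """Count reads and total bases in a FASTQ stream (4 lines per record)."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence line of each record
            reads += 1
            bases += len(line.strip())
    return reads, bases


def mean_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Example use (gzip.open(path, "rt") works for .fastq.gz files):
# import gzip
# with gzip.open("SRR5351712_1.fastq.gz", "rt") as f:
#     reads, bases = fastq_stats(f)
#     print(reads, bases)
```

For paired-end data the base counts of both files need to be added together before dividing by the genome size.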

mariadelmarq commented 4 years ago

Thanks, @asl! Do you happen to know why metaspades flags a problem with the paired-end reads, whereas spades doesn't?

Weirdly enough, this dataset certainly claims to be metagenome data (https://www.ebi.ac.uk/ena/browser/view/PRJNA379494) and it forms the basis for a publication in Scientific Reports: https://www.nature.com/articles/s41598-017-06404-8.

kmkappa commented 1 year ago

Hi @mariadelmarq! Did you manage to solve your issue? I am facing the same situation. Any ideas?

asl commented 1 year ago

@kmkappa Please do not hijack unrelated issues; open a new one.

kmkappa commented 1 year ago

@asl As you prefer. I have reported the same problem, as it occurred on my machine, in issue #1110.