ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

option #717

Closed DaanJansen94 closed 3 years ago

DaanJansen94 commented 3 years ago

Dear,

We want to assemble viral contigs from metagenomic data. We used metaSPAdes but still saw quite some fragmentation, especially for viral contigs with high coverage. This is probably caused by our amplification methods in the lab, together with the high heterogeneity of the contigs themselves. I tried to solve it by running SPAdes with different options, which seems to solve the issue to some degree. So I combined (I) the metaSPAdes output with (II) the output of SPAdes run with --isolate --cov-cutoff 100, and then clustered the combined set to remove redundant, overlapping contigs and obtain a unique contig set. Now my questions are:
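The pooling step described above can be illustrated with a small sketch. The post does not say which clustering tool was used, so this is a deliberately simplified stand-in: it only drops contigs that are exact substrings (on either strand) of a longer contig, whereas a real clustering tool would also merge near-identical sequences. The function names and the toy sequences are hypothetical.

```python
from typing import Dict


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def remove_contained(contigs: Dict[str, str]) -> Dict[str, str]:
    """Drop contigs that occur exactly inside a longer kept contig.

    Simplified stand-in for clustering: only exact containment on the
    forward or reverse strand is detected, not approximate similarity.
    """
    kept: Dict[str, str] = {}
    # Visit the longest contigs first so shorter ones are checked against them.
    for name, seq in sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True):
        fwd = seq.upper()
        rev = reverse_complement(fwd)
        if any(fwd in longer or rev in longer for longer in kept.values()):
            continue  # redundant: already represented by a longer contig
        kept[name] = fwd
    return kept


# Toy pooled set: names and sequences are made up for illustration.
pooled = {
    "meta_1": "ACGTACGTAGCTAGCTAGGATC",
    "isolate_1": "ACGTAGCTAGC",   # exact substring of meta_1
    "isolate_2": "GATCCTAGC",     # reverse complement of part of meta_1
    "isolate_3": "TTTTGGGGCCCC",  # not contained anywhere
}
unique = remove_contained(pooled)
print(sorted(unique))
```

Sorting by length first means each contig is only compared against sequences that are at least as long, so the longest representative of each redundant group is the one that survives.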

(1) I know isolate mode was created for high-coverage isolates, but it also seems to work on viral metagenomic data and gives less fragmented viral contigs. Does this make sense to you as authors of the program, or would you suggest handling the problem differently?

(2) I also wanted to evaluate metaviralSPAdes, although I'm not interested in the viral identification step. Is there a way to run only the assembly, without any identification? I ran the complete metaviralSPAdes pipeline on my data, but lost a lot of viral contigs (I know they are viral because we have our own identification schemes). Perhaps it would make more sense with the identification step removed.

(3) If you have any other advice for this problem, that would be great!

Thank you in advance,

Daan

asl commented 3 years ago

Hello

Tagging @Dmitry-Antipov

Note that metaviralSPAdes does not include the viral identification step; you'd need to run it separately.

Dmitry-Antipov commented 3 years ago

Yes, identification is not included in metaviralspades.py. It's released independently as viralVerify: https://github.com/ablab/viralVerify

metaviralspades is focused on complete assemblies: if the virus is not assembled into a single contig (because of repeats, related strains, or anything else), the viral sequence will likely be lost.
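For reference, the three assembly runs discussed in this thread could be set up roughly as follows. This is a sketch under assumptions: paired-end reads, with hypothetical read file names and output directory names. The helper only builds and prints the command lines and does not run SPAdes. None of the three commands performs viral identification, which stays a separate viralVerify step.

```python
import shlex
from typing import Dict, List


def assembly_commands(r1: str, r2: str, out_prefix: str) -> Dict[str, List[str]]:
    """Build (but do not run) the command lines for the three assembly modes."""
    reads = ["-1", r1, "-2", r2]  # assumes paired-end input
    return {
        # metagenomic mode
        "meta": ["spades.py", "--meta", *reads, "-o", f"{out_prefix}_meta"],
        # isolate mode with the coverage cutoff from the original post
        "isolate": ["spades.py", "--isolate", "--cov-cutoff", "100",
                    *reads, "-o", f"{out_prefix}_isolate"],
        # metaviral assembly only; identification is not part of this step
        "metaviral": ["metaviralspades.py", *reads, "-o", f"{out_prefix}_metaviral"],
    }


# Hypothetical file names, for illustration only.
commands = assembly_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
for mode, cmd in commands.items():
    print(mode, "->", shlex.join(cmd))
```

Keeping each command as a list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.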