Closed snayfach closed 3 years ago
Tagging @Dmitry-Antipov
Hi. I'm not sure I understand your question correctly. What exactly do you want to exclude from the metaviralSPAdes output?
BTW, we are thinking about adding information about the potential location of TDRs in circular contigs, since some of these contigs may correspond to linear viruses with large TDRs. Such cases can be identified from read coverage, but this is not implemented yet.
Sorry, let me rephrase. In the MegaHIT output FASTA file, contigs are labeled with a flag (3 = cycle, 2 = unconnected linear, 0 = connected linear). I was wondering if there is an easy way to extract similar information from the meta(viral)SPAdes output. I'm looking for circular contigs and would like to exclude anything that was linear or connected to another contig in the assembly graph.
Regarding the latter point, labeling the start location on circular contigs by read mapping is a great idea. Even better, the assembler could use this information to set the cut point, so that the ends of the contig correspond to the true genome start/stop. Right now an additional step is required to rotate the circular sequence. I've tried the read mapping analysis, and in my experience you can often clearly see a single position in the genome with a massive enrichment of read start points. This likely corresponds to the end of a linear genome that has been circularized by the assembler.
Thanks, Stephen
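The read-start analysis and the rotation step described above could be sketched roughly as follows. This is only an illustration, not part of SPAdes: the function names, the `min_fraction` threshold, and the assumption that read start positions have already been extracted as 0-based forward-strand coordinates are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def find_start_peak(read_starts: Iterable[int],
                    min_fraction: float = 0.2) -> Optional[int]:
    """Return the position where read starts pile up, or None.

    min_fraction is an arbitrary, hypothetical threshold: the share of
    all read starts that must fall on one position to call it a peak.
    """
    starts = list(read_starts)
    if not starts:
        return None
    position, count = Counter(starts).most_common(1)[0]
    if count / len(starts) >= min_fraction:
        return position
    return None


def rotate_circular(seq: str, cut: int) -> str:
    """Rotate a circular sequence so that it begins at index `cut`."""
    if not seq:
        return seq
    cut %= len(seq)
    return seq[cut:] + seq[:cut]


def rotate_to_peak(seq: str, read_starts: Iterable[int]) -> str:
    """Rotate to the read-start peak if one is found, else leave as is."""
    peak = find_start_peak(read_starts)
    return seq if peak is None else rotate_circular(seq, peak)
```

With real data, the start positions would come from a read alignment, and a proper analysis would also need to handle reverse-strand reads and compare the peak against local background coverage.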
> I'm looking for circular contigs and would like to exclude anything that was linear or connected to another contig in the assembly graph.
For the metaviral pipeline, we report whether a contig is circular in the .fasta headers: you can search for "type_circular". To exclude anything that was connected to other contigs in the assembly graph, you should also keep only contigs with "_cutoff_0" in their headers. With that restriction, however, metaviralSPAdes becomes nearly equivalent to regular SPAdes with some options tweaked.
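A minimal sketch of that header filter in Python, for anyone who wants to script it. Only the "type_circular" and "_cutoff_0" markers come from the reply above. The function names and the example headers are hypothetical, and the check that no digit follows "_cutoff_0" is a defensive assumption about the header layout.

```python
import re
from typing import Iterable, Iterator, Tuple

# "_cutoff_0" not followed by another digit, so a header such as
# "_cutoff_05" would not match (an assumption about the header layout).
CUTOFF_ZERO = re.compile(r"_cutoff_0(?!\d)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_isolated_circular(
    records: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Keep contigs tagged as circular and with cutoff 0 in the header."""
    for header, seq in records:
        if "type_circular" in header and CUTOFF_ZERO.search(header):
            yield header, seq


# Hypothetical headers, for illustration only.
EXAMPLE = """\
>contig_a_cutoff_0_type_circular
ACGTACGT
>contig_b_cutoff_5_type_circular
GGCC
>contig_c_cutoff_0_type_linear
TTAA
"""

kept = list(keep_isolated_circular(read_fasta(EXAMPLE.splitlines())))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

To run it on a real assembly, pass an open file handle for the contigs FASTA to `read_fasta` instead of `EXAMPLE.splitlines()`.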
Got it, thanks!
I'm using MetaviralSPAdes to identify circular viral contigs that contain direct terminal repeats (DTRs). My understanding is that DTRs can occur as a result of a cycle in the graph as well as a bubble (due to a repetitive sequence). Is there an easy way to identify and exclude the latter from the (Meta)viralSPAdes output?
Thanks, Stephen