Closed ecampbell50 closed 1 year ago
Hi @ecampbell50. It's great to hear that geNomad has been useful for you!
Are you referring to the taxonomy column of the _virus_summary.tsv file? If so, all the sequences listed in that file were classified as viral. Sequences with an "Unclassified" value in that column were not assigned to any virus taxon, but are likely viral regardless.
Yes, sorry, I should've specified the file! Thanks for clarifying the 'Unclassified' part. I was also wondering what the "Viruses" classification refers to. I have some hits showing just "Viruses" as a taxonomy:
(Apologies I should've put this in my first message)
No worries!
"Viruses" just means that the genome could not be assigned to a specific realm. This can happen when, for instance, the genome was annotated by two markers with conflicting taxonomies, so there's no consensus realm.
Perfect thank you so much! 😊
I'm happy to help! Let me know if you have any other questions :)
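For anyone who wants to check how their own results split across these cases, here is a minimal sketch that tallies the taxonomy column of a _virus_summary.tsv file. It assumes the file is tab-separated with a header row containing a column named taxonomy, and that lineages are written as semicolon-separated ranks; both are assumptions to verify against your own output. The function names and the in-memory example rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def bucket(taxonomy: str) -> str:
    """Sort one taxonomy string into one of three illustrative buckets."""
    # Assumption: ranks are separated by ';' and empty ranks may be present.
    ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
    if not ranks or ranks == ["Unclassified"]:
        # Viral, but not assigned to any virus taxon (empty treated the same).
        return "unclassified"
    if ranks == ["Viruses"]:
        # Viral, but no consensus realm could be assigned.
        return "viruses_only"
    # Anything with at least one rank below "Viruses".
    return "assigned"


def tally_taxonomy(handle) -> Counter:
    """Count buckets over every row of a tab-separated summary table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(bucket(row["taxonomy"]) for row in reader)


# Made-up rows standing in for a real summary file.
example = (
    "seq_name\ttaxonomy\n"
    "contig_1\tUnclassified\n"
    "contig_2\tViruses;;;;;;\n"
    "contig_3\tViruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;;\n"
    "contig_4\tViruses\n"
)
counts = tally_taxonomy(io.StringIO(example))
print(dict(counts))
```

To run it on a real file, pass an open file handle instead of the io.StringIO object. Every row counted here is a sequence that was classified as viral; the buckets only describe how far the taxonomic assignment got.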
This solved one of my questions too, but I have a lot of contigs identified as viral by the virsorter2-checkv-dramv pipeline that geNomad did not report as viral at all (neither with a taxonomy nor as "Unclassified"), up to about 50% of total contigs. How can I explain this, given that the parameters of the previous process are quite demanding, as described here: https://www.protocols.io/view/viral-sequence-identification-sop-with-virsorter2-5qpvoyqebg4o/v3?step=1
It's difficult to tell what is going on without further context. Are those sequences short? What is their CheckV quality? Have you tried to run geNomad with the --relaxed parameter?
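As a starting point for the length question, here is a rough sketch that finds the contigs reported by one tool but not the other and summarizes how long they are. It works on plain Python collections, so how you load the contig IDs and lengths from each tool's output is up to you. The function names, the example IDs and lengths, and the 5000 bp cutoff for "short" are all hypothetical choices for illustration, not recommendations from either tool.

```python
from statistics import median


def only_in_first(first_ids, second_ids):
    """Return the IDs present in the first collection but not the second."""
    return set(first_ids) - set(second_ids)


def summarize_lengths(ids, lengths, short_cutoff=5000):
    """Summarize lengths (in bp) of the given contigs.

    short_cutoff is an arbitrary, hypothetical threshold for "short".
    """
    values = sorted(lengths[i] for i in ids)
    if not values:
        return {"n": 0, "median_length": None, "n_short": 0}
    return {
        "n": len(values),
        "median_length": median(values),
        "n_short": sum(v < short_cutoff for v in values),
    }


# Made-up data: contig lengths and the IDs each tool reported as viral.
lengths = {"c1": 1800, "c2": 42000, "c3": 3100, "c4": 9500}
virsorter2_ids = ["c1", "c2", "c3", "c4"]
genomad_ids = ["c2"]

missed = only_in_first(virsorter2_ids, genomad_ids)
summary = summarize_lengths(missed, lengths)
print(sorted(missed), summary)
```

If most of the contigs missing from the geNomad output turn out to be short, that would be one plausible explanation to look into, alongside their CheckV quality.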
Thanks for making such a great tool! It's super easy to run and exactly what I need for my project.
Can you explain what the difference is between "Unclassified" and just "Viruses" in the taxonomy output? Does "Unclassified" mean it's unknown whether those hits are viruses at all?
For example, I have 292 'Unclassified' hits and 7668 'Viruses' hits across 2000 genomes. Does this mean the unclassified ones could possibly have been plasmid/chromosome?