bhimbbiswa closed this issue 8 months ago.
Hi @bhimbbiswa
Dear Antônio Camargo
Thank you for your kind reply.
- It might be a good idea to run geNomad on the whole assembly to recover additional viruses. The intersection between VirSorter2 and geNomad might be too conservative. I've never really benchmarked this, but that's what I'd expect. You could take the union of the VirSorter2 and geNomad predictions and process it with CheckV.
Thank you very much for your suggestions. To clarify, I'm utilizing phamb, VirSorter2, and geNomad, and then consolidating their results in CheckV.
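A minimal Python sketch of the union-versus-intersection choice discussed above, assuming you already have lists of predicted sequence IDs from each tool. VirSorter2's `||full` suffix appears in this thread; the assumption that other sub-contig tags (for example geNomad provirus names) also start after a `|` should be checked against your own output. Collapsing IDs to the parent contig discards each tool's provirus boundaries, so this only decides which contigs to pass on to CheckV.

```python
from typing import Iterable, Set, Tuple


def parent_contig(seq_id: str) -> str:
    # Assumption: sub-contig predictions are tagged after a "|" character
    # (VirSorter2 appends e.g. "||full"); metaSPAdes contig names contain no "|".
    return seq_id.split("|", 1)[0]


def combine_predictions(
    vs2_ids: Iterable[str], genomad_ids: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (union, intersection) of predicted viral contigs from two tools."""
    vs2 = {parent_contig(s) for s in vs2_ids}
    genomad = {parent_contig(s) for s in genomad_ids}
    return vs2 | genomad, vs2 & genomad


if __name__ == "__main__":
    # Hypothetical IDs; only the first name comes from this thread.
    vs2_hits = ["S03-NODE_15397_length_3234_cov_0.690698||full", "S03-NODE_2_length_900||full"]
    genomad_hits = ["S03-NODE_15397_length_3234_cov_0.690698", "S03-NODE_7_length_5000"]
    union, shared = combine_predictions(vs2_hits, genomad_hits)
    print(f"union: {len(union)} contigs, intersection: {len(shared)} contigs")
```

The union is the more permissive input; CheckV's quality estimates can then be used to drop low-confidence sequences instead of relying on agreement between the two predictors.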
- What do you mean by chimeric, biologically? If you mean a virus that has genes from two or more lineages, derived from HGT, I don't think that's a good approach for identifying them. geNomad's taxonomy works well when it aggregates the taxonomy of multiple markers into a consensus. The taxonomy of individual genes can be noisy.
Actually, not biologically. I have an additional step in my pipeline where I use vRhyme for binning after running VirSorter2. However, I'm concerned that this step may produce a few bins containing sequences from two taxonomically distinct viruses, so I thought I could use geNomad to filter those out.
Thanks and regards,
Bhim
Ohh, I see.
I've never benchmarked the performance of the taxonomy module on bins. Your logic makes sense, but I'd be careful calling those bins chimeras if the taxonomy of the contig was determined by a single gene. The taxonomy is much more reliable when it is derived from the consensus of multiple genes.
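Following this advice, a bin-level check could ignore contigs whose taxonomy rests on only a few genes. The sketch below is a hypothetical helper, not part of geNomad or vRhyme: `bin_of` (contig to bin, e.g. parsed from vRhyme's membership output) and `taxonomy` (contig to a semicolon-separated lineage plus the number of genes supporting it, which you would tally yourself) are assumed inputs, and `min_genes=3` is an arbitrary threshold. Two lineages are treated as compatible when one is a prefix of the other, so a contig assigned only to `Viruses;Duplodnaviria` does not conflict with one resolved further within that realm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def lineage_ranks(lineage: str) -> List[str]:
    # Split a semicolon-separated lineage and drop empty fields.
    return [rank for rank in lineage.split(";") if rank]


def compatible(a: List[str], b: List[str]) -> bool:
    # Lineages agree if the shorter one is a prefix of the longer one.
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def flag_conflicting_bins(
    bin_of: Dict[str, str],
    taxonomy: Dict[str, Tuple[str, int]],
    min_genes: int = 3,
) -> Dict[str, List[Tuple[str, str]]]:
    """Return {bin_id: [(contig_a, contig_b), ...]} for bins with incompatible lineages.

    Contigs whose lineage is supported by fewer than `min_genes` genes are skipped,
    because single-gene taxonomy can be noisy.
    """
    members = defaultdict(list)
    for contig, bin_id in bin_of.items():
        lineage, n_genes = taxonomy.get(contig, ("", 0))
        if lineage and n_genes >= min_genes:
            members[bin_id].append((contig, lineage_ranks(lineage)))

    flagged = {}
    for bin_id, contigs in members.items():
        conflicts = [
            (c1, c2)
            for i, (c1, l1) in enumerate(contigs)
            for c2, l2 in contigs[i + 1:]
            if not compatible(l1, l2)
        ]
        if conflicts:
            flagged[bin_id] = conflicts
    return flagged


if __name__ == "__main__":
    # Hypothetical bins and lineages for illustration only.
    bin_of = {"c1": "bin_1", "c2": "bin_1", "c3": "bin_2", "c4": "bin_2"}
    taxonomy = {
        "c1": ("Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes", 5),
        "c2": ("Viruses;Riboviria", 4),
        "c3": ("Viruses;Duplodnaviria", 6),
        "c4": ("Viruses;Riboviria", 1),  # single-gene support: ignored
    }
    print(flag_conflicting_bins(bin_of, taxonomy))
```

Flagged bins are candidates for manual inspection rather than automatic removal; the threshold trades sensitivity for robustness to noisy single-gene assignments.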
Thank you very much for your kind suggestions. If I have any more questions, I will post again.
Regards,
Bhim
Thank you very much for making this amazing tool. This is becoming very useful in my research.
I performed a metaSPAdes assembly on paired-end 150 bp Illumina reads and then used VirSorter2 to identify viral sequences. I am now using geNomad to further refine my dataset by excluding non-viral sequences, assigning taxonomy, and identifying potentially chimeric viral sequences.
I ran geNomad using the following command. My FASTA file contains 143,117 sequences.
Regarding chimeric sequences: I am considering sequences that contain genes from two distinct groups, such as "S03-NODE_15397_length_3234_cov_0.690698||full", as chimeric.
File: VirSorter_combined_virus_summary.tsv
File: VirSorter_combined_genes.tsv
Now, I have a question about the "Viruses;Bicaudaviridae" taxonomy. In the "genes.tsv" file, I noticed two genes assigned to different families. Should I consider this sequence chimeric as well?
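Applying the consensus idea from this thread to the question above, one gene from each of two families is weak evidence of chimerism. A minimal sketch, assuming you have already extracted a family name (or `None`) for each gene from `genes.tsv` and that gene IDs follow a `<contig>_<index>` pattern (both assumptions to verify against your files); `FamilyA`/`FamilyB`, the contig names, and the `min_genes=2` threshold are hypothetical placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def contig_of(gene_id: str) -> str:
    # Assumption: gene IDs are "<contig>_<index>"; check against your genes.tsv.
    return gene_id.rsplit("_", 1)[0]


def family_support(
    gene_families: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, Counter]:
    """Count family-level gene assignments per contig, skipping unassigned genes."""
    per_contig: Dict[str, Counter] = defaultdict(Counter)
    for gene_id, family in gene_families:
        if family:
            per_contig[contig_of(gene_id)][family] += 1
    return per_contig


def classify(counts: Counter, min_genes: int = 2) -> str:
    # Only call a contig "possibly chimeric" if two or more families each have
    # at least `min_genes` supporting genes; a lone discordant gene is treated as noise.
    supported = [fam for fam, n in counts.items() if n >= min_genes]
    if len(supported) >= 2:
        return "possibly chimeric"
    if len(counts) >= 2:
        return "mixed, weakly supported"
    return "consistent"


if __name__ == "__main__":
    # Hypothetical gene-level assignments for two contigs.
    genes = [
        ("contig_A_1", "FamilyA"), ("contig_A_2", "FamilyB"), ("contig_A_3", None),
        ("contig_B_1", "FamilyA"), ("contig_B_2", "FamilyA"),
        ("contig_B_3", "FamilyB"), ("contig_B_4", "FamilyB"),
    ]
    for contig, counts in family_support(genes).items():
        print(contig, dict(counts), classify(counts))
```

Under this rule, a contig with one gene from each of two families would be reported as "mixed, weakly supported" rather than chimeric, which matches the caution about single-gene taxonomy.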
Thanks and regards,
Bhim