apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Inquiry on virus from MAG #94

Closed songmj86 closed 1 month ago

songmj86 commented 2 months ago

Hi

I ran geNomad on subsets of bacterial MAGs collected from GTDB.

I found that both virus and provirus sequences were identified in a certain MAG.

If so, is it OK to consider the virus and provirus sequences as bacterial viruses, as I want to further examine the viral genomic properties after extracting these virus/provirus sequences using SeqKit?

Furthermore, if I extract a viral sequence (one contig/scaffold) from a bacterial MAG, the MAG loses one contig that was previously deemed a bacterial sequence. If so, the genome completeness and taxonomy of the bacterial MAG could be altered after extracting the viral sequence. Should I re-estimate the completeness and re-classify the taxonomy of the bacterial MAG?

This is confusing. Anyway, my ultimate goal is to identify viruses from some bacterial groups and examine the association between viruses/proviruses and their hosts.

Thanks!

I am looking forward to hearing from you.

songmj86 commented 1 month ago

Hi! I am waiting for your response.

Thanks

apcamargo commented 1 month ago

Hi @songmj86

Sorry I'm taking this long to answer. I'm away right now, but I'll write you a proper answer within a few days.

apcamargo commented 1 month ago

Hi @songmj86,

> If so, is it OK to consider the virus and provirus sequences as bacterial viruses, as I want to further examine the viral genomic properties after extracting these virus/provirus sequences using SeqKit?

By "extracting these virus/provirus" do you mean getting their sequences in FASTA format? geNomad already provides that for you, so you don't need to run SeqKit (see the <prefix>_summary/<prefix>_virus.fna file). This is especially handy because geNomad already extracts the provirus region for you.
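For illustration, here is a minimal Python sketch of reading that FASTA file and separating proviruses from fully viral contigs. It assumes geNomad's naming convention in which excised proviruses are named `<contig>|provirus_<start>_<end>`, while fully viral contigs keep their original name. The function names `read_fasta` and `split_by_provirus` are hypothetical helpers, not part of geNomad.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier; drop any description after whitespace.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_provirus(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group record names into proviruses and fully viral contigs.

    Assumption: provirus records carry a "|provirus_<start>_<end>" suffix.
    """
    groups: Dict[str, List[str]] = {"provirus": [], "virus": []}
    for name, _sequence in records:
        key = "provirus" if "|provirus_" in name else "virus"
        groups[key].append(name)
    return groups
```

To use it on real output, pass an open file handle for the `<prefix>_virus.fna` file to `read_fasta` and feed the result to `split_by_provirus`.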

> Furthermore, if I extract the viral sequence (one contig/scaffold) from bacterial MAG, bacterial MAG loses one contig previously deemed as bacterial sequence. If so, genome completeness and taxonomy of the bacterial MAG would be altered after extraction of viral sequence. Should I re-estimate the completeness and re-classify taxonomy of the bacterial MAG?

I wouldn't remove sequences containing proviruses, since those are part of the bacterial chromosome. If you only remove fully viral sequences (that is, all viruses that are not proviruses), you shouldn't expect substantial differences in the completeness estimates or the taxonomy: the genes you're removing are not part of the bacterial chromosome, so they shouldn't affect the results of CheckM or GTDB-Tk. Regardless, I'd still run those tools again, because small changes can happen and you need to be sure that your results are reproducible and that other people will get the same numbers you report when they process your data.
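The filtering rule described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the MAG is held in memory as a dict mapping contig names to sequences, virus names come from geNomad's output, and proviruses are recognized by a `|provirus_<start>_<end>` suffix. The helper names are hypothetical.

```python
from typing import Dict, Iterable, Set


def fully_viral_contigs(virus_names: Iterable[str]) -> Set[str]:
    """Return names of contigs that were classified as viral in full.

    Assumption: provirus records carry a "|provirus_<start>_<end>" suffix,
    so any name without that suffix refers to a whole contig.
    """
    return {name for name in virus_names if "|provirus_" not in name}


def remove_fully_viral(mag: Dict[str, str], virus_names: Iterable[str]) -> Dict[str, str]:
    """Drop fully viral contigs from a MAG; keep contigs that host proviruses."""
    to_drop = fully_viral_contigs(virus_names)
    return {name: seq for name, seq in mag.items() if name not in to_drop}
```

The filtered MAG would then be written back to FASTA and passed through CheckM and GTDB-Tk again, so that the reported completeness and taxonomy correspond to the sequences actually retained.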

> Confusing.... Anyway, my ultimate goal is to identify viruses from some bacterial groups and examine the association between virus/provirus and their hosts.

My only recommendation is to be careful with non-proviral virus sequences when inferring hosts. These can be the result of misbinning, as MGEs are known to be hard to bin.

songmj86 commented 1 month ago

Thanks for the reply!