apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

Interpretation of Results #45

Closed owbarber closed 8 months ago

owbarber commented 8 months ago

I have enjoyed using geNomad and find it to be a very useful tool. When geNomad identifies a plasmid or virus on a particular contig, is it saying that the entire contig likely makes up the plasmid? The annotated genes cover the length of the contig, so I wanted to make sure I am interpreting this correctly.

Is there additional documentation on the significance of assigning the three types of topology to plasmids in particular? I was told DTR plasmids are perhaps more likely to be closed than ITR, but it would be helpful to have some documentation or links to information about interpreting the topology.

Finally, is there a way to identify where geNomad found the direct or inverted terminal repeats in a contig?

Thank you!

apcamargo commented 8 months ago

> I have enjoyed using geNomad and find it to be a very useful tool. When geNomad identifies a plasmid or virus on a particular contig, is it saying that the entire contig likely makes up the plasmid? The annotated genes cover the length of the contig, so I wanted to make sure I am interpreting this correctly.

Yes, you are right. Contigs classified as plasmids are most often entirely plasmid sequence. There are some unusual cases where integrative and conjugative elements (ICEs) will be classified as plasmids if the flanking host region is small. In the case of viruses, if flanking host regions are detected, geNomad will extract the virus region and report it as a provirus.

> Is there additional documentation on the significance of assigning the three types of topology to plasmids in particular? I was told DTR plasmids are perhaps more likely to be closed than ITR, but it would be helpful to have some documentation or links to information about interpreting the topology.

DTRs are an indication that a given sequence is complete because they are an assembly artifact that assemblers leave when generating contigs from circular chromosomes or concatemers (you can read more about it here: https://www.nature.com/articles/s41598-017-07910-5). ITRs are not necessarily an indication that a given contig is complete. It is known that some viruses have biological ITRs at the ends of their genomes, so ITRs can be informative. But unless you have an a priori expectation that your genome should possess ITRs if complete, I wouldn't recommend using them as evidence of completeness.
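
To make the DTR point concrete, here is a toy Python sketch of why reading a circular molecule slightly past one full turn gives a contig whose two ends are identical. This is only an illustration, not how any real assembler or geNomad works; the function name `linearize_with_overlap`, the 30 bp sequence, and the 10 bp overlap are all made up for the example.

```python
def linearize_with_overlap(circular_genome: str, start: int, overlap: int) -> str:
    """Read a circular sequence from `start`, continuing `overlap` bases past one full turn."""
    n = len(circular_genome)
    # Index modulo n so that reading wraps around the circle.
    return "".join(circular_genome[(start + i) % n] for i in range(n + overlap))


# Hypothetical 30 bp circular molecule (illustration only).
genome = "ATGCGTACCTGAAGTCCGATAGGCTTACGA"
overlap = 10

contig = linearize_with_overlap(genome, start=7, overlap=overlap)

# The first and last `overlap` bases are the same stretch of the circle,
# so the linear contig carries a direct terminal repeat.
has_dtr = contig[:overlap] == contig[-overlap:]
print(len(contig), has_dtr)
```

Trimming one copy of the repeat (`contig[:-overlap]`) gives back a rotation of the original circle, which is why a DTR suggests the contig spans the whole molecule.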

> Finally, is there a way to identify where geNomad found the direct or inverted terminal repeats in a contig?

The DTRs/ITRs will be at least 21 bp long, but geNomad won't tell you the exact length or the coordinates.
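
If you need the coordinates, you can look for the repeats yourself on the contig sequence. Below is a minimal Python sketch that assumes exact matching between the two contig ends and reuses the 21 bp minimum mentioned above. It is not geNomad's own implementation, which may differ (for example in how it handles mismatches or maximum repeat length); the names `find_terminal_repeat`, `min_len`, and `max_len` are hypothetical.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T; other letters unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_terminal_repeat(seq: str, min_len: int = 21, max_len: int = 1000) -> Optional[dict]:
    """Return the longest exact terminal repeat of at least `min_len` bp, or None.

    Assumption: repeats are exact matches between the contig's two ends.
    Coordinates are 1-based and inclusive.
    """
    seq = seq.upper()
    n = len(seq)
    limit = min(max_len, n // 2)  # the two repeat copies must not overlap
    for k in range(limit, min_len - 1, -1):  # try the longest candidate first
        head, tail = seq[:k], seq[-k:]
        if head == tail:
            kind = "DTR"  # same sequence at both ends
        elif head == reverse_complement(tail):
            kind = "ITR"  # one end is the reverse complement of the other
        else:
            continue
        return {"kind": kind, "length": k, "left": (1, k), "right": (n - k + 1, n)}
    return None


# Made-up example: a 23 bp repeat flanking 60 bp of filler sequence.
repeat = "ACGTTGCAAGGCTTAACCGGATC"
dtr_contig = repeat + "T" * 60 + repeat
itr_contig = repeat + "T" * 60 + reverse_complement(repeat)

print(find_terminal_repeat(dtr_contig))
print(find_terminal_repeat(itr_contig))
```

To run this on real contigs you would first load each sequence from your FASTA file as a string and then pass it to `find_terminal_repeat`.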

owbarber commented 8 months ago

Thank you so much for all the information. I know some of it is contextual and knowledge that you don't need to offer as the creator of the tool, but I appreciate you offering it so I can better interpret my results.

apcamargo commented 8 months ago

No worries! I'll consider adding this to the documentation in the future.