AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

Prophage Identification #23

Closed: elithess closed this issue 3 years ago

elithess commented 4 years ago

Hello,

I have been trying out VIBRANT lately and, although I find it superbly made, I have run into a couple of issues with prophage identification. As a test, I input the whole genomes of a couple of bacterial strains I work with as nucleotide FASTA files (GenBank accession numbers CP031538 and CP031169). Both contain active, inducible prophages that the online tool PHASTER recognizes, but VIBRANT does not.

Afterwards, I cut out the specific prophage sequence identified by PHASTER and input it on its own into VIBRANT, and it was successfully identified as a prophage. As a final test, I added a random nucleotide sequence longer than 1 kb both upstream and downstream of the prophage sequence and ran the program again. Once again, VIBRANT could not identify the prophage, even though it had just identified the same sequence as a prophage when it was input on its own.
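For anyone wanting to reproduce this experiment, the steps above (cutting a region out of a genome by coordinates, then padding it with random flanks) could be sketched roughly as follows. This is a minimal Python sketch, not part of VIBRANT or PHASTER: the helper names, the 1-based inclusive coordinate convention, the 1,500 nt flank length and the toy genome are all assumptions for illustration. It also includes a naive forward-strand ORF scan as a rough check on whether the random flanks contain anything gene-like; because 3 of the 64 codons are stop codons, long ORFs are uncommon in uniformly random sequence. The scan is not a gene caller and is not how VIBRANT predicts genes.

```python
import random


def extract_region(genome: str, start: int, end: int) -> str:
    """Return the region start..end, assuming 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates out of range")
    return genome[start - 1:end]


def random_flank(length: int, rng: random.Random) -> str:
    """Uniformly random A/C/G/T sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def flank_sequence(core: str, flank_len: int, seed: int = 0) -> str:
    """Surround `core` with independent random flanks on both sides.

    A fixed seed makes the test input reproducible between runs.
    """
    rng = random.Random(seed)
    return random_flank(flank_len, rng) + core + random_flank(flank_len, rng)


def longest_orf(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF.

    Naive check on the forward strand only, across all three frames.
    """
    stops = {"TAA", "TAG", "TGA"}
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                best = max(best, i + 3 - start)
                start = None
    return best


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a single record as FASTA text with fixed-width sequence lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy "genome": one 366 nt ORF standing in for a real sequence.
    toy_genome = "ATG" + "GCT" * 120 + "TAA"
    core = extract_region(toy_genome, 1, len(toy_genome))
    test_input = flank_sequence(core, 1500, seed=42)
    print(len(test_input), longest_orf(core))  # prints: 3366 366
    print(to_fasta("flanked_test", test_input)[:80])
```

With a real genome, `toy_genome` would be replaced by the sequence read from the FASTA file, and the `extract_region` call would use the coordinates reported by PHASTER.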

Would you know what the issue might be and how I could resolve it?

KrisKieft commented 4 years ago

Hi elithess,

There is no bug here; VIBRANT is simply not able to identify the prophage in the L. lactis genome. Figure 4 of the publication in Microbiome also shows that VIBRANT's prophage identification does not align perfectly with PHASTER's (also benchmarked on an L. lactis strain).

Unlike PHASTER, VIBRANT was built to handle metagenomes, identifying both non-integrated and integrated phages. It therefore needs a somewhat different prophage identification mechanism in order to maintain speed and accuracy on a metagenome. The way VIBRANT excises prophages from bacterial genomes is a separate component of the analysis from deciding whether the excised sequence is a prophage.

What is happening here is that VIBRANT can identify that phage genome as a prophage on its own because it encodes an integrase gene, but it cannot excise it from the whole bacterial genome because of some anomalies in the gene arrangement. To excise prophages, VIBRANT uses actual coding regions, which is why inserting random nucleotides has no effect.

PHASTER was built exclusively for excising prophages, so I believe that if you input only the phage sequence, it will not identify it as a prophage (that has been my experience so far). If you are referring to the prophage at coordinates 848137-890802, you may notice that Prophage Hunter's prediction is also somewhat different (837843-912918), although it does identify it as a prophage with a score of 0.81.
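To make the coordinate comparison concrete, the overlap between two predicted regions can be computed directly. The sketch below is a hypothetical Python helper (not part of VIBRANT, PHASTER or Prophage Hunter) and assumes both tools report 1-based, inclusive coordinates on the same genome sequence.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A predicted prophage region, assumed 1-based and inclusive."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def overlap(a: Region, b: Region) -> int:
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def compare(a: Region, b: Region) -> dict:
    """Summarize how prediction `b` relates to prediction `a`."""
    return {
        "len_a": a.length(),
        "len_b": b.length(),
        "shared": overlap(a, b),
        "b_contains_a": b.start <= a.start and a.end <= b.end,
        # Bases of b lying at lower / higher coordinates than a.
        "extends_before": max(0, a.start - b.start),
        "extends_after": max(0, b.end - a.end),
    }


if __name__ == "__main__":
    phaster = Region(848137, 890802)
    prophage_hunter = Region(837843, 912918)
    print(compare(phaster, prophage_hunter))
```

Under that coordinate assumption, the PHASTER region (42,666 bp) lies entirely within the Prophage Hunter region (75,076 bp), which extends 10,294 bp before it and 22,116 bp after it.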

I hope that answers your question. Every program will miss some phages (again, Figure 4 shows PHASTER also missing prophages). Please feel free to ask any more questions or for more clarification.

Kris