AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

Does VIBRANT trim any non-prophage sequences? #26

Closed mshamash closed 3 years ago

mshamash commented 3 years ago

Hello,

I am wondering whether VIBRANT trims any non-prophage sequences, such as phage contigs with direct terminal repeats (DTRs) on both ends, which signify a likely-complete viral genome. I'm thinking about adding CheckV to a viral metagenomics pipeline, running it after VIBRANT detects viral signal in our scaffolds, and am trying to figure out the best way to go about this. As I understand it, CheckV relies on DTRs as one way of determining genome completeness, so if VIBRANT trims off one of the two DTRs, this may affect our downstream results.

Thanks in advance.

Michael

KrisKieft commented 3 years ago

Hi Michael,

VIBRANT can extract an integrated prophage from a host sequence. Since the input to VIBRANT is unknown sequences, it tries to determine whether a sequence is non-phage (bacterial) but contains a phage section. If so, the phage section is cut out and reported as a prophage. This all happens before DTR identification. Therefore, if you have a complete (with DTRs) integrated prophage, VIBRANT will cut it using annotation information, not DTR information, and in some cases the DTRs can be lost. VIBRANT has to work this way in order to minimize runtime on metagenomes. For non-prophage sequences (lytic phages) or active prophage sequences (excised, in the lytic phase), the DTRs should remain because VIBRANT won't attempt to cut the sequence. A sequence needs to be sufficiently bacteria-like, with an adjacent prophage region, to trigger cutting. So to summarize, VIBRANT may accidentally cut off DTRs only if a given sequence is bacterial with an inserted prophage sequence. I hope that helps.
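
If you want to confirm that the DTRs survived for a given contig, one option is to check the sequence before and after VIBRANT. The following is a minimal sketch, not part of VIBRANT or CheckV: `find_dtr` is a hypothetical helper, and the 20 bp minimum and 1000 bp maximum are assumed thresholds that you may want to adjust. It only looks for an exact repeat, so it will miss DTRs that contain mismatches.

```python
def find_dtr(seq: str, min_len: int = 20, max_len: int = 1000) -> int:
    """Return the length of the longest exact direct terminal repeat, or 0.

    A DTR is present when the first k bases of the sequence are identical
    to the last k bases. min_len and max_len are assumed thresholds.
    """
    seq = seq.upper()
    # The two repeat copies cannot overlap, so k is at most half the length.
    upper = min(max_len, len(seq) // 2)
    # Search from the longest candidate down so the first hit is the longest.
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0
```

Running this on the same contig in the input FASTA and in the VIBRANT output, then comparing the two results, would show whether a repeat that was present before is still present afterwards.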

Kris

mshamash commented 3 years ago

Thanks Kris. So in other words, VIBRANT would never trim a lytic phage contig? I think I noticed this with VirSorter in the past, where it would trim contigs so that the ends were exactly in line with the start of the first gene and the end of the last gene (anecdotal; I never actually confirmed this with Dr. Roux).

KrisKieft commented 3 years ago

I believe VIBRANT and VirSorter work in the same manner: they identify a site at which to cut. That is in contrast to conventional prophage detection software (e.g., PHASTER, Prophage Hunter), which identifies an actual prophage to cut out. Does that distinction make sense? Since VIBRANT looks for a place to cut, it is possible that if the middle of a phage looks bacterial, it might get cut there, retaining both the first half and the second half but removing a section in the middle. This would be the result of a false positive prophage detection (i.e., the true sequence is completely lytic phage but the software thought it was an integrated prophage). You can check the VIBRANT manuscript for metrics on how often this happens, but in the benchmark, 0.2% of the lytic phages tested were incorrectly cut.
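
To see whether any of your own contigs were cut, you could compare the output sequences against the input. This is a rough sketch under stated assumptions: `parse_fasta` and `classify_outputs` are hypothetical helpers, not VIBRANT functions, and the comparison relies only on sequence content rather than on any particular naming scheme for the output records.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}, using the first word of each header."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def classify_outputs(inputs: dict[str, str], outputs: dict[str, str]) -> dict[str, str]:
    """Label each output sequence as 'intact', 'trimmed', or 'unmatched'.

    intact    - identical to a full input sequence
    trimmed   - a shorter, contiguous piece of some input sequence
    unmatched - found in no input sequence (worth inspecting by hand)
    """
    full = set(inputs.values())
    status = {}
    for out_id, out_seq in outputs.items():
        if out_seq in full:
            status[out_id] = "intact"
        elif any(out_seq in in_seq for in_seq in inputs.values()):
            status[out_id] = "trimmed"
        else:
            status[out_id] = "unmatched"
    return status
```

Anything not labelled `intact` is a candidate for the kind of cut described above and is worth a closer look before running completeness estimation. The substring search is quadratic in the worst case, which should be acceptable for a spot check but not for very large datasets.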

Kris

mshamash commented 3 years ago

Great, appreciate the clarification!

Cheers,

Michael