AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

Lytic vs lysogenic #16

Closed snayfach closed 4 years ago

snayfach commented 4 years ago

I'm a little confused as to the classification of lytic versus lysogenic and I didn't see in the preprint how this determination is made.

In the output files I see lytic viruses that are "fragments", meaning they are identified on only a portion of an input sequence. Likewise I see prophages that are not fragments, meaning they occupy an input sequence end-to-end:

scaffold        type       quality
1               lytic      low
4               lytic      low
5               lytic      low
7               lysogenic  low
8               lytic      high
12              lytic      high
13              lytic      medium
14              lytic      high
24              lytic      high
0_fragment_1    lytic      low
10_fragment_1   lytic      low
11_fragment_1   lytic      low
15_fragment_1   lytic      high
16_fragment_2   lytic      high
18_fragment_1   lytic      high
19_fragment_1   lysogenic  high
2_fragment_1    lytic      high
22_fragment_1   lytic      high
23_fragment_1   lytic      high
3_fragment_1    lytic      high
6_fragment_1    lytic      high
9_fragment_3    lytic      high

KrisKieft commented 4 years ago

Which version are you running? Early v1.0.1 had a bug that classified fragments as lytic when they should have been classified as lysogenic. v1.1.0 and the most recent v1.2.0 fix that issue. All fragments should be classified as lysogenic, because they were determined to be integrated viral sequences that VIBRANT excised from a larger (likely host) scaffold. Fragmentation has an error rate, but it's <1% based on my tests. The other way for a scaffold to be identified as lysogenic is if an integrase is identified. All scaffolds not considered lysogenic are classified as lytic. Therefore lysogenic should be fairly specific, but lytic will also contain lysogenic viruses that VIBRANT was unable to distinguish. I hope that helps. Sorry about the confusion caused by the bug.

Kris
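
The rule described in this reply can be sketched in a few lines of Python. This is only an illustration of the logic as stated in the thread, not VIBRANT's actual code: the `_fragment_` naming convention is taken from the output table above, and `has_integrase` is a hypothetical boolean standing in for whatever integrase annotation is available.

```python
def classify(scaffold_name: str, has_integrase: bool) -> str:
    """Apply the rule from the thread: fragment or integrase -> lysogenic, else lytic."""
    # Assumption: excised fragments carry "_fragment_" in their name,
    # as in the output table posted above.
    is_fragment = "_fragment_" in scaffold_name
    if is_fragment or has_integrase:
        return "lysogenic"
    return "lytic"


def find_mislabeled_fragments(rows):
    """Return names of fragments reported as lytic (the symptom of the early v1.0.1 bug).

    rows: iterable of (scaffold_name, reported_type) pairs.
    """
    return [name for name, call in rows if "_fragment_" in name and call == "lytic"]
```

Running `find_mislabeled_fragments` over the (scaffold, type) pairs of an output table lists the rows affected by the bug; with a fixed version it should return an empty list.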

snayfach commented 4 years ago

VIBRANT v1.0.1, installed yesterday using conda. Besides this minor issue, are there other issues that require me to update to v1.2.0?

KrisKieft commented 4 years ago

Check out issue #15. I would suggest updating. The new v1.2.0 is on bioconda so you can just run conda install -c bioconda vibrant==1.2.0. I'm not sure whether that will remove the old v1.0.1, but you may want to remove it yourself so that the old databases are deleted. I apologize for the quick succession of updates, but the goal is to retain v1.2.0.

Kris

snayfach commented 4 years ago

Thanks I'll give that a try.

chenyj8 commented 3 years ago

Hi,

Not sure if I should post a new issue or ask here. If a scaffold is identified as lysogenic because of the presence of an integrase, would that scaffold contain host genome? In other words, is it possible to get the coordinates of viral fragments in the lysogenic scaffolds?

KrisKieft commented 3 years ago

Hi,

Lysogenic viruses may or may not contain part of a host genome. If I understand your question correctly you are asking if there is an output file that contains the coordinates of all integrated viruses (lysogenic viruses with host genome). Yes, this exists in the most recent version (v1.2.1). After running v1.2.1 you will find an "integrated_prophage_coordinates" file within the results folder. This will contain coordinates of the subset of lysogenic viruses that were found to be integrated into a host genome. I hope that helps.

Kris
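
A minimal sketch of reading such a coordinates file, assuming it is a tab-separated table with a header row. The column names used here (`scaffold`, `fragment`, `start`, `stop`) and the example rows are placeholders made up for illustration; check the header of the real file before relying on them.

```python
import csv
import io
from collections import defaultdict


def load_coordinates(handle, scaffold_column="scaffold"):
    """Group rows of a tab-separated coordinates table by parent scaffold."""
    reader = csv.DictReader(handle, delimiter="\t")
    by_scaffold = defaultdict(list)
    for row in reader:
        by_scaffold[row[scaffold_column]].append(row)
    return dict(by_scaffold)


# Made-up example table; column names and values are placeholders, not VIBRANT output.
example = io.StringIO(
    "scaffold\tfragment\tstart\tstop\n"
    "19\t19_fragment_1\t1200\t41000\n"
    "9\t9_fragment_3\t500\t30250\n"
)
coords = load_coordinates(example)
```

With a real run, pass an open file handle (`open(path, newline="")`) instead of the in-memory example and set `scaffold_column` to whatever the header actually uses.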

jzrapp commented 3 years ago

Hi Kris,

Just for clarification: your "lysogenic" classification combines lysogenic viruses (identified as fragments) as well as temperate phage (phage that encode an integrase but may not currently be integrated)?

Thanks again, Josephine

KrisKieft commented 3 years ago

Hi Josephine,

Yes that is correct.

Kris

TJrogers86 commented 2 years ago

I have a question in this regard. I ran VIBRANT on my MAGs and not on my assemblies. One assumes contigs that go into MAGs (or bins) would be bacterial/archaeal with the possibility of having a lysogenic viral sequence embedded in the contig. However, I had a number of viral predictions that are predicted to be lytic. If a lytic virus does not integrate into the host genome, then why would a lytic contig go into a bacterial/archaeal MAG?

KrisKieft commented 2 years ago

Hi,

There are several reasons for this to happen.

The main reason is that binning is an imprecise process. A MAG will almost always contain some contamination, meaning sequences that should not have been included in the MAG. If these sequences are lytic viruses, then VIBRANT may be identifying them.

Another possibility is that there are lysogenic viruses that were correctly binned but are being misidentified as lytic. To identify a lysogenic virus, VIBRANT checks whether the viral sequence was excised from a host sequence or whether it encodes an integrase. You may have a correctly binned lysogenic virus on its own sequence (not integrated) that is missing the piece encoding the integrase.

A final option is that the VIBRANT call is incorrect. Sometimes with short sequences, such as <5kb, VIBRANT will call bacterial/archaeal MAG sequences viral. In this case it's neither lytic nor lysogenic because it's not a virus.
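
The three explanations above suggest a simple way to triage lytic calls found inside a MAG. The sketch below is only an illustration: the 5 kb cutoff comes from the "<5kb" remark above and is not a VIBRANT setting, and the `(name, call, length_bp)` tuple layout is a hypothetical input format.

```python
SHORT_CUTOFF_BP = 5_000  # from the "<5kb" remark above; not a VIBRANT setting


def triage_lytic_calls(predictions, cutoff_bp=SHORT_CUTOFF_BP):
    """Map each lytic prediction to the explanation worth checking first.

    predictions: iterable of (name, call, length_bp) tuples (hypothetical layout).
    """
    notes = {}
    for name, call, length_bp in predictions:
        if call != "lytic":
            continue  # lysogenic calls in a MAG are the expected case
        if length_bp < cutoff_bp:
            notes[name] = "short sequence: possibly a non-viral false positive"
        else:
            notes[name] = (
                "check for bin contamination or a lysogen lacking a detected integrase"
            )
    return notes
```

The function cannot tell contamination from a misidentified lysogen; it only separates the short sequences, where a false positive is more plausible, from the rest.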

TJrogers86 commented 2 years ago

Thanks! That makes a lot of sense.