AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

Taxonomic identities of predicted phages? #17

Open Anto007 opened 4 years ago

Anto007 commented 4 years ago

Hi, I was wondering why VIBRANT doesn't output the taxonomic identities of predicted phages (provided they are not novel ones). If VIBRANT does do this, where can I find this info in the output?

Anto007 commented 4 years ago

Also, I think it would be very convenient for users to directly have a sequence file containing all complete circular phages detected by VIBRANT (i.e., those listed in 'VIBRANT_complete_circular_metagenome.tsv').
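Until such a file exists, the extraction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the first tab-separated column of the TSV holds the scaffold name and that the name matches the full FASTA header exactly. Neither is confirmed in this thread, and the function names `read_ids` and `filter_fasta` are hypothetical, not part of VIBRANT.

```python
def read_ids(tsv_lines):
    """Collect scaffold names from the first tab-separated column.

    Assumption: column 1 of the TSV is the scaffold name.
    """
    ids = set()
    for line in tsv_lines:
        line = line.rstrip("\n")
        if line.strip():
            ids.add(line.split("\t")[0].strip())
    return ids


def filter_fasta(fasta_lines, wanted):
    """Yield FASTA lines for records whose full header is in `wanted`.

    Assumption: the TSV name equals the header text after '>'.
    """
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = line[1:].strip() in wanted
        if keep:
            yield line


# Toy data standing in for the TSV and the phage FASTA file.
tsv = ["scaffold_2\tcircular", ""]
fasta = [">scaffold_1", "ACGT", ">scaffold_2", "GGCC", "TTAA", ">scaffold_3", "AAAA"]

circular_ids = read_ids(tsv)
kept = list(filter_fasta(fasta, circular_ids))
print("\n".join(kept))
```

With real data, the two lists would be replaced by open file handles for the TSV and the phage nucleotide FASTA.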

KrisKieft commented 4 years ago

Hi,

Predicting phage taxonomy is not simple, and in some cases the distinction between different groups is not well defined. I am currently working on a phage taxonomy tool as a side project, but it will likely not be implemented in VIBRANT soon. It's also difficult to quickly assess whether a phage is novel because the virus databases, taken together, are very large (NCBI, IMG/VR, GOV2, single publications, etc.). A tool that currently exists for this is vConTACT2, which relies mainly on reference viruses (i.e., NCBI RefSeq). vConTACT2 was used in the VIBRANT pre-print manuscript.

I'll consider the new file for circular viruses; it would be fairly easy to implement. I've also had a question about adding a file for prophage coordinates. I'm going to wait for any other necessary updates and put all these suggestions together in a single update, likely in a couple of weeks. It will likely contain just minor enhancements like this and no changes to the method of virus identification.

Kris

Anto007 commented 4 years ago

Thank you for your quick response. I agree with you on phage taxonomy and so no major worries there. I look forward to seeing the new updates.

SilasK commented 4 years ago

Don't the pVOGs used in VIBRANT give a taxonomic annotation?

KrisKieft commented 4 years ago

Hi Silas,

One of the most used methods for host prediction is to use percent identity of phage proteins to known reference phages. If you get hits to a known phage, you likely know the host (based on the host of the reference phage). VIBRANT specifically uses HMMs, which is different from BLAST against a protein database. HMMs may contain info from diverse phages that do not share the same taxonomy or host. That means VIBRANT can't use taxonomic info from VOGs in this case (note that VIBRANT doesn't use the pVOGs database). I'm working on putting together a quick taxonomy prediction tool using reference phages that will do taxonomy annotation to the Family level.

Kris
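To make the reference-based approach described above concrete, here is a minimal sketch of best-hit host inference. It is not what VIBRANT does (VIBRANT uses HMMs), and everything in it is a hypothetical illustration: the `predict_host` function, the 70% identity cutoff, the majority vote, and the toy phage and host names are all assumptions rather than an established method.

```python
from collections import Counter


def predict_host(hits, reference_hosts, min_identity=70.0):
    """Guess a host from protein hits to reference phages.

    hits: iterable of (protein_id, reference_phage, percent_identity).
    reference_hosts: dict mapping reference phage -> known host.
    min_identity: hypothetical cutoff, chosen only for illustration.
    """
    # Keep only the highest-identity hit for each query protein.
    best = {}
    for protein, ref_phage, pct in hits:
        if protein not in best or pct > best[protein][1]:
            best[protein] = (ref_phage, pct)

    # Each protein with a confident best hit votes for that phage's host.
    votes = Counter()
    for ref_phage, pct in best.values():
        if pct >= min_identity and ref_phage in reference_hosts:
            votes[reference_hosts[ref_phage]] += 1

    if not votes:
        return None  # no confident hit, so no prediction
    return votes.most_common(1)[0][0]


# Toy data; the names below are made up.
hits = [
    ("p1", "phageA", 92.0),
    ("p1", "phageB", 75.0),
    ("p2", "phageA", 88.5),
    ("p3", "phageC", 40.0),
]
hosts = {"phageA": "Escherichia", "phageB": "Salmonella", "phageC": "Bacillus"}

predicted = predict_host(hits, hosts)
print(predicted)
```

An HMM built from many diverse phages has no single reference phage to look up in `reference_hosts`, which is why this kind of lookup does not carry over to HMM hits.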

shandley commented 4 years ago

Hi @KrisKieft! Really interested in using VIBRANT in some of our lab's studies. I am curious as to which output files from VIBRANT you used as input to vConTACT2. Any tips are greatly appreciated.

Scott

KrisKieft commented 4 years ago

Hi Scott,

I have now added a new script to the scripts/ folder. A full explanation can be found in the updated README at the top (Content Addition). This script can be used to reformat VIBRANT protein outputs for vConTACT2. The specific file that you want to use is combined_phages.faa in your phages output folder. From there you need to make a gene-to-genome file for vConTACT2, and for the type of input proteins you want to select the Prodigal format. Hope that helps.

Kris
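For readers following along, the gene-to-genome step mentioned above can be sketched roughly as follows. This is an illustration, not the script from the scripts/ folder. It assumes that each protein ID is the first whitespace-delimited token of the FASTA header and ends in `_<gene number>` (the Prodigal convention), and that vConTACT2 accepts a CSV with `protein_id,contig_id,keywords` columns. Check both against your own files and the vConTACT2 documentation. The function names and the `None` keyword placeholder are hypothetical.

```python
import csv
import io


def gene_to_genome_rows(faa_lines):
    """Yield (protein_id, contig_id, keywords) for each FASTA header."""
    for line in faa_lines:
        if not line.startswith(">"):
            continue
        tokens = line[1:].split()
        if not tokens:
            continue
        protein_id = tokens[0]
        # Assumption: Prodigal-style IDs are "<contig>_<gene number>".
        contig_id, sep, gene_number = protein_id.rpartition("_")
        if not sep or not gene_number.isdigit():
            contig_id = protein_id  # fall back to the full ID
        # "None" is a placeholder for the keywords column (assumption).
        yield protein_id, contig_id, "None"


def write_gene_to_genome(faa_lines, out_handle):
    """Write the mapping as CSV with a header row."""
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(["protein_id", "contig_id", "keywords"])
    for row in gene_to_genome_rows(faa_lines):
        writer.writerow(row)


# Toy Prodigal-style records standing in for combined_phages.faa.
example = [
    ">contig_7_1 # 2 # 310 # 1 # ID=1_1",
    "MKT",
    ">contig_7_2 # 400 # 900 # -1 # ID=1_2",
    "MLS",
]
buf = io.StringIO()
write_gene_to_genome(example, buf)
print(buf.getvalue())
```

With a real file, pass an open handle for combined_phages.faa as `faa_lines` and an output file opened with `newline=""` as `out_handle`.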

shandley commented 4 years ago

Thanks so much @KrisKieft. I will have a chance to work with the script later this week. Many thanks in advance!

Scott

satkinson0115 commented 3 years ago

Hi @KrisKieft,

I'm trying to run some VIBRANT output through vConTACT2 and I'm curious which VIBRANT output to use to generate the gene-to-genome file. I thought I saw a file with the associations with the HMMs, but now that I'm looking for it I don't see anything like that. Could you clarify which output to use for this step?

Thanks, Samantha

KrisKieft commented 3 years ago

Hi Samantha,

Can you please open a new Issue for this question? Under the Issues tab there's a green "New issue" button. This will help with organization of specific questions, because I've had this one regarding vConTACT2 before. You can simply copy and paste your question. Once that is up I'll post my answer for clarity. Thank you for your cooperation.

Kris