WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Regarding Viral AMG classification #251

Open ShailNair opened 1 year ago

ShailNair commented 1 year ago

Hi, and many thanks for the excellent annotation tool. I have a question about viral AMG annotation. I have a set of viral contigs identified through metagenomics. The contigs were cleaned with CheckV and prepared for DRAM-v with VirSorter2. In the main annotation.tsv (not the distillate), I found some probable AMGs with a VirSorter category score of 0-2 and an auxiliary score below 4, but without any AMG flags. I understand that the missing AMG flag is probably why they were left out of the final distillate, given the limitations of the AMG database used by DRAM-v. I also searched these contigs for viral-like genes using the pVOGs, RefSeq-viral, and PHROGs databases with an e-value cutoff of <1E-5. However, I am still unsure whether these genes should be classified as AMGs, since some of them appear to meet every criterion for AMG classification except the AMG flag. Here is my annotated figure:

[Figure: eps gene cluster]

I would be happy to hear others' insights. I am mostly interested in how to treat a viral contig that carries a gene cluster of more than three genes participating in a single function, as with contig A in the figure above. Contig A is longer than 150 kb (which places it in the jumbo phage category), and CheckV reports it as complete and free of contamination.
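For anyone repeating this screen, the filter described above (auxiliary score below 4, no AMG flag) can be sketched roughly as follows. This is a minimal illustration, not DRAM-v's own logic: the column names `gene`, `scaffold`, `ko_id`, `auxiliary_score`, and `amg_flags` are assumptions about the annotation table layout, the sample rows are made up, and reading "no AMG flag" as "no `M` in the flags column" is also an assumption.

```python
import csv
import io

# Hypothetical three-gene excerpt; column names and values are illustrative only.
SAMPLE_TSV = (
    "gene\tscaffold\tko_id\tauxiliary_score\tamg_flags\n"
    "contigA_10\tcontigA\tK16566\t2\tK\n"
    "contigA_11\tcontigA\tK16557\t3\tMK\n"
    "contigA_12\tcontigA\t\t5\t\n"
)


def unflagged_candidates(handle, max_score=3):
    """Return rows with auxiliary score <= max_score that lack an M flag."""
    picked = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            score = int(float(row["auxiliary_score"]))
        except (KeyError, TypeError, ValueError):
            continue  # no usable score on this row
        flags = row.get("amg_flags") or ""
        # Assumption: "no AMG flag" means the flags string does not contain "M".
        if score <= max_score and "M" not in flags:
            picked.append(row)
    return picked


candidates = unflagged_candidates(io.StringIO(SAMPLE_TSV))
print([row["gene"] for row in candidates])
```

To run it on a real table, pass an open file handle instead of the in-memory sample, after checking that the header names match the file being read.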

rmFlynn commented 1 year ago

Very interesting, I will assign this to our viral team. But first, here are some quick thoughts:

Do consider sharing your data if you are able; this will make DRAM stronger. These are just some quick notes you may want to mull over if you have not already. Expect a longer discussion in the future.

ShailNair commented 1 year ago

Hi, thank you for the prompt reply. Yes, I did check the AMG database, and the AMGs I am focused on are not included there. I have attached the raw files and the DRAM-v annotations of the probable AMGs here. For your reference, I have also included additional genes (those before and after each likely AMG), although my question only concerns genes with the following KEGG orthologs (KOs): K16566, K16557, K16554, K16556, K16558, K16555, K16568, K16564, K16560, and K16563. All of these are involved in exopolysaccharide biosynthesis.

Thank you.

dramv.output.zip
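As a rough aid for checking which contigs carry several of the listed KOs together, the sketch below groups annotation rows by contig and keeps contigs with more than three distinct KOs from the list. The field names `scaffold` and `ko_id` and the sample rows are assumptions made for illustration, not taken from the attached files. Note that it counts distinct KOs, not gene copies.

```python
from collections import defaultdict

# KOs listed in the comment above.
EPS_KOS = {
    "K16566", "K16557", "K16554", "K16556", "K16558",
    "K16555", "K16568", "K16564", "K16560", "K16563",
}


def eps_kos_per_contig(rows, kos=EPS_KOS):
    """Map each contig to the set of listed KOs annotated on it."""
    found = defaultdict(set)
    for row in rows:
        ko = row.get("ko_id", "")
        if ko in kos:
            found[row["scaffold"]].add(ko)
    return found


def contigs_with_cluster(rows, min_genes=4):
    """Contigs carrying at least min_genes distinct listed KOs (more than three by default)."""
    per_contig = eps_kos_per_contig(rows)
    return sorted(c for c, ks in per_contig.items() if len(ks) >= min_genes)


# Hypothetical rows standing in for parsed annotation records.
rows = [
    {"scaffold": "contigA", "ko_id": "K16566"},
    {"scaffold": "contigA", "ko_id": "K16557"},
    {"scaffold": "contigA", "ko_id": "K16554"},
    {"scaffold": "contigA", "ko_id": "K16556"},
    {"scaffold": "contigB", "ko_id": "K16566"},
    {"scaffold": "contigB", "ko_id": "K00001"},
]
print(contigs_with_cluster(rows))
```

This only tallies co-occurrence on a contig; it says nothing about whether the genes are adjacent, which the gene positions in the annotation table would be needed to confirm.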

rmFlynn commented 1 year ago

Thanks! We will look into adding them to the AMG database. If you want to add them yourself and open a pull request, you can get credit on GitHub for your work and can say you are a contributor to DRAM. But I would understand if you wanted to wait until we confirm these before doing so. Of course, we will also look into broader ways to improve the AMG database.

ShailNair commented 1 year ago

I'll wait, because I need to conduct a deeper analysis before I can be confident that these are potential AMGs. It would also be worthwhile to hear from you once you have explored the provided files. Thank you for everything you and your team are doing with DRAM.