faylward / viralrecall

Detection of NCLDV signatures in 'omic data

NCLDV taxonomy #9

Closed fujch7 closed 2 years ago

fujch7 commented 2 years ago

Hi, thanks for your amazing tool! I have run it successfully, but how can I get the taxonomy of the NCLDV sequences? I could not find it in any of the result files. I look forward to your reply.

faylward commented 2 years ago

No taxonomy is provided with viralrecall, because it is based primarily on HMMs. If you wanted to assign taxonomy to the NCLDV regions afterwards you could take the proteins predicted in the viral regions and search them against the proteins in the GVDB (https://faylward.github.io/GVDB/).
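As a rough illustration of that approach, here is a minimal sketch in Python. It assumes you have already searched the viral-region proteins against the GVDB proteins with a tool that writes BLAST-style 12-column tabular output (query in column 1, subject in column 2, bitscore in column 12). The function names and the two lookup tables (`protein_to_region`, `reference_taxonomy`) are hypothetical; you would need to build them yourself from your own protein IDs and the GVDB metadata. This is not part of viralrecall.

```python
from collections import Counter, defaultdict


def best_hits(tabular_lines):
    """Keep the highest-bitscore hit per query from BLAST-style tabular lines."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def vote_taxonomy(best, protein_to_region, reference_taxonomy):
    """Return, per viral region, the most common best-hit lineage and its count.

    protein_to_region: hypothetical dict, query protein ID -> viral region ID
    reference_taxonomy: hypothetical dict, reference protein ID -> lineage label
    """
    votes = defaultdict(Counter)
    for query, (subject, _bitscore) in best.items():
        region = protein_to_region.get(query)
        lineage = reference_taxonomy.get(subject)
        if region is not None and lineage is not None:
            votes[region][lineage] += 1
    return {region: counts.most_common(1)[0] for region, counts in votes.items()}
```

A region whose proteins mostly hit one lineage gets that lineage plus the number of supporting proteins, which is worth checking before trusting the call. Treat the result as a rough hint only, especially for short regions with few proteins.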

JSSaini commented 2 years ago

Hi, thank you for this tool. I am also stuck at this step.

How can I do this? Do you also have a script to assign taxonomy to NCLDV?

faylward commented 2 years ago

Just to make sure we are talking about the same thing: to assign taxonomy confidently, it is necessary to bin contigs and get draft genomes. Assigning taxonomy to contigs individually is very difficult; some are very short and lack marker genes, for example. There are many ways to do binning. Simple tools like MetaBat2 actually do a fairly good job, but there are other alternatives (see https://merenlab.org/2022/01/03/giant-viruses/).

If you already have bins/genomes, then I would recommend making a phylogeny using ncldv_markersearch with your genomes together with references (https://github.com/faylward/ncldv_markersearch). The default options of this tool use 7 marker genes to make the concatenated alignment, and you can then use IQ-TREE to make the final tree.
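To make the idea of a concatenated alignment concrete, here is a small Python sketch of the general technique. It is only an illustration of the concept, not the code ncldv_markersearch uses. It assumes each marker gene has already been aligned separately and is held as a dict of genome name to aligned sequence; the function name and input layout are hypothetical.

```python
def concatenate_alignments(marker_alignments, gap="-"):
    """Join per-marker alignments into one alignment per genome.

    marker_alignments: list of dicts (one per marker gene), each mapping
    genome name -> aligned sequence. Genomes missing a marker are padded
    with gap characters so every concatenated row has the same length.
    """
    genomes = sorted({g for aln in marker_alignments for g in aln})
    parts = {g: [] for g in genomes}
    for aln in marker_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each marker alignment needs sequences of equal length")
        width = lengths.pop()
        for g in genomes:
            # fill in an all-gap block when this genome lacks the marker
            parts[g].append(aln.get(g, gap * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}
```

For example, two markers given as `[{"g1": "MK-", "g2": "MKL"}, {"g1": "AA"}]` yield `"MK-AA"` for `g1` and `"MKL--"` for `g2`. The resulting rows can be written out as FASTA and passed to a tree-building program.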