faylward / viralrecall

Detection of NCLDV signatures in 'omic data

How to be sure when recovering NCLDV sequences #8

Closed: mabusfer closed 3 years ago

mabusfer commented 3 years ago

Hello!

I am trying to recover NCLDV sequences from metagenomes and metaviromes. I have been using VirSorter2 with --include-groups "NCLDV", and I also want to use ViralRecall to confirm that the contigs really are from NCLDVs. I am not sure whether filtering the contigs with score > 0 is enough (bearing in mind that I have also used VirSorter2), or whether I should be more stringent and require the presence of at least 1 of the 10 marker genes plus score > 0.

Thank you very much!

faylward commented 3 years ago

Personally, I think the best bet is to bin the contigs first (e.g. with MetaBAT2) and then run VR (ViralRecall) on the bins. If a bin belongs to an NCLDV, then typically most contigs will have scores >>1 (check out the benchmarking results in the Viruses paper). An entire NCLDV genome is also unlikely to assemble into a single contig, since these genomes are so large, so binning is better in that respect too. If you really want to run VR on individual contigs, then setting the cutoff at 1 should be fine, but there will always be some false positives (especially with short contigs), and multiple contigs will probably belong to the same virus.
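
For anyone who wants to script this, here is a minimal sketch of the two checks described above: a per-contig filter on score (with optional length and marker requirements) and a per-bin fraction of contigs above the cutoff. It is not part of ViralRecall. The column names (`contig`, `length`, `score`, `markerhits`), the `-` placeholder for "no marker hit", and the example rows are all assumptions for illustration, so rename them to match the header of the table you actually get. Whether "cutoff at 1" means `>=` or `>` is also a choice; the sketch uses `>=`.

```python
import csv
import io

# Assumed column names (hypothetical); change them to match your own table.
ID_COL = "contig"
SCORE_COL = "score"
LENGTH_COL = "length"
MARKER_COL = "markerhits"


def read_table(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def has_marker(row):
    """True if the marker column holds something other than an empty placeholder."""
    value = (row.get(MARKER_COL) or "").strip()
    return value not in ("", "-", "NA")


def filter_contigs(rows, score_cutoff=1.0, min_length=0, require_marker=False):
    """Return IDs of contigs passing the score, length, and optional marker filters."""
    kept = []
    for row in rows:
        if float(row[SCORE_COL]) < score_cutoff:
            continue
        if int(row[LENGTH_COL]) < min_length:
            continue
        if require_marker and not has_marker(row):
            continue
        kept.append(row[ID_COL])
    return kept


def fraction_above(rows, score_cutoff=1.0):
    """Fraction of contigs (e.g. all contigs in one bin) at or above the cutoff."""
    if not rows:
        return 0.0
    passing = sum(1 for row in rows if float(row[SCORE_COL]) >= score_cutoff)
    return passing / len(rows)


# Made-up rows for demonstration only; these are not real ViralRecall results.
EXAMPLE = (
    "contig\tlength\tscore\tmarkerhits\n"
    "c1\t45000\t6.2\tmajor_capsid\n"
    "c2\t3000\t1.4\t-\n"
    "c3\t20000\t-2.5\t-\n"
)

if __name__ == "__main__":
    rows = read_table(io.StringIO(EXAMPLE))
    # Per-contig filtering: cutoff of 1, dropping short contigs.
    print(filter_contigs(rows, score_cutoff=1.0, min_length=5000))
    # Per-bin view: how much of the bin looks NCLDV-like.
    print(fraction_above(rows, score_cutoff=1.0))
```

To use it on a real file, open the table with `open(path, newline="")` and pass the handle to `read_table`. The `min_length` option is there because short contigs are where false positives are most likely; the threshold itself is up to you.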

mabusfer commented 3 years ago

Perfect! Thank you very much for your help!