RasmussenLab / phamb

Downstream processing of VAMB binning for Viral Elucidation
MIT License

Running vamb/phamb using only Vibrant contigs #32

Closed michoug closed 2 years ago

michoug commented 2 years ago

Hi, I was wondering whether, in your opinion, it would be appropriate to assemble reads into contigs, identify the putative viral ones with VIBRANT (or equivalent software), concatenate them, and then run vamb followed by phamb? Best, Greg
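The concatenation step in the workflow above could be sketched roughly as follows. This is a minimal illustration and not the VAMB or phamb implementation: the function names, the `S1`/`S2` sample identifiers, and the `C` separator between sample identifier and contig name are all assumptions made for the example. The point is only that each contig name carries its sample of origin, so that clusters can later be split per sample.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def concatenate_samples(
    samples: Dict[str, Iterable[str]],
    sep: str = "C",          # assumed separator, hypothetical choice
    min_length: int = 0,     # optional length cutoff in bp
) -> Iterator[Tuple[str, str]]:
    """Merge per-sample viral contigs into one catalogue.

    Each contig is renamed to <sample_id><sep><original_name> so the
    sample of origin can be recovered from the contig name later.
    """
    for sample_id, lines in samples.items():
        for name, seq in read_fasta(lines):
            if len(seq) >= min_length:
                yield f"{sample_id}{sep}{name}", seq


if __name__ == "__main__":
    # Toy input standing in for per-sample FASTA files of putative viral contigs.
    toy = {
        "S1": [">contig_1 len=8", "ACGT", "ACGT"],
        "S2": [">contig_1", "GG"],
    }
    for new_name, seq in concatenate_samples(toy):
        print(f">{new_name}\n{seq}")
```

In practice the per-sample inputs would be open file handles on the FASTA files of viral contigs, and the sample identifiers must not themselves contain the separator.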

joacjo commented 2 years ago

Hi Greg

My only worry with this approach is that you might miss some contigs (especially smaller ones, < 5000 bp) that belong to a virus but contain only a limited number of viral genes; VIBRANT will not pick these up, so they will be discarded. Nevertheless, your approach is totally feasible and should work fine. In the paper we ran benchmarks on contig subsets derived from viruses only (this roughly corresponds to your contig input to vamb), and it really put the binning on "steroids".

Best, Joachim

michoug commented 2 years ago

Hi Joachim, thanks a lot for the quick response. FYI, by default VIBRANT only considers contigs of at least 1 kb with at least 4 ORFs. Best, Greg
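To make the thresholds mentioned above concrete, here is a small sketch of such a filter. It is not VIBRANT code: the function name, the toy contig table, and the assumption that both cutoffs are inclusive minimums are illustrative only, and the ORF counts are assumed to come from a separate gene-prediction step.

```python
from typing import Dict, List, Tuple


def passes_thresholds(
    length_bp: int,
    n_orfs: int,
    min_length_bp: int = 1000,  # 1 kb, as quoted in the thread
    min_orfs: int = 4,          # 4 ORFs, as quoted in the thread
) -> bool:
    """Return True if a contig meets both minimums (assumed inclusive)."""
    return length_bp >= min_length_bp and n_orfs >= min_orfs


def select_contigs(contigs: Dict[str, Tuple[int, int]]) -> List[str]:
    """Keep names of contigs whose (length_bp, n_orfs) pass the thresholds."""
    return [
        name
        for name, (length_bp, n_orfs) in contigs.items()
        if passes_thresholds(length_bp, n_orfs)
    ]


if __name__ == "__main__":
    # Hypothetical contigs: name -> (length in bp, number of predicted ORFs).
    toy = {
        "contig_a": (4200, 6),   # short but gene-rich: kept
        "contig_b": (900, 5),    # below the length cutoff: dropped
        "contig_c": (12000, 3),  # long but too few ORFs: dropped
    }
    print(select_contigs(toy))
```

A contig such as `contig_a` is well under 5000 bp yet would still be considered, which is why the smaller-contig concern may matter less than first feared.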

joacjo commented 2 years ago

Hi Greg

OK! Maybe the smaller contigs are not a big issue then. We did not specifically optimise the hyperparameters of the vamb binner for a pure viral-contig subset; instead, it is optimised to handle the simultaneous presence of bacterial contigs and contigs from other entities. So far in our tests, though, it has performed pretty well on pure virus subsets. Remember to binsplit your VAMB clusters by the sample identifier in your contig names (this is also advised in the VAMB repo).

Wishing you all the best with your viral metagenomic research! :-)

Best, Joachim
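The binsplitting advice above can be illustrated with a short sketch. This is not VAMB's own implementation; it assumes clusters are available as (cluster, contig) pairs, that contig names have the form `<sample_id>C<contig_name>`, and that sample identifiers do not themselves contain the separator. The function name and the naming of the resulting bins are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def binsplit(
    clusters: Iterable[Tuple[str, str]],
    sep: str = "C",  # assumed separator between sample id and contig name
) -> Dict[str, List[str]]:
    """Split each cluster into one bin per sample of origin.

    The sample identifier is taken as everything before the first
    occurrence of `sep` in the contig name.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for cluster, contig in clusters:
        sample, found, _rest = contig.partition(sep)
        if not found:
            raise ValueError(f"separator {sep!r} not found in contig {contig!r}")
        # Hypothetical bin naming: <sample><sep><cluster>.
        bins[f"{sample}{sep}{cluster}"].append(contig)
    return dict(bins)


if __name__ == "__main__":
    # Toy cluster assignments: one cluster mixing contigs from two samples.
    toy = [
        ("vae_1", "S1Ccontig_1"),
        ("vae_1", "S2Ccontig_7"),
        ("vae_1", "S1Ccontig_3"),
        ("vae_2", "S2Ccontig_9"),
    ]
    for bin_name, members in binsplit(toy).items():
        print(bin_name, members)
```

Splitting this way keeps contigs assembled from different samples in separate bins even when they cluster together, so each bin represents one genome from one sample.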