AnantharamanLab / vRhyme

Binning Virus Genomes from Metagenomes
GNU General Public License v3.0

[Question] Is this for binning contigs from the same viral genome or clustering viral genomes? #27

Open jolespin opened 1 year ago

jolespin commented 1 year ago

The reason I'm asking is that VirFinder, geNomad, VirSorter(2), etc. operate on individual contigs, and those results are usually fed into CheckV to determine how complete or contaminated the virus is (similar to CheckM and BUSCO). So let's say there are 3 contigs that CheckV determines are all 100% complete and 0% contaminated. If those are binned together, would that bin be considered a metagenome-assembled genome, or would it be a pangenome, since the contamination would be high based on the notes above?

cody-mar10 commented 1 year ago

vRhyme is for binning contigs sequenced from the same genome but assembled in various fragments. It is not a clustering tool.

In your example with 3 "complete" viral genome fragments, if vRhyme were to bin those genomes together, that would be a viral MAG, not a pangenome. As for what that means with regard to CheckV's quality metrics, it would probably require a more in-depth analysis of your data to understand why CheckV produces those numbers.

You can also feed CheckV viral MAGs, so long as you set up the genes-to-genome file to point all genes from each scaffold in the viral MAG back to a single identifier. That could help you evaluate how CheckV views the individual scaffolds vs. a binned genome.
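
A minimal sketch of that mapping step, offered as an illustration rather than a documented CheckV or vRhyme workflow. It assumes Prodigal-style gene IDs of the form `<scaffold>_<gene index>`, and the scaffold names, bin name, and two-column `gene_id,genome_id` layout are all hypothetical; check the format the downstream tool actually expects before using it.

```python
import csv
import io


def genes_to_genome_rows(gene_ids, scaffold_to_bin):
    """Yield (gene_id, genome_id) pairs, pointing binned scaffolds at their bin."""
    for gene_id in gene_ids:
        # Assumption: Prodigal-style IDs, "<scaffold>_<gene index>".
        scaffold = gene_id.rsplit("_", 1)[0]
        # Unbinned scaffolds keep their own name as the genome identifier.
        yield gene_id, scaffold_to_bin.get(scaffold, scaffold)


def write_table(rows, handle):
    """Write the mapping as a two-column table (hypothetical layout)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["gene_id", "genome_id"])
    writer.writerows(rows)


# Hypothetical bin membership: two scaffolds in one vMAG, one unbinned.
scaffold_to_bin = {"contig_1": "vRhyme_bin_1", "contig_2": "vRhyme_bin_1"}
gene_ids = ["contig_1_1", "contig_1_2", "contig_2_1", "contig_3_1"]

rows = list(genes_to_genome_rows(gene_ids, scaffold_to_bin))
buffer = io.StringIO()
write_table(rows, buffer)
print(buffer.getvalue())
```

Every gene on `contig_1` and `contig_2` ends up under the same identifier, so the bin is evaluated as one genome, while the unbinned `contig_3` is still treated on its own.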

KrisKieft commented 1 year ago

To add to that, vRhyme isn't without error (neither is CheckV). If CheckV is wrong, then vRhyme may be binning 3 contigs of a genome into a vMAG. If vRhyme is wrong, then the bin created contains 3 different genomes and is contaminated (or a pangenome). As @cody-mar10 mentioned, it may require a more in-depth analysis of your data to resolve this.

jolespin commented 11 months ago

@KrisKieft is there a way to calculate coverage separately? I'm seeing cov_table_convert.py but it's not installed with conda.

Is it possible to use CoverM instead? https://github.com/wwood/CoverM