jolespin opened this issue 1 year ago
vRhyme is for binning contigs that were sequenced from the same genome but assembled into multiple fragments. It is not a clustering tool.
In your example with 3 "complete" viral genome fragments, if vRhyme were to bin them together, the result would be a viral MAG, not a pangenome. As for what that means for CheckV's quality metrics, understanding why CheckV produces those numbers would probably require a more in-depth analysis of your data.
You can also run CheckV on viral MAGs, as long as you set up the genes-to-genome file so that all genes from every scaffold in a viral MAG point back to a single identifier. That could help you compare how CheckV evaluates the individual scaffolds versus the binned genome.
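For anyone setting this up, here's a minimal Python sketch of building that genes-to-genome mapping. It assumes Prodigal-style protein IDs (`<scaffold>_<gene number>`) and a scaffold-to-bin dictionary you've already parsed from vRhyme's bin membership table. The two-column tab-separated layout and names like `vRhyme_bin_1` are placeholders, so check CheckV's documentation for the exact format it expects.

```python
"""Hypothetical sketch: point every predicted gene in a vMAG back to its bin ID."""


def scaffold_of(gene_id: str) -> str:
    # Prodigal-style protein IDs are "<scaffold>_<gene index>", so drop the last "_N".
    return gene_id.rsplit("_", 1)[0]


def genes_to_bins(gene_ids, scaffold_to_bin):
    """Return {gene_id: genome_id}; genes on unbinned scaffolds keep their scaffold as the ID."""
    mapping = {}
    for gene in gene_ids:
        scaffold = scaffold_of(gene)
        # Scaffolds that vRhyme did not bin are treated as their own "genome".
        mapping[gene] = scaffold_to_bin.get(scaffold, scaffold)
    return mapping


def to_tsv(mapping) -> str:
    # Assumed layout: one "gene<TAB>genome identifier" pair per line.
    return "".join(f"{gene}\t{genome}\n" for gene, genome in mapping.items())


if __name__ == "__main__":
    # Hypothetical membership: scaffold_A and scaffold_B were binned together.
    membership = {"scaffold_A": "vRhyme_bin_1", "scaffold_B": "vRhyme_bin_1"}
    genes = ["scaffold_A_1", "scaffold_A_2", "scaffold_B_1", "scaffold_C_1"]
    print(to_tsv(genes_to_bins(genes, membership)), end="")
```

Splitting on the last underscore keeps scaffold names that themselves contain underscores (e.g. `k141_99`) intact.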
To add to that, vRhyme isn't error-free (and neither is CheckV). If CheckV is wrong about the contigs being complete, then vRhyme may be correctly binning 3 contigs of one genome into a vMAG. If vRhyme is wrong, then the bin contains 3 different genomes and is contaminated (or a pangenome). As @cody-mar10 mentioned, resolving this may require a more in-depth analysis of your data.
@KrisKieft, is there a way to calculate coverage separately? I see cov_table_convert.py, but it isn't installed with conda. Is it possible to use CoverM (https://github.com/wwood/CoverM) instead?
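If CoverM does turn out to work, one possible route is a small conversion step. This Python sketch assumes CoverM's per-contig output is a TSV whose first column holds contig names and whose remaining columns are headed `<sample> Mean`, and that vRhyme accepts a TSV with a scaffold column followed by one coverage column per sample. Both formats are assumptions here, not confirmed behavior of either tool, and the `scaffold` header name is a placeholder.

```python
import csv
import io


def coverm_to_coverage_table(coverm_tsv: str, suffix: str = " Mean") -> str:
    """Rename CoverM-style headers to bare sample names (assumed formats, see above)."""
    # Skip blank lines so a trailing newline doesn't produce an empty row.
    rows = [r for r in csv.reader(io.StringIO(coverm_tsv), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    # Column 0: contig/scaffold names. Other columns: one coverage value per sample.
    samples = [h[: -len(suffix)] if suffix and h.endswith(suffix) else h for h in header[1:]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["scaffold", *samples])  # hypothetical header expected downstream
    writer.writerows(body)  # coverage values are passed through unchanged
    return out.getvalue()


if __name__ == "__main__":
    example = "Contig\ts1.bam Mean\ts2.bam Mean\nk141_1\t12.5\t3.0\n"
    print(coverm_to_coverage_table(example), end="")
```

Only the header is rewritten; the coverage numbers themselves come straight from CoverM.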
The reason I'm asking is that VirFinder, geNomad, VirSorter(2), etc. operate on individual contigs, and their results are usually fed into CheckV to determine how complete and contaminated each virus is (similar to CheckM and BUSCO). So let's say there are 3 contigs that CheckV determines are all 100% complete and 0% contaminated. If those are binned together, would that bin be considered a metagenome-assembled genome, or would it be a pangenome, since its contamination would be high based on the notes above?
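One quick way to see the tension here: if CheckV's completeness for a contig is roughly its length divided by the expected genome length, then three contigs that are each ~100% complete add up to ~300% in a single bin, which points to multiple genomes rather than one vMAG. This Python sketch sums per-contig completeness within each vRhyme bin and flags bins above a hypothetical 120% cutoff. It's a rough heuristic that assumes the contigs don't overlap, not how CheckV itself scores a binned genome.

```python
from collections import defaultdict


def flag_multi_genome_bins(completeness, membership, max_total=120.0):
    """Sum per-contig completeness within each bin and flag suspicious bins.

    completeness: {contig: completeness %} from CheckV run on individual contigs.
    membership:   {contig: bin ID} from vRhyme.
    max_total:    hypothetical cutoff; a summed completeness well above 100%
                  suggests the bin holds more than one genome.
    Returns {bin ID: (summed completeness, flagged?)}.
    """
    totals = defaultdict(float)
    for contig, bin_id in membership.items():
        # Contigs missing from the CheckV results contribute nothing.
        totals[bin_id] += completeness.get(contig, 0.0)
    return {bin_id: (total, total > max_total) for bin_id, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical data: three "complete" contigs in one bin, two partial ones in another.
    checkv = {"c1": 100.0, "c2": 100.0, "c3": 100.0, "c4": 40.0, "c5": 55.0}
    bins = {"c1": "bin_1", "c2": "bin_1", "c3": "bin_1", "c4": "bin_2", "c5": "bin_2"}
    for bin_id, (total, flagged) in flag_multi_genome_bins(checkv, bins).items():
        print(f"{bin_id}: {total:.0f}% flagged={flagged}")
```

A flagged bin means either CheckV overestimated the individual contigs or vRhyme combined separate genomes, which are the two failure modes described above.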