AnantharamanLab / vRhyme

Binning Virus Genomes from Metagenomes
GNU General Public License v3.0

Interpretation of results #5

Open michoug opened 2 years ago

michoug commented 2 years ago

Hi, I tried your tool on one of my datasets. I identified viral contigs with VIBRANT, then ran vRhyme, and compared the results after dereplication, before and after generating vMAGs.

Here are the results before generating vMAGs:

checkv_quality   n     mean      sum        max
Complete         557   46179.5   25721993   373392
High-quality     413   44008.8   18175622   275626

And here are the results after:

checkv_quality   n     mean      sum        max
Complete         437   48556.5   21219180   373392
High-quality     514   46641.6   23973794   387939

I checked the quality with CheckV and kept only the best-quality "viruses". Here, mean is the mean contig length, sum is the total length of all contigs, and max is the length of the largest "virus".
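For anyone who wants to build the same kind of summary, here is a minimal sketch of how the n / mean / sum / max values per quality tier could be computed. It assumes the input is a CheckV `quality_summary.tsv` with `contig_length` and `checkv_quality` columns. The function name `summarize_by_quality` and the toy rows are made up for illustration and are not from my dataset.

```python
import csv
import io
from collections import defaultdict


def summarize_by_quality(tsv_handle):
    """Group rows by checkv_quality and report n, mean, sum and max of contig_length."""
    lengths = defaultdict(list)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        lengths[row["checkv_quality"]].append(int(row["contig_length"]))
    return {
        quality: {
            "n": len(values),
            "mean": sum(values) / len(values),
            "sum": sum(values),
            "max": max(values),
        }
        for quality, values in lengths.items()
    }


# Toy input with made-up lengths (assumption: same column names as CheckV's summary file)
example = io.StringIO(
    "contig_id\tcontig_length\tcheckv_quality\n"
    "a\t40000\tComplete\n"
    "b\t60000\tComplete\n"
    "c\t30000\tHigh-quality\n"
)
for quality, stats in summarize_by_quality(example).items():
    print(quality, stats)
```

To use it on a real file, open the TSV and pass the handle, then filter the returned dictionary to the "Complete" and "High-quality" tiers.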

So the average length is higher, but the "contamination" is also higher? Any input on these results?

Best,
Greg

KrisKieft commented 2 years ago

Hi,

I have a couple questions before I can give more thoughts on this.

michoug commented 2 years ago

Hi,

Out of the 557 "complete" contigs, 137 were clustered with others in vRhyme. Yes, I'm running CheckV after binning on both the bins and the unbinned contigs. And yes, it's an aggregation of data from multiple samples binned

Greg

KrisKieft commented 2 years ago

Are these complex virome samples? I'm currently working on an updated v1.1.0 that should address some of these issues. In addition to updates that improve precision, I implemented a step to remove complete (circular) sequences before binning. The update should be available within the next 1-2 weeks.