Closed by ShailNair 11 months ago
Hi Shail
The reason for running CheckV is to identify reliable viral bins, so we recommend considering only Medium- and High-quality bins (those based on the AAI model) for further analysis. Low-quality bins should always be treated with a lot of scepticism.
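A minimal sketch of that filter in Python, in case it helps. It assumes the quality summary is tab-separated with columns named `contig_id`, `checkv_quality` and `completeness_method`, and that AAI-based estimates are labelled with the prefix `AAI-based`; please check these against your own file. The function name `select_reliable_bins` and the three example rows are made up for illustration and are not taken from your results.

```python
import csv
import io

# Quality tiers to keep (assumed spelling of the checkv_quality values).
KEEP_QUALITY = {"Medium-quality", "High-quality"}


def select_reliable_bins(tsv_handle):
    """Return IDs of Medium/High-quality bins whose completeness is AAI-based."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [
        row["contig_id"]
        for row in reader
        if row["checkv_quality"] in KEEP_QUALITY
        and row["completeness_method"].startswith("AAI-based")
    ]


# Hypothetical rows, for illustration only.
example = (
    "contig_id\tcheckv_quality\tcompleteness_method\n"
    "bin_1\tHigh-quality\tAAI-based (high-confidence)\n"
    "bin_2\tMedium-quality\tHMM-based (lower-bound)\n"
    "bin_3\tLow-quality\tAAI-based (medium-confidence)\n"
)

reliable = select_reliable_bins(io.StringIO(example))
print(reliable)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` example, e.g. `select_reliable_bins(open(path, newline=""))`.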
As for the "Complete" bins: I checked your quality file, which had 4 examples, and in those I see that CheckV removed a lot of sequence. You may want to evaluate the cleaned sequences that CheckV produces.
Best, Joachim
Thank you for your prompt response. Yes, only medium- and high-quality bins will be used for further analysis. I'll re-run CheckV on the cleaned FASTA files (the CheckV output from the first run) and let you know how it goes.
I ran CheckV again and found that the high numbers of bacterial genes occurred mostly in high- and medium-quality bins whose quality was assigned by CheckV's HMM-based (lower-bound) model.
Hi,
I ran phamb using the recommended workflow (not in parallel) with the default settings on my assembled metagenomic contigs (a mix of all microbial contigs). I then ran CheckV (with the prodigal -m option enabled) on the concatenated FASTA file. Strangely, CheckV reported that a large number of the bins contain many host (bacterial) genes, accounting for more than 50% of the total gene count (more than 70% in many contigs). Surprisingly, CheckV also reports many of these bins as complete and free of contamination. However, such a large number of host genes will interfere with downstream analysis. I have attached my CheckV results for your reference: quality_summary.txt
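For reference, the host-gene share described above can be computed from the quality summary roughly as follows. This is only a sketch: it assumes a tab-separated file with columns named `contig_id`, `gene_count` and `host_genes`, and the helper names (`host_gene_fractions`, `flag_host_rich`) and example rows are hypothetical, not values from the attached file.

```python
import csv
import io


def host_gene_fractions(tsv_handle):
    """Map each bin ID to host_genes / gene_count, skipping bins with no genes."""
    fractions = {}
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        gene_count = int(row["gene_count"])
        if gene_count == 0:
            continue  # avoid division by zero for bins without predicted genes
        fractions[row["contig_id"]] = int(row["host_genes"]) / gene_count
    return fractions


def flag_host_rich(fractions, threshold=0.5):
    """Return the sorted IDs of bins whose host-gene fraction exceeds the threshold."""
    return sorted(name for name, frac in fractions.items() if frac > threshold)


# Hypothetical rows, for illustration only.
example = (
    "contig_id\tgene_count\thost_genes\n"
    "bin_a\t40\t30\n"
    "bin_b\t50\t5\n"
    "bin_c\t0\t0\n"
)

fractions = host_gene_fractions(io.StringIO(example))
flagged = flag_host_rich(fractions)
print(fractions)
print(flagged)
```

The 0.5 default mirrors the 50% figure mentioned above; raising it to 0.7 would single out the most host-rich bins.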