Closed michoug closed 3 years ago
Hello, sorry for the delay, I was off for a couple of days.
This feature is not yet fully documented, so it's good that you ask.
If you use EukCC to also predict proteins (using GeneMark-ES), EukCC can associate the marker genes it finds with contig names, and can thus estimate how many contigs did not contribute a single-copy marker gene (SCMG) to the estimated completeness/contamination score.
Thus, if all SCMGs are located on only 50% of the contigs, the remaining 50% of the contigs could possibly be contamination from a foreign genome and we would not know.
With fewer contigs, this max silent contamination will go down, which rewards higher-quality assemblies with a more confident score.
I would not worry too much about it. It makes the uncertainty visible, but that uncertainty was always there. So a max silent contamination of 96% means that only 4% of your MAG's DNA contributed to computing the completeness/contamination score. That's not uncommon for short-read assembled metagenomes.
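To make the idea concrete, here is a minimal sketch of how such a value could be computed. This is not EukCC's actual code: the function name, the contig names, and the choice to weight contigs by length in base pairs (rather than by contig count) are all assumptions for illustration.

```python
def max_silent_contamination(contig_lengths, contigs_with_markers):
    """Percent of assembly bases on contigs that carry no marker gene.

    Hypothetical helper, not part of EukCC. Assumes the value is
    weighted by contig length in base pairs.
    """
    total = sum(contig_lengths.values())
    if total == 0:
        raise ValueError("assembly has no bases")
    # Bases on contigs where no single-copy marker gene was found
    silent = sum(
        length
        for name, length in contig_lengths.items()
        if name not in contigs_with_markers
    )
    return 100.0 * silent / total


# Toy MAG: four contigs, markers found on two of them (made-up numbers)
contig_lengths = {
    "contig_1": 40_000,
    "contig_2": 30_000,
    "contig_3": 20_000,
    "contig_4": 10_000,
}
marker_contigs = {"contig_1", "contig_4"}

result = max_silent_contamination(contig_lengths, marker_contigs)
print(f"{result:.2f}")  # 50.00
```

In this toy case half of the bases sit on contigs without any marker gene, so up to 50% of the bin could be foreign DNA without changing the contamination estimate.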
Hope that explains it.
Hi, I don't understand what the max_silent_contamination means in the eukcc.tsv file. For a MAG, I have 0 contamination and 96.03 of max_silent_contamination, for example.
Best,
Greg