Ecogenomics / CheckM

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes
https://ecogenomics.github.io/CheckM/
GNU General Public License v3.0

The effect of non-prokaryotes sequence #355

Closed: neptuneyt closed this issue 4 months ago

neptuneyt commented 2 years ago


Dear CheckM team, thanks a lot for developing such wonderful software. I have a question. CheckM estimates completeness and contamination from prokaryotic marker genes, so if a high-quality bacterial genome is accidentally mixed with sequences from eukaryotes (e.g. fungi, algae, or eukaryotic hosts) or viruses, it is still assessed as high quality. I tested this, and that is indeed what happens. I hope the effect of such non-prokaryotic sequences will be considered in the next version. Best wishes!
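To illustrate why the estimate does not move, here is a simplified sketch. It assumes that completeness is the fraction of expected single-copy markers found and that contamination counts extra copies of those markers. Real CheckM uses lineage-specific marker sets grouped into collocated sets, so this only approximates its scoring. The `estimate_quality` helper and the marker IDs (`M1` to `M4`, `EUK_M9`) are hypothetical.

```python
from collections import Counter


def estimate_quality(marker_hits, marker_set):
    """Simplified completeness/contamination (%) from single-copy marker counts.

    marker_hits: marker IDs found on the genome's contigs, one entry per hit.
    marker_set: expected single-copy marker IDs for the lineage (hypothetical).
    """
    expected = set(marker_set)
    # Hits to markers outside the expected set are ignored entirely.
    counts = Counter(m for m in marker_hits if m in expected)
    completeness = 100.0 * len(counts) / len(expected)
    # Every copy of a marker beyond the first counts toward contamination.
    contamination = 100.0 * sum(c - 1 for c in counts.values()) / len(expected)
    return completeness, contamination


def hits(contigs):
    """Flatten a contig -> marker-hit mapping into one list of hits."""
    return [m for ms in contigs.values() for m in ms]


markers = {"M1", "M2", "M3", "M4"}  # hypothetical prokaryotic marker set
bacterial_contigs = {"contig_1": ["M1", "M2"], "contig_2": ["M3", "M4"]}
eukaryotic_contig = {"fungal_contig": ["EUK_M9"]}  # hypothetical non-prokaryotic hit
duplicated_contig = {"contig_3": ["M1"]}  # second copy of a bacterial marker

clean = estimate_quality(hits(bacterial_contigs), markers)
mixed = estimate_quality(hits({**bacterial_contigs, **eukaryotic_contig}), markers)
dup = estimate_quality(hits({**bacterial_contigs, **duplicated_contig}), markers)
print(clean, mixed, dup)  # (100.0, 0.0) (100.0, 0.0) (100.0, 25.0)
```

Because the eukaryotic contig carries no marker from the expected set, it changes neither number, whereas a duplicated bacterial marker does raise contamination.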

azat-badretdin commented 2 years ago

I am a new user of CheckM, but I vaguely remember contamination being reported in the output as well. In principle, since the backbone of CheckM is a collection of HMM marker sets for different taxonomic lineages, one could detect contamination by finding hits to a significantly different marker set. One just needs to pick a suitable workflow, I guess.
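A minimal sketch of the idea above, assuming a hypothetical lookup table `marker_domain` that maps each marker to the domain it is characteristic of. This is not an existing CheckM command or workflow; the function name and marker IDs are illustrative only.

```python
def flag_foreign_contigs(contig_hits, marker_domain, expected_domain="Bacteria"):
    """Return contigs carrying at least one marker assigned to another domain.

    contig_hits: mapping contig ID -> list of marker IDs hit on that contig.
    marker_domain: mapping marker ID -> domain label (hypothetical table).
    """
    flagged = {}
    for contig, contig_markers in contig_hits.items():
        # Collect the distinct non-expected domains seen on this contig.
        foreign = sorted(
            {
                marker_domain[m]
                for m in contig_markers
                if m in marker_domain and marker_domain[m] != expected_domain
            }
        )
        if foreign:
            flagged[contig] = foreign
    return flagged


marker_domain = {"M1": "Bacteria", "M2": "Bacteria", "EUK_M9": "Eukaryota"}
contig_hits = {
    "contig_1": ["M1", "M2"],
    "fungal_contig": ["EUK_M9"],
    "contig_4": [],  # foreign or not, a contig without marker hits is invisible here
}
print(flag_foreign_contigs(contig_hits, marker_domain))  # {'fungal_contig': ['Eukaryota']}
```

Contigs with no marker hits at all are never flagged, so a marker-based check like this only catches foreign sequence that happens to carry recognizable markers.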

donovan-h-parks commented 2 years ago

CheckM2 is almost certainly a better option if eukaryotic or viral contamination is a concern: https://github.com/chklovski/CheckM2