Ecogenomics / CheckM

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes
https://ecogenomics.github.io/CheckM/
GNU General Public License v3.0

Different results between lineage_wf runs #348

Closed: cifuj closed this issue 5 months ago

cifuj commented 2 years ago

Hi @dparks1134,

I have an issue with checkm lineage_wf: a set of bins is assigned different marker lineages depending on whether I run the bins together or separately. The command I am using is checkm lineage_wf -x fasta -t 5 --file "test" ./ results. The only thing I change between runs is which bins are in the input folder.

When I run them together, I obtain


Bin Id                             Marker lineage             # genomes  # markers  # marker sets  0    1    2    3    4    5+   Completeness  Contamination  Strain heterogeneity
bin_5354_H_all_kfilt3_MH_21_141.1  k__Bacteria (UID2495)      2993       143        89             6    136  1    0    0    0    94.38         1.12           0.00
5354_J_MH_concont_bin.163          c__Spirochaetia (UID2496)  72         215        125            10   205  0    0    0    0    93.60         0.00           0.00
bin_5354_R_all_kfilt3_MH_21_141.4  k__Bacteria (UID2495)      2993       143        89             7    132  4    0    0    0    93.26         3.56           50.00
5354_X_MH_concont_bin.117          c__Spirochaetia (UID2496)  72         215        125            12   203  0    0    0    0    92.80         0.00           0.00


But when I run bin_5354_H_all_kfilt3_MH_21_141.1 alone, I obtain a different marker lineage, and therefore different completeness and contamination values:


Bin Id                             Marker lineage             # genomes  # markers  # marker sets  0    1    2    3    4    5+   Completeness  Contamination  Strain heterogeneity
bin_5354_H_all_kfilt3_MH_21_141.1  c__Spirochaetia (UID2496)  72         215        125            18   197  0    0    0    0    88.00         0.00           0.00


The lineage.ms file is also different for both runs:

bin_5354_H_all_kfilt3_MH_21_141.1 10 UID2497 o__Spirochaetales 71
bin_5354_H_all_kfilt3_MH_21_141.1 11 UID2502 o__Spirochaetales 66

I installed checkm (CheckM v1.2.1) again using conda today and reinstalled the checkm database.

donovan-h-parks commented 2 years ago

Hi. Sorry for the slow reply. This is a bit surprising, but it can happen because CheckM is not deterministic. CheckM places your genome into a reference tree using pplacer, and this placement can change slightly each time CheckM is run. In general pplacer is very stable, but placements do occasionally differ between runs.
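
To check which bins are affected when placement shifts between runs, the two result tables can be compared directly. The following is only a rough sketch, not part of CheckM: it assumes each run was written as a tab-separated table (for example with the --tab_table option) whose column names match the header shown earlier in this thread. The helper names read_checkm_table and diff_runs are made up for illustration, and the inline sample rows are copied from the tables above, trimmed to the columns of interest.

```python
import csv
import io


def read_checkm_table(handle):
    """Map each bin ID to its row from a tab-separated CheckM result table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Bin Id"]: row for row in reader}


def diff_runs(run_a, run_b,
              fields=("Marker lineage", "Completeness", "Contamination")):
    """Return {bin_id: {field: (value_a, value_b)}} for bins in both runs
    whose reported values differ in at least one of the given fields."""
    differences = {}
    for bin_id in sorted(run_a.keys() & run_b.keys()):
        changed = {
            field: (run_a[bin_id][field], run_b[bin_id][field])
            for field in fields
            if run_a[bin_id][field] != run_b[bin_id][field]
        }
        if changed:
            differences[bin_id] = changed
    return differences


# Sample rows taken from the thread; real tables would be opened with
# open(path, newline="") instead of io.StringIO.
HEADER = "Bin Id\tMarker lineage\tCompleteness\tContamination\n"
together = io.StringIO(
    HEADER
    + "bin_5354_H_all_kfilt3_MH_21_141.1\tk__Bacteria (UID2495)\t94.38\t1.12\n"
    + "5354_J_MH_concont_bin.163\tc__Spirochaetia (UID2496)\t93.60\t0.00\n"
)
alone = io.StringIO(
    HEADER
    + "bin_5354_H_all_kfilt3_MH_21_141.1\tc__Spirochaetia (UID2496)\t88.00\t0.00\n"
)

result = diff_runs(read_checkm_table(together), read_checkm_table(alone))
for bin_id, changed in result.items():
    for field, (value_a, value_b) in changed.items():
        print(f"{bin_id}\t{field}\t{value_a}\t{value_b}")
```

Values are compared as plain strings, so any change in the reported text counts as a difference. Bins that appear in only one of the two runs are skipped.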