Ecogenomics / CheckM

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes
https://ecogenomics.github.io/CheckM/
GNU General Public License v3.0

question about the taxonomy result #183

Closed quliping closed 5 years ago

quliping commented 5 years ago

Hello, I used the "checkm lineage_wf" command and I'm confused by the "Marker lineage" column in the result. For example, the result for one of my bins is "s__algicola (UID2846)", but many species share the name "algicola", such as 'Bacillus algicola' and 'Crocinitomix algicola', so how can I tell which algicola my bin is?

donovan-h-parks commented 5 years ago

Hello. I do not recommend using CheckM for taxonomic classification. This is not the intended use of the software. You can see the full taxonomic assignment made by CheckM using the tree_qa command, e.g.:

checkm tree_qa -o2

https://github.com/Ecogenomics/CheckM/wiki/Genome-Quality-Commands#tree_qa

Cheers, Donovan

quliping commented 5 years ago

Thank you very much! I understand what you mean, and the CheckM taxonomic result is just one of my references. But there is still a problem: the tree_qa results differ from the Marker lineage column. For bin108, the tree_qa result is 'k__Bacteria;p__Actinobacteria;c__Actinobacteria', but the marker lineage result is 's__algicola'. For bin109, the results are 'k__Archaea (root)' and 'k__Bacteria'. For bin113, they are 'k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__Rhodospirillaceae' and 'c__Betaproteobacteria'. Of course I know the two rest on different principles: one is based on the genome tree, the other on the marker set. So which result should I trust? Or should I use other software such as PhyloPhlAn? I'm confused right now.

Thank you very much!

Qu Liping

donovan-h-parks commented 5 years ago

Hello Qu. These results don't make sense to me. Both the phylogenetic placement and marker set used for completeness/contamination estimates should be in line with each other. Can you send me the relevant CheckM output files?

quliping commented 5 years ago

Of course I can. The file 'checkM_result_7_8.txt' is the result of the 'lineage_wf' command and 'tree_qa_result_7_8.txt' is the result of the 'tree_qa' command. I have also uploaded the genome tree from tree_qa, named 'checkm7_8.zip'.

checkM_result_7_8.txt tree_qa_result_7_8.txt checkm7_8.zip

Thank you again.

Qu Liping

donovan-h-parks commented 5 years ago

Hello. Those results do not make sense to me. Both files use the same tree placement so it shouldn't be possible to get incongruent results. Can you send me the 3 bins you have flagged (108, 109, 113)?

quliping commented 5 years ago

OK. I will send the three bins to you. I have also put the other bins that show the same situation into the zip file: different_bins.zip

Here are my commands:

checkm lineage_wf -x fa -t 8 --pplacer_threads 8 bin_folder output_folder
checkm tree_qa -o 2 -f checkm7_8.txt checkM7_8

I'm sorry to bother you so much. Thank you again.

donovan-h-parks commented 5 years ago

Hello. I'm not able to reproduce your results. The 108 and 113 bins use the k__Bacteria (UID203) marker set, while 109 uses the k__Archaea (UID2) marker set. The tree_qa -o2 table indicates that:

bin 108 = k__Bacteria;p__Actinobacteria;c__Actinobacteria
bin 113 = k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__Rhodospirillaceae
bin 109 = k__Archaea

This is all consistent and as I would expect. A marker set from higher in the tree than where a bin is placed is often used, as it was determined that this marker set likely gives a better estimate. For example, there may not be enough f__Rhodospirillaceae reference genomes to establish a reliable set of markers for estimating genome completeness and contamination.
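The relationship described above (the marker set's taxon should sit at or above the bin's placement in the tree) can be checked mechanically. The following is only a minimal sketch and not part of CheckM: the function names `marker_taxon` and `is_consistent` are hypothetical, the two strings are assumed to have already been copied out of the lineage_wf and tree_qa tables (no file parsing is shown), and treating a marker lineage named "root" as an ancestor of everything is an assumption.

```python
import re


def marker_taxon(marker_lineage: str) -> str:
    """Drop a trailing '(UID...)' tag: 'k__Bacteria (UID203)' -> 'k__Bacteria'."""
    return re.sub(r"\s*\(UID\d+\)\s*$", "", marker_lineage.strip())


def is_consistent(marker_lineage: str, placement_taxonomy: str) -> bool:
    """Return True if the marker-set taxon is one of the ranks in the
    semicolon-separated placement taxonomy (i.e. at or above the placement)."""
    taxon = marker_taxon(marker_lineage)
    if taxon == "root":
        # Assumption: a 'root' marker set is an ancestor of every placement.
        return True
    ranks = [r.strip() for r in placement_taxonomy.split(";") if r.strip()]
    return taxon in ranks


# Values quoted in this thread: (marker lineage, tree_qa taxonomy).
examples = [
    ("k__Bacteria (UID203)", "k__Bacteria;p__Actinobacteria;c__Actinobacteria"),
    ("k__Archaea (UID2)", "k__Archaea"),
    ("s__algicola (UID2846)", "k__Bacteria;p__Actinobacteria;c__Actinobacteria"),
]
for marker, taxonomy in examples:
    print(marker, "|", taxonomy, "|", is_consistent(marker, taxonomy))
```

The first two pairs are the consistent ones listed above. The third pairs a species-level marker lineage with an unrelated placement, which is the kind of mismatch that suggests the two tables do not belong to the same bin or sample.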

quliping commented 5 years ago

I'm so sorry... I went back to check the results and found that the mistake was mine. There are actually two samples, and both of them have 118 bins, so I think I mixed up their results. The real results from the two commands for sample7_8 are the same. So I made a mistake, not the software. I'm so sorry, maybe you should close this question...

donovan-h-parks commented 5 years ago

Glad to hear the issue is resolved. Best of luck with your research.